User20633
Level 1
Hello,

I'm running a stress test to verify the performance of my driver. In the scenario, the HOST and the HSM continuously write to their respective DFlash.

After a few seconds I get ECC errors in the host DFlash and the device resets.

I have checked CPU0_DSTR, and the LBE (Load Bus Error) bit is set to 1.

If anyone has experience with this topic or can give some hints, you're most welcome. If you need more information, just ask.

Thanks for your help.
0 Likes
17 Replies
Darren_Galpin
Employee
Hi,

So you are trying to do a load page/write page (or burst) to the HOST and HSM at the same time? First question - is DFlash1 configured as HSM exclusive or not? Second question - what are the DMU_HF_ERRSR and DMU_SF_ERRSR set to - are they reporting any programming errors? If not, is any bus error captured in the XBAR error capture registers going to the DMU?

Cheers,

Darren
0 Likes
NeMa_4793301
Level 6
ECC errors in DFLASH do not cause a bus error. Instead, read errors are reported via HF_ECCS (for DF0) and SF_ECCS (for DF1).

What is causing the reset? Is there an SMU alarm? Is there a CPU trap? Can you also check DEADD?
0 Likes
User20633
Level 1
Hello Darren,

Yes, the HSM DFlash is configured as HSM exclusive.
DMU_HF_ERRSR and DMU_SF_ERRSR are both 0.
Can you specify which register shows the status of the XBAR from the DMU's point of view?

4779.attach

The attached file shows the values of some registers I picked; the screenshot was taken at the moment the error happened.

Thanks for your answer,

Hedi.
0 Likes
Darren_Galpin
Employee
So the DMU_HF_ECCS and DMU_SF_ECCS are both reporting multi-bit ECC fails, so they haven't been initialised. The CPU_DEADD is reporting an address in the HOST command sequence interpreter, which means that one of the programming steps got an error. The reset will also clear the HF_ERRSR and SF_ERRSR registers, so the fail reason for the programming is not retained.

As UC_wrangler said, what caused the reset? Can you halt before the reset but after the programming sequence and get the values of the registers again?
0 Likes
User20633
Level 1
Darren and UC_wrangler,

I assume the screenshot was taken before the reset: the debugger had not yet detected a reset, so the registers show the state before it.

I have a further question: how can I verify whether an alarm is being raised or my CPU is running into a trap? Can you tell me a way to detect which type of exception is happening? Maybe this can lead to the root cause of the error.

Thank you,
Hedi.
0 Likes
Darren_Galpin
Employee
So an LBE means that there was an SRI bus error when attempting to load data from external memory. In this case we want to look at the XBAR0 ERRADDRx and ERRx registers which capture the details of the transaction which caused the error. CPU0 is on MCI2 of the XBAR, so look at ERRADDR2 and ERR2. This will then tell us which peripheral is generating the error.
0 Likes
User20633
Level 1
I have :

- DOM0_ERRADDR2 set to 0xB040FFF8
- DOM0_ERR2 set to 0x00C98135

4791.attach

0xB040FFF8 is somewhere in RAM, correct?
0 Likes
MoD
Employee
The address is in DAM0 RAM1
This was a 64-bit write access to this address from CPU0
0 Likes
User20633
Level 1
I don't understand what is happening.

Is the trap happening because my code is accessing DAM0? Why would this trigger a trap? And does this have anything to do with the HSM being active?
0 Likes
Darren_Galpin
Employee
The trap is happening because the CPU is writing to DAM0 RAM1. Have a look at LMU_DAM.MEMCON to see what it is reporting as the error cause. It does support 64-bit write access, so it isn't immediately obvious why this should error.

The memory regions in the DAM RAM are region protected though - check LMU_RGNACCENWAx and LMU_RGNACCENWBx to see if the CPU0 tag is enabled for this memory region.
0 Likes
MoD
Employee
There was an access by CPU0 to DAM0 RAM1, which I expect came from the firmware of the device. This has nothing to do with your trap. Your trap comes from a read access to DF0 address 0xAF011120. This access returned an error. Maybe CPU0 tried to read the address while the HSM was writing to this location at the same time. Is there any error on the HSM?
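
To make the address reasoning in this reply easier to repeat, here is a minimal sketch that maps a captured address to a named memory region. The base addresses follow what the replies in this thread imply (DF0 around 0xAF000000, DAM0 around 0xB0400000); the DF1 base and all region sizes are assumptions for illustration and should be checked against the memory map of the actual device. `mem_region_t` and `region_name` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of an illustrative memory map. */
typedef struct {
    uint32_t base;
    uint32_t size;
    const char *name;
} mem_region_t;

/* ASSUMPTION: bases and sizes are placeholders, not taken from a datasheet. */
static const mem_region_t regions[] = {
    { 0xAF000000u, 0x00100000u, "DF0"  },
    { 0xAFC00000u, 0x00020000u, "DF1"  },
    { 0xB0400000u, 0x00010000u, "DAM0" },
};

/* Returns the name of the region containing addr, or "unknown". */
const char *region_name(uint32_t addr)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        const mem_region_t *r = &regions[i];
        assert(r->size != 0u);
        if (addr >= r->base && (addr - r->base) < r->size) {
            return r->name;
        }
    }
    return "unknown";
}
```

With these assumed ranges, the two addresses discussed so far (0xB040FFF8 and 0xAF011120) fall into different regions, which is the point MoD makes above.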
0 Likes
User20633
Level 1
Darren, MoD,

I checked my code: this region of memory is only accessed by CPU0 during initialization, and there is no access after that. The problem appears when CPU0 tries to read (or blank check) the content of its DFlash.
The error happens randomly: the same line of code is executed many times before the error appears. My driver always moves from one sector to the next (when the current one is full), and the error happens at different addresses.
I erased the whole flash via the debugger to make sure there are no dead cells or corrupted words. Everything looks normal, and the error still appears.

The HSM stops when the error occurs, thanks to a debugger option that lets me halt the TriCore when an error happens. At that point the HSM call stack shows normal function calls in my main function; I do not see any error on the HSM.

One possibly important detail: when CPU0 stops, register D[15] holds the value 2, which I read is the Trap Identification Number (TIN), and A[11] holds the address of the instruction that caused the trap, which looks correct.
So it looks like I'm dealing with a Data Access Synchronous Error trap, and CPU0_DSTR.LBE is also set to 1.

The question is: how can I identify the cause of this trap?
My DFlashes are both in complement sensing mode, and DF1 is HSM exclusive.
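
As a small aid for reading D[15] in a trap handler or debugger, the sketch below maps a TIN to the trap name for trap class 4 (system bus and peripheral errors). The table is written from memory of the TriCore architecture manual and should be verified there; the trap class itself is not in D[15] but is implied by which trap vector entry was taken. `class4_trap_name` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Class 4 trap names indexed by TIN (index 0 is not a valid TIN).
 * ASSUMPTION: list recalled from the TriCore architecture manual. */
static const char *const class4_trap_names[] = {
    NULL,
    "PSE: Program Fetch Synchronous Error",
    "DSE: Data Access Synchronous Error",
    "DAE: Data Access Asynchronous Error",
    "CAE: Coprocessor Trap Asynchronous Error",
    "PIE: Program Memory Integrity Error",
    "DIE: Data Memory Integrity Error",
    "TAE: Temporal Asynchronous Error",
};

/* Returns a readable name for a class 4 TIN, or "unknown TIN". */
const char *class4_trap_name(unsigned tin)
{
    size_t count = sizeof class4_trap_names / sizeof class4_trap_names[0];
    if (tin == 0u || tin >= count) {
        return "unknown TIN";
    }
    assert(class4_trap_names[tin] != NULL);
    return class4_trap_names[tin];
}
```

A TIN of 2 in this class corresponds to the Data Access Synchronous Error described in the post, which fits a load that was answered with a bus error.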
0 Likes
Darren_Galpin
Employee
Hi Hedi,

Can we look at the XBAR registers XBAR_ERRADDR1 and XBAR_ERR1. As the CPU appears to be getting a bus error when loading from the DMU, this will tell us what transaction was sent to the DMU that caused the error.

Cheers,

Darren
0 Likes
User20633
Level 1
There are no registers called XBAR_ERRADDR1 and XBAR_ERR1; I think you mean DOM0_ERRADDR1 and DOM0_ERR.

The trap happens inside two of my functions (MemCopy32 and MemCopy64), so I have attached two screenshots, one for each case.

Error happens inside MemCopy32
4800.attach

Error happens inside MemCopy64
4801.attach

The question is: why would the memory access generate this trap? My code passes through these functions a couple of times before the trap happens.
0 Likes
Darren_Galpin
Employee
Hi Hedi,

Good question... The XBAR error register confirms that it was a word read to address 0xAF03FB40 which caused the error. The following conditions would trigger a bus error when reading the flash:

1) DMU is in sleep mode
2) Flash is not available (not the case here!)
3) DFlash is busy (so check the DMU_HF_STATUS)
4) There was an uncorrectable ECC error (so check DMU_HF_ECCS).
5) Data was read with a burst opcode, but this is not the case here.

Please could you check the two registers mentioned, as there are not a lot of reasons why the DMU would error a flash read.

Cheers,

Darren
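
Because a status flag may already have changed by the time the debugger halts the core, one option is to record the status register in software immediately before each DFlash load. The following is a minimal sketch of such instrumentation, assuming the status register can be read through a plain pointer; `snapshot_before_read`, `last_snapshot` and `SNAP_DEPTH` are hypothetical names, and a real project would pass the address of the device's DMU_HF_STATUS register from its register header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SNAP_DEPTH 8u  /* number of most recent reads kept in the log */

typedef struct {
    uint32_t addr;    /* address about to be read */
    uint32_t status;  /* status register value sampled just before the read */
} read_snapshot_t;

static read_snapshot_t snap_log[SNAP_DEPTH];
static uint32_t snap_count;

/* Call right before a flash load; stores address and status in a ring buffer. */
void snapshot_before_read(uint32_t addr, volatile const uint32_t *status_reg)
{
    assert(status_reg != NULL);
    read_snapshot_t *slot = &snap_log[snap_count % SNAP_DEPTH];
    slot->addr = addr;
    slot->status = *status_reg;
    snap_count++;
}

/* Returns the most recent entry, or NULL if nothing was logged yet. */
const read_snapshot_t *last_snapshot(void)
{
    if (snap_count == 0u) {
        return NULL;
    }
    return &snap_log[(snap_count - 1u) % SNAP_DEPTH];
}
```

After the trap, inspecting `snap_log` in the debugger shows what the status register held at access time, even if hardware has since cleared the flag.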
0 Likes
User20633
Level 1
Darren,

Issue solved: the DF0BUSY bit was set at the time I accessed DF0. This was not visible to me via the debugger because the flag is cleared by hardware as soon as DF0 is no longer busy. I added some debug code and saw that DF0 was busy at the time of the access.

Thank you all for the support, it really helped me a lot.

Cheers,

Hedi.
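
For readers hitting the same problem, one way to avoid reading while the bank is busy is to poll the busy flag with a bounded loop before each load. This is only a sketch under assumptions, not the code used in this thread: the register pointer, `busy_mask` and the poll limit are parameters the caller must supply from the device's register definitions, and `wait_flash_idle` and `guarded_read32` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Polls the status register until the busy bits are clear.
 * Returns false if they are still set after max_polls samples. */
bool wait_flash_idle(volatile const uint32_t *status_reg,
                     uint32_t busy_mask, uint32_t max_polls)
{
    assert(status_reg != NULL);
    for (uint32_t i = 0u; i < max_polls; i++) {
        if ((*status_reg & busy_mask) == 0u) {
            return true;
        }
    }
    return false;
}

/* Reads one 32-bit word only if the flash reports idle in time.
 * On timeout, *out is left untouched and false is returned. */
bool guarded_read32(volatile const uint32_t *status_reg, uint32_t busy_mask,
                    uint32_t max_polls, volatile const uint32_t *src,
                    uint32_t *out)
{
    assert(src != NULL && out != NULL);
    if (!wait_flash_idle(status_reg, busy_mask, max_polls)) {
        return false;
    }
    *out = *src;
    return true;
}
```

The check and the load are separate steps, so this only helps if nothing can start a new flash operation on the same bank in between; a driver would typically ensure that by design, for example by issuing reads and program commands from the same context.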
0 Likes
Darren_Galpin
Employee
Good news! Thanks for letting me know.
0 Likes