Wi-Fi Combo

NiMc_1688136
Contributor II

In the CYW43907, the appscr4_saved_core_status contains bits relating to the reset cause of the processor:

s_error_log

s_bp_reset_log

force_proc_reset_log

Does anyone know the details of each reset flag and what will cause them to be set?

My boards randomly reset after running for some time, and I am trying to track down whether the cause is an exception, noise on the reset line, or some other issue. I cannot monitor the serial port on all devices while running a debug build to catch an exception, so I am trying to find as many clues as I can.

6 Replies
PriyaM_16
Moderator

We are aware of the issues you are facing due to random resets in your board. Are you using secure_sflash or xip in your design?

NiMc_1688136
Contributor II

Neither the secure_sflash nor the XIP option is used in the application.

The resets come at random times; the board can run for hours or days. I am not sure whether an assert or software reset gets triggered at some point or whether this is a noise issue on the RESET_N line. I have seen the board randomly reset while a programming cable remained connected, and I caught what appeared to be noise on the reset line, but in normal operation no cables are attached.

Speaking of the reset line, we have no external components on the line and trace is mostly on an inner layer.

NiMc_1688136
Contributor II

PriyaM_16

While running a debug build I was able to capture one of these exceptions:

Exception = data_abort_handler

"2/1/2019 4:59:55 PM",data_abort_handler
"2/1/2019 4:59:55 PM",DFSR : 0x00001C06
"2/1/2019 4:59:55 PM",DFAR : 0x00000000
"2/1/2019 4:59:55 PM",IFSR : 0x00000000
"2/1/2019 4:59:55 PM",IFAR : 0x00000000
"2/1/2019 4:59:55 PM",CPSR : 0x00000197
"2/1/2019 4:59:55 PM",R0   : 0x00546148
"2/1/2019 4:59:55 PM",R1   : 0x00005249
"2/1/2019 4:59:55 PM",R2   : 0x00005248
"2/1/2019 4:59:55 PM",R3   : 0x005EDDA0
"2/1/2019 4:59:55 PM",R4   : 0x04040404
"2/1/2019 4:59:55 PM",R5   : 0x05050505
"2/1/2019 4:59:55 PM",R6   : 0x06060606
"2/1/2019 4:59:55 PM",R7   : 0x005F14D0
"2/1/2019 4:59:55 PM",R8   : 0x08080808
"2/1/2019 4:59:55 PM",R9   : 0x09090909
"2/1/2019 4:59:55 PM",R10  : 0x10101010
"2/1/2019 4:59:55 PM",R11  : 0x11111111
"2/1/2019 4:59:55 PM",R12  : 0x00000308
"2/1/2019 4:59:55 PM",LR   : 0x004CD5DA

status = CR4_FAULT_STATUS_ASYNC_EXTERNAL_ABORT_AXI_SLAVE_ERROR

Every time I capture the exception, it is related to the same function and thread. I have a gatekeeper thread that manages access to the console output. A string is allocated with malloc when it is passed to the queue and released with free once the gatekeeper thread has received it and sent it out the serial port. The gatekeeper thread is a copy of the aws_logging_task_dynamic_buffers file provided in the AWS FreeRTOS libraries.

I am encountering the error in free. I am using the FreeRTOS heap_3 configuration, which provides a mutex around free. The exception triggers when FreeRTOS tries to perform xSemaphoreGiveRecursive in __malloc_unlock. The newlib_malloc_mutex pointer is valid and the original data pointer of the string is valid.

The exception is reproducible on my side but appears to be random.

Do you have any suggestions for debugging an AXI slave error, or an explanation of what it means?

decac_1684766
New Contributor II

A description of the DFSR (Data Fault Status Register) for the R4 is here: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0363e/BGBEDEIF.html

I have hit this kind of error (a lot...) when I either accidentally free something multiple times or init a mutex on the stack and then forget to de-init it. This corrupts the mutex linked list, and the first time something else accesses the list it blows up.
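For reference, the DFSR value from the dump above (0x00001C06) can be decoded by hand. The sketch below assumes the Cortex-R4 field layout from the linked TRM page (status in bits [3:0] plus bit [10], RW in bit [11], SD in bit [12]); the type and function names are made up for illustration and are not part of WICED.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected fields of the Cortex-R4 DFSR (layout assumed from the TRM). */
typedef struct {
    uint8_t status;     /* 5-bit fault status: DFSR[10] followed by DFSR[3:0] */
    bool    is_write;   /* DFSR[11] (RW): set when a write caused the abort */
    bool    slave_err;  /* DFSR[12] (SD): external abort type, set for an
                           AXI slave error, clear for an AXI decode error */
} dfsr_fields_t;

/* Fault status encoding 0b10110: asynchronous external abort. */
#define DFSR_STATUS_ASYNC_EXTERNAL_ABORT 0x16u

/* Hypothetical helper: split a raw DFSR value into the fields above. */
static dfsr_fields_t dfsr_decode(uint32_t dfsr)
{
    dfsr_fields_t f;
    f.status    = (uint8_t)((((dfsr >> 10) & 1u) << 4) | (dfsr & 0xFu));
    f.is_write  = ((dfsr >> 11) & 1u) != 0u;
    f.slave_err = ((dfsr >> 12) & 1u) != 0u;
    return f;
}
```

For 0x00001C06 this gives status 0x16 with RW and SD both set, which matches the reported CR4_FAULT_STATUS_ASYNC_EXTERNAL_ABORT_AXI_SLAVE_ERROR. Because the abort is asynchronous, DFAR is not meaningful and LR only shows where the core was when the abort was taken, not necessarily the access that caused it. Whether the RW bit can be trusted for an asynchronous abort is worth checking against the TRM.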

NiMc_1688136
Contributor II

I am still seeing this issue, and the exception is always related to the free function, from the same thread.

FreeRTOS is using heap_3, so malloc/free is protected. I have tested with the WICED modification that uses a recursive mutex and also with the default FreeRTOS implementation (which suspends the scheduler).

From what I can tell it runs fine for a while, the pointers are all valid, and then it randomly breaks. I do not know if this is related to an advanced feature of the CYW43907 such as the memory buses or cache. The problem seems to happen less often if I add code before vPortFree is called.

The exception is data_abort, the LR always points to some statement in free() (it varies from time to time according to the disassembly window), and the status shows CR4_FAULT_STATUS_ASYNC_EXTERNAL_ABORT_AXI_SLAVE_ERROR.

I am positive that the pointer has not been previously freed.

Consider the following code, a UDP logging thread that pulls a pointer from the queue, acts on the pointer (sends data over UDP) and then frees the pointer.

Note that the exception still triggers periodically if txDebugSocket_UDP is removed, and the pointer and data look fine prior to calling vPortFree.

    for( ;; )
    {
        /* Block to wait for the next string to print. */
        if( udpEnabled )
        {
            if( xQueueReceive( UDP_Queue, &ptrMsg, portMAX_DELAY ) == pdPASS )
            {
                txDebugSocket_UDP( ( char * ) ptrMsg->data, ptrMsg->size, ptrMsg->port );
                lastAddress = uxTaskGetStackHighWaterMark( NULL );
                vPortFree( ( void * ) ptrMsg );
            }
        }
        else
        {
            wiced_rtos_delay_milliseconds( 1000 );
        }
    }

I have checked the stack watermark of the UDP log thread and it is good. I had not checked the stacks of the other threads, and I guess one of them could overrun and corrupt other data in the heap.

I have since added watermark checks on all my app threads and also check that the main stack does not overflow. None of them showed a problem when the exception occurred.

One exception fired after free had returned, in the return sequence of the vPortFree call.
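Stack watermarks do not cover writes past the end of a heap buffer, such as the queued message itself. One way to check for that before the block is handed back to free is a guard word placed directly after the payload. This is only a sketch: the struct layout, the guard value and the function names are assumptions for illustration, and on the target the allocation would go through pvPortMalloc/vPortFree rather than malloc/free.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MSG_GUARD 0xDEADC0DEu  /* arbitrary pattern, chosen for illustration */

/* Hypothetical message layout: header, payload, then a 4-byte guard word. */
typedef struct {
    uint16_t port;
    size_t   size;      /* bytes of payload currently used */
    size_t   capacity;  /* bytes of payload allocated */
    uint8_t  data[];    /* payload, followed by the guard word */
} guarded_msg_t;

/* Allocate a message with room for `capacity` payload bytes plus the guard. */
static guarded_msg_t *msg_alloc(size_t capacity)
{
    guarded_msg_t *m = malloc(sizeof *m + capacity + sizeof(uint32_t));
    if (m == NULL) {
        return NULL;
    }
    m->port = 0;
    m->size = 0;
    m->capacity = capacity;
    const uint32_t guard = MSG_GUARD;
    memcpy(m->data + capacity, &guard, sizeof guard);  /* memcpy avoids alignment issues */
    return m;
}

/* Returns true while nothing has written past the end of the payload. */
static bool msg_guard_intact(const guarded_msg_t *m)
{
    uint32_t guard;
    memcpy(&guard, m->data + m->capacity, sizeof guard);
    return guard == MSG_GUARD;
}
```

Calling msg_guard_intact() in the logging thread just before vPortFree would flag a message whose producer wrote beyond its buffer. It would not catch corruption that lands elsewhere in the heap.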

NiMc_1688136
Contributor II

Could there be an issue, related to an internal bus, when running with a debugger attached?

After many changes I was still having issues. During my last testing it appeared that the thread's stack or stack pointer value had become corrupted, based on the call tree in the debugger window and a failure in the return from vPortFree.

As I removed code line by line, I found the issue was related to snprintf:

dbgPkt.size += snprintf( (char*)dbgPkt.data+dbgPkt.size, MAX_PACKET_SIZE, "%llu,", ( long long unsigned int ) time );

Looking at it, I realize I need to subtract the current size from the max size for the second parameter, but it should not matter here: dbgPkt.size was 11 and the buffer (MAX_PACKET_SIZE) is 128 bytes. This should not be able to produce an overrun unless there is an issue in snprintf?

I changed it to:

dbgPkt.size += snprintf( (char*)dbgPkt.data+dbgPkt.size, (MAX_PACKET_SIZE-dbgPkt.size), "%llu,", ( long long unsigned int ) time );

I was able to run 11 hours without an exception (the debugger was disconnected).
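Two details of snprintf matter for this pattern. The size argument is the space available at the destination pointer, so it has to shrink as the write offset grows. The return value is the length the output would have had, so adding it to the offset unchecked can push the offset past the end of the buffer after a truncation. A minimal sketch of an append helper that handles both follows; the packet layout and the name pkt_appendf are assumptions, not the original code.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PACKET_SIZE 128

/* Hypothetical packet layout, standing in for the dbgPkt from the post. */
typedef struct {
    size_t size;                   /* bytes used so far, excluding the NUL */
    char   data[MAX_PACKET_SIZE];
} dbg_packet_t;

/* Append formatted text to the packet.
 * Returns 0 on success, -1 if the text does not fit or formatting fails.
 * On failure the packet contents and size are left as they were. */
static int pkt_appendf(dbg_packet_t *pkt, const char *fmt, ...)
{
    if (pkt->size >= sizeof pkt->data) {
        return -1;                           /* no room even for the NUL */
    }
    size_t room = sizeof pkt->data - pkt->size;

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(pkt->data + pkt->size, room, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= room) {
        pkt->data[pkt->size] = '\0';         /* discard the truncated tail */
        return -1;
    }
    pkt->size += (size_t)n;
    return 0;
}
```

With this helper the line from the post would become something like pkt_appendf(&dbgPkt, "%llu,", (unsigned long long) time), with the return value checked by the caller.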
