Detecting buffer overflows and recovering


mgiacomelli
Level 3

I am thinking about ways to verify that data is acquired continuously and to let the host recover in the event of a fault caused by excess CPU load, a USB glitch, etc. In this case I would like the host to know this has happened, throw out the data after the overflow, and signal the FX3 to reset to just before the overflow.

One simple method I tried was to run in CY_U3P_DMA_TYPE_MANUAL_MANY_TO_ONE mode and count CY_U3P_DMA_CB_PROD_EVENT events. I append the current event count to each buffer, and the host checks that the counts increment with no gaps. However, in testing where I deliberately cause CYU3P_PIB_ERR_THRX_WR_OVERRUN events by overloading the USB host (PC) CPU, the received buffers always have contiguous CY_U3P_DMA_CB_PROD_EVENT counts, even though some data is lost to the overrun and the CYU3P_PIB_ERR_THRX_WR_OVERRUN event is generated.
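The host-side check described above can be sketched as follows. This is only an illustration under assumptions: the 4-byte little-endian counter at the start of each buffer and the names read_seq_le and first_gap are hypothetical, not taken from the poster's code. As the post notes, this check alone kept passing during overruns, because the PROD counts stayed contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (hypothetical): the firmware writes its PROD-event
 * count as a 32-bit little-endian value in the first 4 bytes of
 * every buffer. */
static uint32_t read_seq_le(const uint8_t *buf)
{
    assert(buf != NULL);
    return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
}

/* Returns the index of the first buffer whose count is not exactly
 * one more than the previous buffer's count, or n if there is no gap. */
static size_t first_gap(const uint8_t *const *bufs, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint32_t prev = read_seq_le(bufs[i - 1]);
        uint32_t cur = read_seq_le(bufs[i]);
        /* Unsigned subtraction keeps the check valid across wrap-around. */
        if ((uint32_t)(cur - prev) != 1u)
            return i;
    }
    return n;
}
```

Calling first_gap on the array of received buffers gives the position where the host should stop trusting the stream, provided the counter actually skips when data is lost.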

Two questions:

1)  I had thought that the DMA engine would always cause a PROD event when I switch threads, but if the host doesn't drain the buffer fast enough, some of this data would be overwritten after causing the event.  This does not seem to be the case, so when is the PROD event actually raised?  How does an overrun happen without generating the PROD event?

2)  Aside from counting PROD events, is there something else I can count in order to figure out whether any DMA buffers are lost or overrun?  If I understand correctly, there is no way for the CPU to access the current value of any of the DATA/ADDR/CTRL counters, but maybe there is something else that increments when the DMA engine switches threads?  I think I could count CYU3P_PIB_ERR_THRX_WR_OVERRUN events, but it isn't clear to me how the timing of those works.

Rashi_Vatsa
Moderator

Hello,

CYU3P_PIB_ERR_THRX_WR_OVERRUN is triggered when more data is written than the DMA buffer can hold. The producer event is generated when a DMA buffer is completely filled. In a multi-DMA channel with two PIB sockets, the GPIF state machine performs the thread switching when the address lines are not driven externally.

If the GPIF thread switching is not controlled externally, the switch takes some time, and any data written during that period is lost. Please let me know how the thread switching is done in your GPIF state machine.

Aside from counting PROD events, is there some other thing I can count in order to figure out if any DMA buffers are lost or overrun?

>> To avoid CYU3P_PIB_ERR_THRX_WR_OVERRUN, you can use the DMA flow-control flags, i.e. the DMA ready and DMA watermark flags.

Counting producer and consumer events tells you how many DMA buffers are available for filling, while the DMA flags indicate the status of the DMA buffer currently being filled by the PIB socket.

Regards,
Rashi


@Rashi_Vatsa wrote:

CYU3P_PIB_ERR_THRX_WR_OVERRUN is triggered when more data is written than the DMA buffer can hold. The producer event is generated when a DMA buffer is completely filled. In a multi-DMA channel with two PIB sockets, the GPIF state machine performs the thread switching when the address lines are not driven externally.

 


I have configured the GPIF state machine to ping-pong back and forth between two threads as described in the UVC example code.  Continuous data acquisition is working, so I think there is no issue there.  

If I understand correctly, filling the buffer will generate the PROD event.  The GPIF will then switch to the next buffer and generate a new PROD event when it is full.  During the PROD event, the CPU will call CyU3PDmaMultiChannelCommitBuffer to commit the buffer. 

Since the PROD event happens before the overrun, counting it does not detect the overrun.  What happens if the host stops accepting data leading to an overrun?  Does the CyU3PDmaMultiChannelCommitBuffer fail?  Or does some later stage of the transfer fail?  Is monitoring the return value of CyU3PDmaMultiChannelCommitBuffer a way to see if the buffer makes it to the host?


@Rashi_Vatsa wrote:

>> To avoid CYU3P_PIB_ERR_THRX_WR_OVERRUN, you can use the DMA flow-control flags, i.e. the DMA ready and DMA watermark flags.

Counting producer and consumer events tells you how many DMA buffers are available for filling, while the DMA flags indicate the status of the DMA buffer currently being filled by the PIB socket.


To be clear, I am not asking how to avoid an overrun, but rather how to detect one when it happens.  So far I have done continuous acquisitions lasting hours without generating a single overrun, so I think my code is robust, but I would like a way to monitor for failure in real time, since an overrun probably represents an unexpected event or some bigger failure in my system.  Essentially, I am trying to make the system failsafe, so I am forcing overruns to occur by interrupting transfers on the host side and then trying to recover.


Hello,

Since the PROD event happens before the overrun, counting it does not detect the overrun.  What happens if the host stops accepting data leading to an overrun?

>> Please note that once the PROD event is generated, GPIF thread switching takes place and the next buffer, on the second socket, starts being filled. If data is written during the thread switch, it is not buffered and is lost. In this case the CYU3P_PIB_ERR_THRX_WR_OVERRUN error is triggered.

The second case is a DMA buffer overrun, i.e. all the DMA buffers associated with the DMA channel are full because the USB host does not consume the data as fast as the GPIF fills the buffers. In this case, CyU3PDmaMultiChannelCommitBuffer fails with error code 71, as mentioned in this KBA: Invalid Sequence Error in Multi-Channel Commit Buf... - Cypress Developer Community

Regards,
Rashi

I did some more testing by configuring the above GPIF instance to continuously acquire data at 100 MHz.  Then, halfway through an acquisition, I had the host call Sleep(50) and then resume in order to force a brief overflow.  On the FX3, I counted PRODUCE events, CONSUME events, commit failures from CyU3PDmaMultiChannelCommitBuffer, and the number of CYU3P_PIB_ERR_THRX_WR_OVERRUN events generated.  Here are the results:

 

Starting the GPIF !
ERR_THR0_WR_OVERRUN: 1 (PROD: 12212, CONS: 12209) commit fail: 0
ERR_THR0_WR_OVERRUN: 2 (PROD: 12212, CONS: 12209) commit fail: 0
ERR_THR0_WR_OVERRUN: 3 (PROD: 12214, CONS: 12212) commit fail: 1
ERR_THR0_WR_OVERRUN: 4 (PROD: 12216, CONS: 12213) commit fail: 1
ERR_THR1_WR_OVERRUN: 5 (PROD: 12216, CONS: 12213) commit fail: 1
ERR_THR1_WR_OVERRUN: 6 (PROD: 12216, CONS: 12214) commit fail: 1
ERR_THR0_WR_OVERRUN: 7 (PROD: 12217, CONS: 12215) commit fail: 1
ERR_THR1_WR_OVERRUN: 8 (PROD: 12218, CONS: 12216) commit fail: 1
ERR_THR0_WR_OVERRUN: 9 (PROD: 12248, CONS: 12246) commit fail: 1
ERR_THR0_WR_OVERRUN: 10 (PROD: 15940, CONS: 15937) commit fail: 1
ERR_THR0_WR_OVERRUN: 11 (PROD: 15940, CONS: 15938) commit fail: 1
ERR_THR0_WR_OVERRUN: 12 (PROD: 15942, CONS: 15939) commit fail: 1
ERR_THR1_WR_OVERRUN: 13 (PROD: 15942, CONS: 15940) commit fail: 1
ERR_THR0_WR_OVERRUN: 14 (PROD: 15943, CONS: 15941) commit fail: 1
ERR_THR1_WR_OVERRUN: 15 (PROD: 15944, CONS: 15942) commit fail: 1
ERR_THR1_WR_OVERRUN: 16 (PROD: 15944, CONS: 15942) commit fail: 1
ERR_THR0_WR_OVERRUN: 17 (PROD: 15945, CONS: 15943) commit fail: 1
ERR_THR1_WR_OVERRUN: 18 (PROD: 15974, CONS: 15972) commit fail: 1

Stopping acquisition!
Overflows: 18 (PROD: 16380, CONS: 16379)

 

There should have been a total of 16384 CONS and PROD events, so I lost 4 PROD, 5 CONS, had one call to CyU3PDmaMultiChannelCommitBuffer fail, and had 18 buffer overflow interrupts.  PROD/CONS events keep happening even when the host is not responding, but only the first call to CyU3PDmaMultiChannelCommitBuffer returns an error (??).  

I think the most reliable way to detect an error condition is to count CYU3P_PIB_ERR_THRX_WR_OVERRUN events.  These appear to happen very quickly, even before CyU3PDmaMultiChannelCommitBuffer fails.  Is there any way to determine how much data was lost during each overflow?  By counting CYU3P_PIB_ERR_THRX_WR_OVERRUN I can detect the first buffer committed after the overflow, but I don't actually know how many samples were lost, and thus it's hard to know how much data was corrupted.
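One way to carry the overrun count to the host is to stamp it into each buffer next to the sequence number. The sketch below is written as plain C so it can run anywhere. The 8-byte header layout and the names stamp_header and overrun_since are assumptions for illustration, not SDK functions. On the FX3 the overruns value would be whatever counter the firmware increments when it sees a CYU3P_PIB_ERR_THRX_WR_OVERRUN error. The check shows only that at least one overrun happened since the previous buffer, not how many samples were lost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Firmware side (hypothetical): called just before a buffer is
 * committed. Bytes 0..3 hold the buffer sequence number, bytes 4..7
 * the running count of overrun errors seen so far. */
static void stamp_header(uint8_t *buf, uint32_t seq, uint32_t overruns)
{
    assert(buf != NULL);
    put_le32(buf, seq);
    put_le32(buf + 4, overruns);
}

/* Host side: returns 1 when the overrun count differs from the one in
 * the previously checked buffer, 0 otherwise. Updates *last_overruns. */
static int overrun_since(const uint8_t *buf, uint32_t *last_overruns)
{
    assert(buf != NULL && last_overruns != NULL);
    uint32_t now = get_le32(buf + 4);
    int changed = (now != *last_overruns);
    *last_overruns = now;
    return changed;
}
```

Because the exact timing between the error interrupt and the commit is not established in this thread, it may be safer for the host to also treat the buffer just before the flagged one as suspect.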


Hello,

From the debug prints, I can see that CYU3P_PIB_ERR_THRX_WR_OVERRUN occurs even when no commit failure is reported. Can you please let me know whether CYU3P_PIB_ERR_THRX_WR_OVERRUN is seen even when the host is reading the data continuously, or only when the host reads the data slowly from the device?

Also, please confirm whether you are following the KBA Invalid Sequence Error in Multi-Channel Commit Buf... - Cypress Developer Community for recovering from the commit buffer failure.

Regards,
Rashi

Sorry for the long delay.  I only get CYU3P_PIB_ERR_THRX_WR_OVERRUN when data is available but no buffer is posted to the CyAPI.  I tested monitoring for CYU3P_PIB_ERR_THRX_WR_OVERRUN and that appears very reliable for detecting overflows, but again, it doesn't help me figure out how much data was lost.  So far the best I've been able to do is reset the system and begin from last known good state.  
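The "last known good state" bookkeeping on the host can be sketched roughly like this. It is a minimal illustration under assumptions: acq_state, acq_accept and acq_restart_seq are hypothetical names, and overrun_flagged stands for whatever overrun indication reaches the host. Everything from the first flagged or out-of-sequence buffer onward is discarded, and the restart point is the buffer after the last good one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t last_good_seq; /* meaningful only when have_good != 0 */
    int have_good;          /* at least one buffer has been accepted */
    int faulted;            /* a fault was seen; discard until reset */
} acq_state;

/* Returns 1 if the buffer should be kept, 0 if it must be discarded. */
static int acq_accept(acq_state *s, uint32_t seq, int overrun_flagged)
{
    assert(s != NULL);
    if (s->faulted)
        return 0;
    if (overrun_flagged ||
        (s->have_good && seq != (uint32_t)(s->last_good_seq + 1u))) {
        s->faulted = 1;
        return 0;
    }
    s->last_good_seq = seq;
    s->have_good = 1;
    return 1;
}

/* Sequence number at which acquisition should resume after a reset. */
static uint32_t acq_restart_seq(const acq_state *s)
{
    assert(s != NULL && s->have_good);
    return (uint32_t)(s->last_good_seq + 1u);
}
```

The design choice here is to be conservative: once a fault is latched, nothing is accepted until the host resets the state, which matches the approach of restarting from the last known good point.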


Best way would be to simply include the current value of the DATA/ADDR/CTRL counters in each DMA buffer, but I don't think that is possible.


Hello,

Best way would be to simply include the current value of the DATA/ADDR/CTRL counters in each DMA buffer, but I don't think that is possible

>> Yes, the counter values cannot be read.

The best way to avoid the overrun errors is to use the DMA flags, which show the status of the DMA buffer that is being read or written. Please let me know if the DMA flags can be used in your application.

Please let me know how you are planning to use the information about the number of bytes that caused the overrun. This will help me suggest a better solution.

Regards,
Rashi