FX3 DMA packet loss

Magnus_P

We are using the FX3 chip in a camera design.
The code we use is based on the AN75779 example design.
Basically it works well, but we have a problem where single DMA packets from the GPIF interface sometimes "disappear", causing a mismatch between the packets coming from the two interleaved DMA channels. The picture then shows a horizontal striping effect that persists until the next packet loss, when everything jumps back to normal.

This packet loss can happen at any time, but it is much more likely right at the beginning, when streaming starts.

We see two cases, resulting in the same symptom:
- The first DMA packet (from GPIF) in the image has size zero; all frames after that show the striping effect.
- One packet anywhere in the frame is lost, so that frame is one packet short. All frames after that show the striping effect.

We have implemented a log function in CyFxUvcApplnDmaCallback() that records every packet: its length, which buffer is used, and the consumer packet status.
If anything unusual is detected, we print the history to the terminal so we can scrutinize the details. This is done in the non-timing-critical part of the code.
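A minimal sketch of such a history log, written as portable C so it can be tried on a PC first. The struct, field and function names are hypothetical, not taken from AN75779. Recording is O(1) and never prints, so it can sit in the producer-event path; the dump runs later. On the FX3 the dump would go through CyU3PDebugPrint() rather than printf(), and if a dump can overlap with new records, the read side needs some form of locking (for example briefly disabling interrupts).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical log entry: one record per DMA producer event. */
typedef struct {
    uint32_t seq;        /* running event number */
    uint16_t count;      /* dmaBuffer.count reported for this buffer */
    uint8_t  bufIndex;   /* which DMA buffer/socket was used */
    uint32_t consStatus; /* consumer packet status as read by the application */
} DmaLogEntry;

#define DMA_LOG_SIZE 64u  /* power of two, so the index can be masked */

typedef struct {
    DmaLogEntry entries[DMA_LOG_SIZE];
    uint32_t    next;     /* total number of entries ever written */
} DmaLog;

/* Called from the DMA callback: constant time, no printing, no blocking.
 * The oldest entry is overwritten once the ring is full. */
static void dma_log_record(DmaLog *log, uint16_t count, uint8_t bufIndex,
                           uint32_t consStatus)
{
    DmaLogEntry *e = &log->entries[log->next & (DMA_LOG_SIZE - 1u)];
    e->seq        = log->next;
    e->count      = count;
    e->bufIndex   = bufIndex;
    e->consStatus = consStatus;
    log->next++;
}

/* Number of entries currently held (at most DMA_LOG_SIZE). */
static uint32_t dma_log_size(const DmaLog *log)
{
    return log->next < DMA_LOG_SIZE ? log->next : DMA_LOG_SIZE;
}

/* i-th oldest retained entry (0 = oldest). */
static const DmaLogEntry *dma_log_get(const DmaLog *log, uint32_t i)
{
    uint32_t first = log->next - dma_log_size(log);
    return &log->entries[(first + i) & (DMA_LOG_SIZE - 1u)];
}

/* Called from non-time-critical code once an anomaly has been flagged. */
static void dma_log_dump(const DmaLog *log)
{
    for (uint32_t i = 0; i < dma_log_size(log); i++) {
        const DmaLogEntry *e = dma_log_get(log, i);
        printf("%lu: count=%u buf=%u cons=0x%08lx\n",
               (unsigned long)e->seq, (unsigned)e->count,
               (unsigned)e->bufIndex, (unsigned long)e->consStatus);
    }
}
```

Keeping the last N events, rather than only counters, makes it possible to see exactly which buffer came in with the wrong length and what preceded it.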

We have spent quite a bit of time analyzing this, including oscilloscope measurements (toggling extra HW pins to indicate different actions in the code), to try to understand what goes wrong, but it seems to be something internal to the DMA handling, outside our control.

Do you have any hints on what can cause this and how it can be avoided?

Best Regards,

Magnus

JayakrishnaT_76
Moderator

Hello Magnus,

Please let me know whether by "packet" you mean a DMA buffer itself, or a USB packet as such.

Please share the following details so that we can debug the issue:

1. UART debug logs. This is to check whether there is a commit buffer failure.

2. Increment a global uint32_t variable as soon as a producer event is received. The variable should be initialized to 0 at the start of the application. Write this variable into any unused field of the UVC header. Then, while streaming, capture a Wireshark trace and share it with us. By looking at the UVC header, we can tell whether any buffers were missed.

3. Is the GPIF II state machine the same as in the AN75779 project? Is the interface a 32-bit or a 16-bit data bus?

4. What resolution and frame rate are used?
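A minimal sketch of the counter suggested in item 2, in portable C. The variable and function names are hypothetical. The offset is a parameter; later in this thread bytes 4:5 are used, and the byte order (little-endian) is an arbitrary choice that the host-side decoder just has to match. Note that in the standard UVC payload header, bytes 2..5 form the presentation timestamp field, so the host driver may interpret them if the PTS bit is set in the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global, reset to 0 when the application starts. */
static uint32_t glProdEventCount = 0;

/* Store the low 16 bits of count at header[off], header[off + 1]
 * (little-endian; the byte order is an arbitrary choice). */
static void uvc_header_stamp_counter(uint8_t *header, size_t off, uint32_t count)
{
    header[off]     = (uint8_t)(count & 0xFFu);
    header[off + 1] = (uint8_t)((count >> 8) & 0xFFu);
}

/* Would be called for every CY_U3P_DMA_CB_PROD_EVENT, with header pointing
 * at the UVC header area of the buffer about to be committed. */
static void on_producer_event(uint8_t *header)
{
    glProdEventCount++;
    uvc_header_stamp_counter(header, 4u, glProdEventCount);
}

/* Host side: read the counter back from a captured header. */
static uint16_t uvc_header_read_counter(const uint8_t *header, size_t off)
{
    return (uint16_t)(header[off] | (header[off + 1] << 8));
}

/* Host side: buffers missing between two consecutive headers (0 = none).
 * Casting to uint16_t makes the 16-bit wraparound come out right. */
static uint16_t missed_between(uint16_t prev, uint16_t cur)
{
    return (uint16_t)(cur - prev - 1u);
}
```

When walking through the Wireshark capture, any non-zero missed_between() value between consecutive payloads points at a buffer that was counted in the firmware but never reached the host.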

Best Regards,
Jayakrishna
Magnus_P

Hi!
I mean DMA buffers.

1. No, the commit buffer errors do not happen at the same time as this error, but we see those as well ...

2. Yes, we can do that in a week or so ... on vacation right now, with no HW available.

3. Yes, the state machine is a slightly modified version of the AN75779 one. We use either an 8-bit or a 16-bit bus width.

4. Resolution is 640x400 at 190 frames/s, but we have also used lower frame rates and still see the problem.

 

Some more info:
We have both an 8-bit and a 16-bit version. Both show the same behaviour, although the problem is a bit more common for the 8-bit version.
For the 8-bit version, each DMA buffer holds 16384 bytes. 16 bytes are reserved for the header, so 16368 bytes are left for image data.
For each incoming buffer in the CyFxUvcApplnDmaCallback() function (when the event type is CY_U3P_DMA_CB_PROD_EVENT), we log the buffer length (the value of dmaBuffer.count).
We then see that when this happens, we have one of two cases:

- First DMA buffer for an image frame had zero length
or
- The last DMA buffer (size 4192) is received too early, meaning some buffers are lost.
(The size of the last DMA buffer is still 4192, so it is complete DMA buffers that we have lost.)
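The two cases above can be told apart mechanically from the logged dmaBuffer.count values of one frame. The sketch below is portable C with hypothetical names; it assumes a valid frame is a fixed number of full buffers followed by exactly one short buffer. The full-buffer payload (16368 here), the expected number of full buffers, and the last-buffer size (4192 here) are passed in, since they depend on the sensor mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of checking one frame's worth of logged buffer lengths. */
typedef enum {
    FRAME_OK,
    FRAME_ZERO_LENGTH_FIRST,  /* first buffer of the frame had count == 0 */
    FRAME_TOO_FEW_BUFFERS,    /* short last buffer arrived too early */
    FRAME_UNEXPECTED          /* any other layout */
} FrameCheck;

/* counts[] holds the dmaBuffer.count values logged for one frame, in order.
 * A valid frame is expectedFull buffers of fullSize bytes followed by one
 * buffer of lastSize bytes. */
static FrameCheck classify_frame(const uint16_t *counts, size_t n,
                                 uint16_t fullSize, size_t expectedFull,
                                 uint16_t lastSize)
{
    if (n == 0)
        return FRAME_UNEXPECTED;
    if (counts[0] == 0)
        return FRAME_ZERO_LENGTH_FIRST;

    size_t full = 0;
    while (full < n && counts[full] == fullSize)
        full++;

    /* Exactly one short buffer, of the expected size, must close the frame. */
    if (full != n - 1 || counts[n - 1] != lastSize)
        return FRAME_UNEXPECTED;
    if (full < expectedFull)
        return FRAME_TOO_FEW_BUFFERS;
    if (full > expectedFull)
        return FRAME_UNEXPECTED;
    return FRAME_OK;
}
```

If the image size were an exact multiple of the buffer payload, the frame would end on a full buffer and this size-based check would need a different end-of-frame marker.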


Best Regards,
Magnus


Hi Magnus,

Please find my follow up questions below:

1. Does the issue happen after a commit buffer failure? Or is it also seen before the commit buffer failure?

2. Is the change in bus width the only modification in the GPIF II project, or have you made other modifications? If so, is it possible to share the GPIF II project with us?

3. How many bits per pixel are used? We want this information to calculate the data rate.

4. Please check if you have any PIB errors. You can refer to AN75779 to understand how the PIB callback can be registered and how the callback function should be written. The callback function is registered using the API CyU3PPibRegisterCallback (). Please let me know if you have any queries on this.

5. Also, please share the result of the test that was requested in my previous response when the hardware is available.

Best Regards,
Jayakrishna

Hi!

Now I'm back on it ....

By PIB errors, do you mean the handling that is enabled by the define "BACKFLOW_DETECT"?

 


Hello,

Yes, the callback CyFxUvcAppPibCallback () is invoked in case of a PIB error. You can parse the arguments of the callback to understand the type of error.

Best Regards,
Jayakrishna

Hi,

I have checked; the error is typically not the result of a PIB error.

The easiest way to trigger it is to turn streaming on and off several times (from the application on the PC). From my logs I can then see that when the error happens, the first producer interrupt is triggered by a packet of length zero. That makes it go out of sync: the second producer interrupt is entered with the packet that normally comes as number two, the third with the packet that normally comes as number one, and so on ...
I have turned on a test image in the sensor so you can easily see which image packet is which.
I have made a USB capture where it happens.
First look at, for example, the frame that starts at 12155.
You can see that the image data look like this:

00 01 00 01 00 01 ...

Next packet has this image data:

00 00 00 04 00 00 00 04 ...

If you now look at the frame that starts at 15935 you will see that the packets arrive in the wrong order.

Some details:
Each frame consists of 31 packets with 16368 image bytes each. The last packet, packet 32, has 4592 bytes. This makes a 640x400 image with 16-bit pixels.
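The frame layout described above follows from simple arithmetic, which is also what is needed for the data-rate question. A small portable C sketch (hypothetical names) that splits a frame into buffers of 16368 image bytes each:

```c
#include <stdint.h>

typedef struct {
    uint32_t frameBytes;      /* image bytes per frame */
    uint32_t fullBuffers;     /* buffers completely filled with image data */
    uint32_t lastBufferBytes; /* bytes in the final, partially filled buffer */
} FrameLayout;

/* Split one frame into DMA buffers carrying payloadPerBuffer image bytes
 * each (buffer size minus the space reserved for the header). */
static FrameLayout frame_layout(uint32_t width, uint32_t height,
                                uint32_t bytesPerPixel, uint32_t payloadPerBuffer)
{
    FrameLayout l;
    l.frameBytes      = width * height * bytesPerPixel;
    l.fullBuffers     = l.frameBytes / payloadPerBuffer;
    l.lastBufferBytes = l.frameBytes % payloadPerBuffer;
    return l;
}

/* Sustained image data rate at a given frame rate. */
static uint32_t bytes_per_second(const FrameLayout *l, uint32_t fps)
{
    return l->frameBytes * fps;
}
```

For 640x400 at 2 bytes per pixel, frame_layout(640, 400, 2, 16368) gives 512000 bytes per frame, i.e. 31 full buffers plus 4592 bytes in the last one, which matches the capture. At the 190 frames/s mentioned earlier, that is 97280000 bytes/s, or 32 x 190 = 6080 buffers/s, roughly 164 µs per buffer.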

You can see the producer interrupt counter (as you suggested above) in bytes 4:5 of the UVC header and a frame counter in bytes 10:11.

After the error has been triggered, the stream stays out of sync until the error happens again, or until some other error causes the application in the FX3 to be restarted.
We would of course like to find the root cause and eliminate the error, but a first step would be to at least detect it, so we can restart/reset the streaming in the FX3.
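A minimal sketch of an online check that could be fed from the CY_U3P_DMA_CB_PROD_EVENT branch of CyFxUvcApplnDmaCallback(), one buffer at a time. It is portable C with hypothetical names and assumes that any buffer shorter than a full one, including a zero-length one, closes the current frame. It only looks at buffer sizes, so it can flag the zero-length buffer or a frame that ends early, but once the packet order has shifted, later frames may look normal by size alone; that suggests acting on the first flagged frame.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-stream state; not part of the FX3 SDK or AN75779. */
typedef struct {
    uint16_t fullSize;      /* payload bytes in a full buffer, e.g. 16368 */
    uint16_t expectedFull;  /* full buffers per frame, e.g. 31 */
    uint16_t lastSize;      /* payload bytes in the final short buffer, e.g. 4592 */
    uint16_t fullSeen;      /* full buffers seen so far in the current frame */
    uint32_t anomalies;     /* frames that ended with an unexpected layout */
} FrameChecker;

/* Feed the dmaBuffer.count of one producer event. Returns true when this
 * buffer closes a frame whose layout does not match the expected one. */
static bool frame_checker_feed(FrameChecker *c, uint16_t count)
{
    if (count == c->fullSize) {
        c->fullSeen++;
        return false;
    }
    /* Short buffer (possibly zero-length): the frame ends here. */
    bool bad = (count != c->lastSize) || (c->fullSeen != c->expectedFull);
    c->fullSeen = 0;
    if (bad)
        c->anomalies++;
    return bad;
}
```

In the firmware, a true return value would most likely just set a flag that the application thread picks up to stop and restart streaming, rather than doing the restart inside the DMA callback itself.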

 

Best Regards,

Magnus


Hello Magnus,

I have looked at the traces and also found the packets that were transferred in the wrong order.

At entry 15919, I find that a ZLP is transferred. Can you please share the complete project, along with the GPIF II state machine, for us to check? We want to check the modifications made to the GPIF II project as well as the code used to start and stop the video streaming in the firmware. Ideally, a ZLP should not be transmitted at all, because the GPIF II block samples the data based on the control signals received from the image sensor. So the problem could also be caused by the image sensor. We would also like to see the complete UART debug logs captured while streaming video. Please share them with us too.

Also, I believe you have multiple boards. Can you test the firmware on different boards and let me know whether the problem is seen on all of them? This is to check for possible hardware issues.

Best Regards,
Jayakrishna

Hi!

I don't want to make our code public. Is there a way I can share it with you only?

When it comes to the error: as I wrote, the simplest way to trigger it is the one I described earlier. But it also happens spontaneously if we just let the system run, and in those cases there is no ZLP, just missing packets in the incoming DMA.
These are harder to catch in a Wireshark recording, which is why I provided you with the ZLP case as a first step. But since those are also errors, getting rid of them would be a good start, and I guess that error is linked to the other one.

Yes, we have three versions of the board. They are quite different, but all are based on the same image sensor.
The first version is a single camera interface implemented as a piggyback board on your eval board.
The second version is a completely new design, made from scratch, with dual camera interfaces. A small FPGA between the two cameras and the FX3 helps with synchronizing the data. The third is a simplified version of the dual camera design that has only one camera interface and no FPGA.

All three versions show the same error.
The frequency of the error seems to depend strongly on details in the software: for some builds the error is very unlikely, while for others it is more frequent.
Debug builds and release builds show roughly the same error frequency.

Best Regards,

Magnus


Hello Magnus,

When you say "some builds", what exactly do you mean? Can you please elaborate?

Also, I have sent you a private message with my email ID. You can send the firmware, UART debug logs and corresponding Wireshark traces to that email ID.

Best Regards,
Jayakrishna
Magnus_P

I have sent you the code by email.

By "some builds" I mean two things:

- Sometimes just rebuilding changes the likelihood of the error occurring.

- Very small changes to the (non-timing-critical) code result in large variations in error frequency.

My interpretation is that something in the handling is very timing-critical, even though it shouldn't be.
And as said before, we have not found a (simple) way to detect this problem from the code; otherwise we could simply restart the DMA handling to fix it.

Regards,

Magnus

 


Hello Magnus,

Can you please share the changes you made that resulted in large variations in error frequency? This will help us debug the problem faster. We also need to see the UART debug logs.

Best Regards,
Jayakrishna