CyUSB buffer size and transfer rate


mgiacomelli

I have been testing out various configurations of the FX3 and CyUSB driver to optimize throughput, and I've found something I don't fully understand. 

First I installed the modified driver here to enable 32MB buffer sizes on Windows 10:  https://community.cypress.com/t5/USB-Superspeed-Peripherals/Maximum-transfer-size-for-BeginDataXfer-...

Then I set up a GPIF firmware that continually grabs 32-bit samples at 80 MHz, using an external function generator as the clock.  DMA transfers were configured with 2 threads, each having 2 buffers holding 13 1024-byte USB packets.  I then stream data across the USB bus to a Windows 10 host using different buffer sizes and the async API in CyUSB.  To make sure my software doesn't influence throughput, all buffers are posted to the driver before the transfer starts, and the software immediately discards completed buffers to minimize CPU load.  After the transfer I check how many times the overflow flag was asserted and how much data was actually transferred. 
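For reference, the sustained data rate implied by this setup can be worked out directly. The sketch below assumes the GPIF really does clock in one 32-bit sample on every cycle of the 80 MHz clock with no idle gaps; the function names are made up for illustration and are not part of CyAPI or the FX3 SDK.

```cpp
#include <cassert>
#include <cstdint>

// Sustained GPIF data rate in bytes per second for a given sample clock and
// sample width. Assumes one sample is captured on every clock cycle.
constexpr std::uint64_t gpif_bytes_per_second(std::uint64_t sample_clock_hz,
                                              std::uint32_t bits_per_sample) {
    return sample_clock_hz * (bits_per_sample / 8);
}

// Time in seconds needed to produce total_bytes at the given rate.
inline double acquisition_seconds(std::uint64_t total_bytes,
                                  std::uint64_t bytes_per_second) {
    assert(bytes_per_second > 0);
    return static_cast<double>(total_bytes) /
           static_cast<double>(bytes_per_second);
}
```

With the numbers from this post, `gpif_bytes_per_second(80000000, 32)` gives 320,000,000 bytes/s, so the 872,415,232-byte acquisition shown in the logs should take roughly 2.7 seconds regardless of the host buffer size.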

I expected that smaller buffers would have more overhead and that larger ones would be slightly better.  When I tested it, however, I found that larger buffers seem to have lower throughput.  For example:

832 MB using 64 individual 13MB buffers:

 

Configuring FX3 acquisition with parameters:
Samples per trigger:  6656  (26624 bytes)
Number of triggers: 32768
Total acquisition size: 872415232 bytes (832 MB)
CYUSB bufferSize: 13631488 (40.625001 percent of max), 13312.000000 USB packets (32 total buffers)
Total CYUSB buffers per acquisition: 64 (512.000000 triggers per buffer)
0: Got 13631488 bytes (512.000000 triggers) with buffer 0 (0.000000)!
...
63: Got 13631488 bytes (512.000000 triggers) with buffer 32256 (63.000000)!
Overflows: 0
Total FX3 DMA transfers Produced: 16384
acquisition succeeded!

 

After 10 repetitions, not a single overflow and all data is accounted for.

832 MB using 32 individual 26MB buffers:

 

Configuring FX3 acquisition with parameters:
Samples per trigger:  6656  (26624 bytes)
Number of triggers: 32768
Total acquisition size: 872415232 bytes (832 MB)
CYUSB bufferSize: 27262976 (81.250002 percent of max), 26624.000000 USB packets (32 total buffers)
Total CYUSB buffers per acquisition: 32 (1024.000000 triggers per buffer)

Beginning Transfer...

0: Got 27262976 bytes (1024.000000 triggers) with buffer 0 (0.000000)!
...
30: Got 27262976 bytes (1024.000000 triggers) with buffer 30720 (30.000000)!
WaitForXfer failed!
Overflows: 198
Total FX3 DMA transfers Produced: 16336
ERROR:  requested 872415232 bytes but got 845152256 bytes!
acquisition failed!

 

After 10 repetitions, not a single run completed the whole transfer; every run had 160-200 overflows. 
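The two logs can be cross-checked with a little arithmetic. The sketch below is only an illustration: the struct and function names are hypothetical, and it assumes that "Total FX3 DMA transfers Produced" counts 53,248-byte (52 KB) DMA buffers, the size given later in the thread. That assumption is consistent with the passing run, where 16384 x 53248 equals the requested 872,415,232 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Values read off one acquisition log (hypothetical helper type).
struct AcquisitionLog {
    std::uint64_t requested_bytes;       // "Total acquisition size"
    std::uint64_t received_bytes;        // bytes the host actually got
    std::uint64_t dma_buffer_bytes;      // assumed 53248, per the follow-up post
    std::uint64_t dma_buffers_produced;  // "Total FX3 DMA transfers Produced"
};

// DMA buffers the FX3 must produce to cover the whole request.
inline std::uint64_t dma_buffers_expected(const AcquisitionLog& log) {
    assert(log.dma_buffer_bytes > 0);
    return log.requested_bytes / log.dma_buffer_bytes;
}

// DMA buffers that were never produced (0 if nothing is missing).
inline std::uint64_t dma_buffers_missing(const AcquisitionLog& log) {
    const std::uint64_t expected = dma_buffers_expected(log);
    return expected > log.dma_buffers_produced
               ? expected - log.dma_buffers_produced
               : 0;
}

// Bytes the host requested but never received.
inline std::uint64_t host_bytes_missing(const AcquisitionLog& log) {
    return log.requested_bytes > log.received_bytes
               ? log.requested_bytes - log.received_bytes
               : 0;
}
```

Applied to the failing run, this gives 48 missing DMA buffers (about 2.4 MB) on the FX3 side and exactly 27,262,976 missing bytes on the host side, i.e. one whole host request. Under the stated assumption, that pattern would fit a final request that could never fill because the FX3 had dropped data earlier.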

Since the data rate is identical in both cases, and all host-side buffers are preallocated,  I am surprised that there is such a large difference in throughput.  I did some more testing, and it seems that buffers above about 16 MB begin to have overflows, and the number of overflows per second increases as the buffer size increases.  What is the functional difference between a 10 or 15 MB buffer and a 20 or 30MB buffer?  I had thought that 1024 byte USB packets are simply being written sequentially into the buffer, but maybe I am missing something about how the driver works?

1 Solution
Rashi_Vatsa
Moderator

Hello,

From the description, I understand that the data transfer is done from GPIF to USB.  Is that correct?

DMA transfers were configured with 2 threads, each having 2 buffers holding 13 1024 byte USB packets.  

>> I understand that 13 KB is the DMA buffer size on the FX3 side. Please let me know the endpoint size and the burst length used in this case.

Also, from the description I understand that overflows are seen when the buffer size on the host application side is increased. Is my understanding correct? If yes, please let me know how these overflows are counted. Also, from the logs I see that WaitForXfer fails. Please let me know the error code for the failure.

We recommend making the DMA buffer size (on FX3) a multiple of the USB endpoint size, as this gives better throughput. For example, when the USB endpoint size is 16 KB (burst length 16, packet size 1024 bytes), the DMA buffer size (on FX3) is usually 16 KB or 32 KB.

Regards,
Rashi


@Rashi_Vatsa wrote:

Hello,

From the description, I understand that the data transfer is done from GPIF to USB.  Is that correct?

DMA transfers were configured with 2 threads, each having 2 buffers holding 13 1024 byte USB packets.  

>> I understand that 13 KB is the DMA buffer size on the FX3 side. Please let me know the endpoint size and the burst length used in this case.


I'm sorry, I was confusing samples and bytes.  Each DMA buffer is 13x1024 samples, or exactly 52KB.  There are 2 threads of 2 buffers, or 208KB.  Thus most of the available SRAM is being used for buffer. 
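The corrected buffer figures can be checked, and turned into a rough estimate of how long the FX3 can absorb data while the host is not reading. This is a sketch under assumptions: the 320,000,000 bytes/s input rate comes from the 80 MHz, 32-bit setup described in the first post, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one DMA buffer holding the given number of samples.
constexpr std::uint64_t dma_buffer_bytes(std::uint64_t samples_per_buffer,
                                         std::uint64_t bytes_per_sample) {
    return samples_per_buffer * bytes_per_sample;
}

// Total on-chip buffering across all threads of the multichannel DMA.
constexpr std::uint64_t total_dma_bytes(std::uint64_t threads,
                                        std::uint64_t buffers_per_thread,
                                        std::uint64_t buffer_bytes) {
    return threads * buffers_per_thread * buffer_bytes;
}

// Microseconds until all DMA buffers are full if nothing drains them.
inline double buffering_time_us(std::uint64_t total_bytes,
                                std::uint64_t input_bytes_per_second) {
    assert(input_bytes_per_second > 0);
    return 1e6 * static_cast<double>(total_bytes) /
           static_cast<double>(input_bytes_per_second);
}
```

With 13 x 1024 samples of 4 bytes each, one buffer is 53,248 bytes and the four buffers total 212,992 bytes (208 KB). At the assumed input rate that is only about 666 microseconds of buffering, so any host-side pause longer than that would be expected to show up as an overrun.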

I tested using 16 as the endpoint burst length and also setting it to 13 (so that the DMA buffer size is a multiple of it).  This did not change the result. 

Regarding the end point size, the maximum packet size (BulkInEndPt->MaxPktSize) is reported as 16384 bytes.  Is that what you mean by the end point size?

 

Also, from the description I understand that overflows are seen when the buffer size on the host application side is increased. Is my understanding correct? If yes, please let me know how these overflows are counted. Also, from the logs I see that WaitForXfer fails. Please let me know the error code for the failure.

 

I count the number of CYU3P_PIB_ERR_THR0_WR_OVERRUN and CYU3P_PIB_ERR_THR1_WR_OVERRUN events.  In addition, since the transfer is of fixed size, I keep track of how many bytes were actually received by the host.  For example, when I configured the CyUSB buffer to 26 MB (26624 packets x 1024 bytes/packet), I had 198 overflow events and as a result lost the entire last 26 MB buffer.  

 

Which logs should I check when WaitForXfer fails?  I had not realized there might be additional debug info.  Possibly I am overlooking something obvious.  

 


We recommend making the DMA buffer size (on FX3) a multiple of the USB endpoint size, as this gives better throughput. For example, when the USB endpoint size is 16 KB (burst length 16, packet size 1024 bytes), the DMA buffer size (on FX3) is usually 16 KB or 32 KB.


In my case, my 52KB buffer is 3.25 times the max packet size (burst of 16) or 4 times the max packet size (FX3 configured for a burst of 13).  If it should be a multiple of the host max packet size (rather than the FX3 burst length), then this will be tricky for me, since this is an image sensor application and the number of pixels does not divide evenly into 16384.  Let me know what you think; perhaps I could rig a way to ROI the readout to get a power-of-2 number of pixels per row for testing purposes.  
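The ratio between the DMA buffer size and the endpoint burst size is easy to check for any candidate configuration. The helpers below are a minimal sketch with hypothetical names; they assume a 1024-byte SuperSpeed bulk packet, as stated earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Bytes moved by one full burst on the endpoint.
constexpr std::uint64_t burst_bytes(std::uint64_t burst_packets,
                                    std::uint64_t packet_bytes = 1024) {
    return burst_packets * packet_bytes;
}

// How many bursts fit into one DMA buffer (may be fractional).
inline double bursts_per_dma_buffer(std::uint64_t dma_buffer_bytes,
                                    std::uint64_t burst_packets) {
    assert(burst_packets > 0);
    return static_cast<double>(dma_buffer_bytes) /
           static_cast<double>(burst_bytes(burst_packets));
}

// True when the DMA buffer is an exact multiple of the burst size.
inline bool is_whole_number_of_bursts(std::uint64_t dma_buffer_bytes,
                                      std::uint64_t burst_packets) {
    assert(burst_packets > 0);
    return dma_buffer_bytes % burst_bytes(burst_packets) == 0;
}
```

For the 53,248-byte buffer, a burst of 16 gives 3.25 bursts per buffer (not a whole number), while a burst of 13 divides it exactly 4 times. The 48 KB buffer with a 16-packet burst that the moderator asks about would be exactly 3 bursts.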

 


Hello,

Please find my comments below:

I count the number of CYU3P_PIB_ERR_THR0_WR_OVERRUN and CYU3P_PIB_ERR_THR1_WR_OVERRUN events.

>> Are these overruns seen when the buffer size on the USB host application side is increased from 13 MB to 26 MB? Does this also mean that the host application asks for 26 MB of data from the device in one request?

I tested using 16 as the endpoint size and also setting it to 13 (so a multiple of the DMA buffer size).  This did not change the result.

>> Can you please try configuring the DMA buffer size as 48 KB and the endpoint size as 16 KB, and let me know the results?

Regarding the end point size, the maximum packet size (BulkInEndPt->MaxPktSize) is reported as 16384 bytes.  Is that what you mean by the end point size?

>> Yes

Which logs should I check when WaitForXfer fails?  I had not realized there might be additional debug info.  Possibly I am overlooking something obvious. 

>> Can you please let me know how the failure of WaitForXfer is handled in the host application? If possible, please share the snippet of the host app where the data transfers are done.

You can refer to the following app note, https://www.cypress.com/documentation/application-notes/an86947-optimizing-usb-30-throughput-ez-usb-... , which discusses how the maximum achievable USB 3.0 throughput also depends on critical factors such as host PC controller type, operating system, and USB design (transfer type and buffer sizes).

To confirm that the driver from the linked thread is bound to the device, please check the driver version (i.e. 1.2.3.25) in the driver details of the device in Device Manager.

Regards,
Rashi

@Rashi_Vatsa wrote:

Hello,

Please find my comments below:

I count the number of CYU3P_PIB_ERR_THR0_WR_OVERRUN and CYU3P_PIB_ERR_THR1_WR_OVERRUN events.

>> Are these overruns seen when the buffer size on the USB host application side is increased from 13 MB to 26 MB? Does this also mean that the host application asks for 26 MB of data from the device in one request?


Yes, they are seen only if the buffer size is increased.  I call USBDevice->BulkInEndPt->BeginDataXfer(buffers[i], bufferSize, &inOvLap[i]); where bufferSize is 13 or 26 MB.  13 MB works perfectly; 26 MB has overflows even though the GPIF is generating the same amount of data per second.  I change nothing else.  I tested intermediate values, and the issue begins at around 15 or 16 MB per transfer.  


>> Can you please try to configure the DMA buffer size as 48 KB and endpoint size as 16 KB and let me know the results

My GPIF state machine works on groups of 6656 samples (simulating an image sensor) which are packed two at a time into a DMA buffer, so there is no easy way to generate 48KB DMA buffers.  Could you elaborate on what you would like to test?  Maybe I can think of some way to do it.


>> Can you please let me know how the failure of WaitForXfer is handled in the host application? If possible, please share the snippet of the host app where the data transfers are done.

Since this is a test program, I wait on the hEvent and simply abort if it ever fails to return the expected data.  Here is the program:  https://pastebin.com/MxtQ413q

The problem can be triggered by changing line 103 from "totalTransfers = numTrigger / 512;" to "totalTransfers = numTrigger / 1024;" as that will double the size of each request from 13 to 26 MB.  
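The relationship between that divisor and the size of each request can be sketched as follows. This is an illustration, not the code from the pastebin: the struct and function names are hypothetical, the 26,624 bytes per trigger is taken from the logs above, and the 16 MB ceiling used in the second helper is only the empirical threshold reported in this thread, not a documented driver limit.

```cpp
#include <cassert>
#include <cstdint>

// Number of host requests and the size of each one (hypothetical type).
struct TransferPlan {
    std::uint64_t total_transfers;
    std::uint64_t bytes_per_transfer;
};

// Mirrors the arithmetic described in the post: numTrigger / divisor
// requests, each carrying `triggers_per_transfer` triggers of data.
inline TransferPlan plan_from_divisor(std::uint64_t num_triggers,
                                      std::uint64_t triggers_per_transfer,
                                      std::uint64_t bytes_per_trigger) {
    assert(triggers_per_transfer > 0);
    assert(num_triggers % triggers_per_transfer == 0);
    return {num_triggers / triggers_per_transfer,
            triggers_per_transfer * bytes_per_trigger};
}

// Largest power-of-two trigger count whose request fits under limit_bytes.
inline std::uint64_t max_triggers_under_limit(std::uint64_t bytes_per_trigger,
                                              std::uint64_t limit_bytes) {
    assert(bytes_per_trigger > 0 && bytes_per_trigger <= limit_bytes);
    std::uint64_t triggers = 1;
    while (triggers * 2 * bytes_per_trigger <= limit_bytes) {
        triggers *= 2;
    }
    return triggers;
}
```

With 32768 triggers, a divisor of 512 yields 64 requests of 13,631,488 bytes, and a divisor of 1024 yields 32 requests of 27,262,976 bytes, matching the two logs. If the observed 16 MB threshold holds, `max_triggers_under_limit(26624, 16 * 1024 * 1024)` returns 512, so one possible workaround would be to keep each request at that size and queue more of them.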

 


To confirm that the driver from the linked thread is bound to the device, please check the driver version (i.e. 1.2.3.25) in the driver details of the device in Device Manager.


driver.png

Let me know if there is some other version I should use.

 


Hello,

Please find my comments below:

13 works perfectly, 26 has overflows even though the GPIF is generating the same amount of data per second.  I change nothing else.

>> If the DMA buffer size on the FX3 is not changed and only the buffer size on the host application side is increased, please let me know whether commit buffer failures (CyU3PDmaMultiChannelCommitBuffer returning error code 0x47) are seen on the FX3 side along with the overflows. Also, let me know how these errors are handled in your firmware.

If yes, this error is seen when the host is slow to read the data from the FX3. In that case, please share the USB traces (which can be captured using Wireshark) for both cases. This will help us understand whether any failures occur when the buffer size is increased.

Regards,
Rashi