WaitForXfer returning fewer bytes than requested


mgiacomelli

I've written a more or less working GPIF state machine that uses two threads in a ping-pong configuration to continuously acquire data after receiving a trigger.  I've implemented cases where the data is smaller than, equal to, and larger than the buffer size, and from testing with a logic analyzer I think it is working as I expect.

However, I get inconsistent results if I set the DMA buffer size slightly different from the data size.  For example, I wrote a simple application that configures 20 triggers of 4096 samples (16384 bytes) into two CYUSB buffers of 10 triggers each.  This application works great if the DMA buffer size is 16384 bytes, but if I set it 16 bytes smaller to 16368 bytes, I get:

 

/* Size of DMA buffers used by the application. */
#ifndef CY_FX_DMA_BUF_SIZE
#define CY_FX_DMA_BUF_SIZE              (16368) //not power of 2
#endif

/* Number of DMA buffers to be used on the channel. */
#ifndef CY_FX_DMA_BUF_COUNT
#define CY_FX_DMA_BUF_COUNT             (2)
#endif

 

 

 

CyU3PDmaMultiChannelConfig_t dmaMultiConfig;

CyU3PMemSet ((uint8_t *)&dmaMultiConfig, 0, sizeof (dmaMultiConfig));
dmaMultiConfig.size           = CY_FX_DMA_BUF_SIZE;
dmaMultiConfig.count          = 4; /* hard-coded to 4 instead of CY_FX_DMA_BUF_COUNT */
dmaMultiConfig.validSckCount  = 2;
dmaMultiConfig.prodSckId [0]  = (CyU3PDmaSocketId_t)CY_U3P_PIB_SOCKET_0;
dmaMultiConfig.prodSckId [1]  = (CyU3PDmaSocketId_t)CY_U3P_PIB_SOCKET_1;
dmaMultiConfig.consSckId [0]  = (CyU3PDmaSocketId_t)CY_U3P_UIB_SOCKET_CONS_1;
dmaMultiConfig.prodAvailCount = 0;
dmaMultiConfig.prodHeader     = 0;
dmaMultiConfig.prodFooter     = 0;
dmaMultiConfig.consHeader     = 0;
dmaMultiConfig.dmaMode        = CY_U3P_DMA_MODE_BYTE;
/* Only producer events are enabled; CY_U3P_DMA_CB_CONS_EVENT is left out. */
dmaMultiConfig.notification   = CY_U3P_DMA_CB_PROD_EVENT;
dmaMultiConfig.cb             = GpifToUsbDmaCallbackMulti;

/* Two PIB producer sockets feed one USB consumer socket.
 * Alternative channel type, currently disabled: CY_U3P_DMA_TYPE_AUTO_MANY_TO_ONE. */
apiRetStatus = CyU3PDmaMultiChannelCreate (&glDmaChHandleMulti, CY_U3P_DMA_TYPE_MANUAL_MANY_TO_ONE,
        &dmaMultiConfig);
if (apiRetStatus != CY_U3P_SUCCESS)
{
    /* Error handling */
    CyU3PDebugPrint (4, "DMA Channel Creation Failed, Error Code = %d\n", apiRetStatus);
    CyFxAppErrorHandler (apiRetStatus);
}

/* Set DMA Channel transfer size */
apiRetStatus = CyU3PDmaMultiChannelSetXfer (&glDmaChHandleMulti, CY_FX_GPIFTOUSB_DMA_TX_SIZE, 0);
if (apiRetStatus != CY_U3P_SUCCESS)
{
    CyU3PDebugPrint (4, "CyU3PDmaMultiChannelSetXfer failed, Error code = %d\n", apiRetStatus);
    CyFxAppErrorHandler (apiRetStatus);
}

 

 

 

Cypress FX3 USB StreamerExample Device - 1204 - 241
found our device at index 2
interfaces: 1
endpoints: 4
endpoint type: CONT  IN   512
endpoint type: Bulk  IN   16384
endpoint type: Bulk  OUT  4096
endpoint type: Bulk  IN   4096

Configuring FX3 acquisition with parameters:
Samples per trigger:  4096  (16384 bytes)
Number of triggers: 20
Total acquisition size: 327680 bytes
CYUSB bufferSize: 163840 (0.488281 percent of max)
Total CYUSB buffers per acquisition: 2 (10.000000 triggers per buffer)

Beginning Transfer...

0: Got 16368 bytes (0.999023 triggers)!
1: Got 16368 bytes (0.999023 triggers)!
ERROR:  requested 327680 bytes but got 32736 bytes!

 

 

So although I requested 163840 bytes through BeginDataXfer/WaitForXfer/FinishDataXfer, each call actually transferred exactly 1 DMA buffer.  At the same time, the FX3 log gives:

 

 

Restarting the GPIF !
No Error :26
No Error :26
CYU3P_PIB_ERR_THR0_WR_OVERRUN: 1
CYU3P_PIB_ERR_THR0_WR_OVERRUN: 2
CYU3P_PIB_ERR_THR0_WR_OVERRUN: 3
CYU3P_PIB_ERR_THR1_WR_OVERRUN: 4
CYU3P_PIB_ERR_THR0_WR_OVERRUN: 5
CYU3P_PIB_ERR_THR1_WR_OVERRUN: 6
CYU3P_PIB_ERR_THR1_WR_OVERRUN: 7
CYU3P_PIB_ERR_THR0_WR_OVERRUN: 8
CYU3P_PIB_ERR_THR1_WR_OVERRUN: 9

 

 

So the first two DMA buffers are transferred, but the next 9 are ignored.  Watching on the logic analyzer, I see 10 triggers received and the GPIF successfully entering the completion state.

 

First, I had thought that WaitForXfer would only be signaled when it has received the requested amount of data.  Here however it returns exactly 1 DMA buffer, or about 10% of the requested amount.  What determines how much data is available when WaitForXfer is signaled?

Second, why does it behave differently with different DMA buffer sizes?  For example, if I increase the buffer size by 16 bytes:

 

Samples per trigger:  4096  (16384 bytes)
Number of triggers: 20
Total acquisition size: 327680 bytes
CYUSB bufferSize: 163840 (0.488281 percent of max)
Total CYUSB buffers per acquisition: 2 (10.000000 triggers per buffer)

Beginning Transfer...

0: Got 163840 bytes (10.000000 triggers)!
1: Got 163840 bytes (10.000000 triggers)!
acquisition succeeded!

 

 

It now returns the full requested amount of data.  Is there something in the GPIF state machine that tells the CYAPI to transfer less data at a time?

Rashi_Vatsa

Hello,

To understand the application better, please let me know the following

- How is flow control implemented on the GPIF interface?

For example, I wrote a simple application that configures 20 triggers of 4096 samples (16384 bytes) into two CYUSB buffers of 10 triggers each

>> I understand that this is done by the Host application. Is my understanding correct?

When the DMA buffer size is 16368 bytes, we see a thread overrun error, which is triggered when the FPGA/master writes more data to the DMA buffer than expected, i.e. more than 16368 bytes. The FPGA/master should be configured to write only 16368 bytes of data if the DMA buffer size is changed.

If possible, please share the GPIF state machine for us to check the implementation.

So the first two DMA buffers are transferred, but the next 9 are ignored.  Watching on the logic analyzer, I see 10 triggers received and the GPIF successfully entering the completion state.

>> I understand that the thread overrun errors are seen in this case. Is that correct? If yes, we need to confirm that we are receiving the expected data from the GPIF interface when the DMA buffer size is reduced by 16 bytes.

Also, because the DMA buffer size is not a multiple of the USB endpoint size, i.e. 1024 bytes, a short packet will be committed, and the bulk IN USB transfer ends when the short packet is received. It is therefore recommended to use a DMA buffer size that is a multiple of the USB endpoint size.

Regards,
Rashi

Also, because the DMA buffer size is not a multiple of the USB endpoint size, i.e. 1024 bytes, a short packet will be committed, and the bulk IN USB transfer ends when the short packet is received. It is therefore recommended to use a DMA buffer size that is a multiple of the USB endpoint size.


OK, I did not realize that.  I thought it would buffer until it had a full packet. That neatly explains question number 2: a buffer size that is a multiple of 1024 doesn't generate the short packet.  This means I picked bad examples, so let me rephrase the problem using buffer sizes that are multiples of 1024.

I am observing inconsistent behavior when passing data to the PC host.  As a better example with a buffer size that is a multiple of 1024, if I configure the DMA buffer to 16384 (1024*16) bytes and then try to do a 4000 sample (16000 byte) per trigger acquisition with 10 triggers:

endpoints: 4
endpoint type: CONT  IN   512
endpoint type: Bulk  IN   16384
endpoint type: Bulk  OUT  4096
endpoint type: Bulk  IN   4096
Max packet size:  16384
WARNING: requested only 5.000000 triggers per buffer, implying 2400.000000 buffers per second at 12 KHz


Configuring FX3 acquisition with parameters:
Samples per trigger:  4000  (16000 bytes)
Number of triggers: 10
Total acquisition size: 160000 bytes
CYUSB bufferSize: 80000 (0.238419 percent of max)
Total CYUSB buffers per acquisition: 2 (5.000000 triggers per buffer)

Beginning Transfer...

FinishDataXfer failed on buffer 0!
Got 0 bytes!
NTSTATUS = c0000001
status string:  [state=STALLED status=UNKNOWN]
ERROR:  requested 160000 bytes but got 0 bytes!

 

The UART gives:

 

Changing number of triggers per acquisition to A (10)!
Changing number of samples per trigger to FA0 (4000)!
Restarting the GPIF !
No Error :26
No Error :26

 Buffer_Tracker: prod : 0, cons 0, commit: 0
CYU3P_PIB_ERR_THR0_WR_OVERRUN: 1
CYU3P_PIB_ERR_THR0_WR_OVERRUN: 2

 Buffer_Tracker: prod : 8, cons 0, commit: 0

 Buffer_Tracker: prod : 8, cons 0, commit: 0

 Buffer_Tracker: prod : 8, cons 0, commit: 0

 Buffer_Tracker: prod : 8, cons 0, commit: 0

Stopping acquisition!

 

So the BeginDataXfer/WaitForXfer/FinishDataXfer sequence errors out, returning 0 bytes, but the UsbDmaCallback reports that buffers are being produced but not consumed, I guess because the transfer errored out.  Furthermore, the number of buffers produced is only 8 (it should be 10).  I do see 10 triggers generated on the logic analyzer.  What I do not understand is why the transfer errors out if the data is being produced.



@Rashi_Vatsa wrote:

 

To understand the application better, please let me know the following

- How is the flow control implemented on the GPIF interface


The GPIF configuration is complex, but basically, it is reading an ADC.  When it gets a trigger it acquires the requested number of samples, switching threads as needed to capture more than the DMA buffer size.  The control counter is used to track the current position in the buffer, the data counter the number of samples, and the address counter the number of triggers.

Here is the file if you want to review:  https://imstore.circ.rochester.edu/continuous_read_arbitrary_buffer_size.cydsn.zip

Here is a diagram showing the flow for different buffer sizes:

https://imstore.circ.rochester.edu/diagramv2.PNG

The blue case works perfectly, but the others occasionally fail for certain combinations.


@Rashi_Vatsa wrote:

 

>> I understand that this is done by the Host application. Is my understanding correct?

Yes, the host application sends the values for the data counter (number of samples) and the address counter (number of triggers).  


When the DMA buffer size is 16368 bytes, we see a thread overrun error, which is triggered when the FPGA/master writes more data to the DMA buffer than expected, i.e. more than 16368 bytes. The FPGA/master should be configured to write only 16368 bytes of data if the DMA buffer size is changed.

Each time one DMA buffer's worth of bytes has been written, I ping-pong between Th0 and Th1.  That should be sufficient to change DMA buffers, correct?


Thanks so much for your help.  


Hello,

I am observing inconsistent behavior with passing data to the PC host.  For a better example using 1024 multiple buffers, if I configure the DMA buffer to 16384 (1024*16) bytes and then try to do a 4000 sample (16000 byte)

>> Please refer to this thread: Solved: FX3 Frame size must be divisible by 1024, How to a... - Cypress Developer Community. The host is expected to send USB requests in multiples of the maximum packet size.

Every  DMA buffer size bytes I ping pong between Th0 and Th1.  That should be sufficient to change DMA buffers, correct?

>> From this I understand that the master writes a number of bytes based on the DMA buffer size and then switches the thread. Is that correct?

Regards,
Rashi

That would explain what I am seeing.  I tested padding the request up to the next multiple of 1024, and that does seem to work.  Is that an acceptable way to request data that does not fit neatly into 1024 byte increments?


Hello,

 Is that an acceptable way to request data that does not fit neatly into 1024 byte increments?  

>> Yes. The host is expected to send USB requests in multiples of the maximum packet size that the USB device reports through its descriptors. If the data on the FX3 is not a multiple of the maximum packet size and the host application requests data in multiples of the maximum packet size, the FX3 commits a short packet. In this way, all the data on the FX3 is transferred to the USB host.

Regards,
Rashi