[FX3] GPIF extra padding on Master Read


dmickels

I am working with two FX3s in a back-to-back configuration, and I am wondering whether I am missing something.

When I push a buffer from Master->Slave, the correct number of bytes is received with no additional padding. However, when I send from Slave->Master, GPIF seems to append zeros to fill out the rest of my buffer, so I have to use something like the following to remove them, which significantly impacts performance.

 

 

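/* input->buffer_p.count always returns 4096 due to GPIF padding, so find the
 * actual size by scanning back from the end for the last non-zero byte. The
 * scan stops at the first non-zero byte, so full buffers cost very little. */
i = input->buffer_p.count;
/* The i > 0 check keeps an all-zero buffer from indexing before the start. */
while ((i > 0) && (input->buffer_p.buffer[i - 1] == 0x00)) {
    i--;
}
CyU3PDmaChannelCommitBuffer (chHandle, i, 0);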

 

 

Again, this method works, but I would rather use Auto channels for their speed and avoid modifying the data with the CPU entirely.

I am aware of the 4 byte alignment padding, where sending 6 bytes will add 2 bytes, giving a total of 8 received bytes. This behavior is fine. The issue is when I want to send a short packet, say 84 bytes. When the Master side receives this, I receive a buffer with its count at 1024, and reading out the data shows my 84 bytes, followed by 940 zeros in the packet.
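The arithmetic in the paragraph above can be written out as a small sketch. The function names are hypothetical and are not part of the FX3 SDK; the only assumption is that the expected padding rounds the length up to the next multiple of 4 bytes, as described in the post.

```c
#include <stdint.h>

/* Round a byte count up to the next multiple of 4
 * (the alignment padding that is considered acceptable here). */
uint32_t PaddedTo4(uint32_t len)
{
    return (len + 3u) & ~3u;
}

/* Bytes in a received buffer beyond what 4-byte alignment alone explains.
 * Returns 0 if the buffer is not larger than the aligned payload. */
uint32_t ExcessBytes(uint32_t receivedCount, uint32_t sentLen)
{
    uint32_t expected = PaddedTo4(sentLen);
    return (receivedCount > expected) ? (receivedCount - expected) : 0u;
}
```

With the numbers from the post, PaddedTo4(6) gives 8, and ExcessBytes(1024, 84) gives the 940 zero bytes that should not be there.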

I have tried a variety of settings, but then found that this behavior appears in the provided back to back example, AN87216. Can this be avoided?

As an example:
A transfer from Master->Slave works fine, and the correct number of bytes arrive on the other side.

dmickels_0-1637020616199.png

But from Slave->Master, you can see all of the padding being added.

dmickels_1-1637020698990.png

 

Rashi_Vatsa
Moderator

Hello,

I understand that you are using the default firmware shared with AN87216. Please confirm.

If yes, the default firmware allows short packets (e.g. 84 bytes) to be sent to the master. Please refer to section 6.3 of the app note, which explains that a short packet is identified when there is no more data in the DMA buffer of the slave FX3 (FLAG C asserted) and the ADDR_CNT_HIT event has not been generated.

Can you please check whether CyFxApplnGPIFEventCB is called on the master side? You can check by incrementing a variable in the callback and printing its value later in the for(;;) loop.

If that doesn't work, please check the interface signals between the FX3 master and the FX3 slave. Also, let me know whether you are using a custom board or the Cypress FX3 kit, i.e. CYUSB3KIT-003.
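The counter check suggested above might look roughly like the following. This is a host-compilable sketch, not the AN87216 code: GpifEventTypeStandIn is a stand-in for the SDK's CyU3PGpifEventType, and CountingGpifEventCB and GpifEventsSinceLastPoll are hypothetical names. In firmware, the increment would go inside the existing CyFxApplnGPIFEventCB and the poll would be printed from the thread loop.

```c
#include <stdint.h>

/* Stand-in for the SDK's CyU3PGpifEventType so the sketch builds on a host.
 * In firmware, use the real type from the FX3 SDK headers. */
typedef int GpifEventTypeStandIn;

/* Written in callback context, read by the application thread. */
static volatile uint32_t glGpifEventCount = 0;
static volatile uint8_t  glGpifLastState  = 0;

/* Keep the callback short: record what happened and return. */
void CountingGpifEventCB(GpifEventTypeStandIn event, uint8_t currentState)
{
    (void)event;
    glGpifLastState = currentState;
    glGpifEventCount++;
}

/* Call from the thread's for(;;) loop. Returns the number of callbacks
 * that arrived since the previous call, which is safe to print there. */
uint32_t GpifEventsSinceLastPoll(void)
{
    static uint32_t lastSeen = 0;
    uint32_t now = glGpifEventCount;
    uint32_t delta = now - lastSeen;

    lastSeen = now;
    return delta;
}
```

A result that stays at 0 while short packets are being sent would suggest that the state machine never reaches the state that raises the event.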

Regards,
Rashi

Hi,

I am using a modified version of AN87216 with four threads. I have attached the state diagrams below. Additionally, I am using two CYUSB3KIT-003 boards, connected as shown in figure 27 of the AN87216 example project.

As you can see in my state diagram, Flag C is used in handling thread 3. I have tried to create an additional flag, Flag E, but this provides the same effect as before and doesn't remove the zero padding.

I can confirm that CyFxApplnGPIFEventCB is being called, but only when I leave the transition to RD_SHORT_PKT as !ADDR_CNT_HIT. However, this then seems to break threads 2 and 3: changing this transition stops any data from being transferred over those two threads.

Using a transition of !ADDR_CNT_HIT&!FLAG_A almost achieves what I want, except that buffers appear to overflow into one another; that makes sense, however, given what FLAG A is.

Is there a way that I can add or modify a Flag E so that it works the way Flag C does in AN87216? That appears to be a solution if I can get it to behave the same way. I am not quite sure where Flag C is actually coming from in that example, since flags are set up to indicate socket availability, but there are only two configured sockets in AN87216.

Thank you for the help.

Rashi_Vatsa

Hello,

Please explain the application in detail so that I can suggest a proper solution.

From the Master state machine, I see that Flag C is not used either to drive the data out from the master or to read the data from the slave. Please let me know why four threads are used.

Please let me know whether you have modified the FX3 slave firmware to add the DMA channels associated with the four threads.

 

Using a transition of !ADDR_CNT_HIT&!FLAG_A almost achieves what I want, except that buffers appear to overflow into one another; that makes sense, however, given what FLAG A is.

>> I didn't understand this. Could you please explain it in more detail? Please use !ADDR_CNT_HIT&!FLAG_A as the transition equation.

I am not quite sure where Flag C is actually coming from in that example, since flags are set up to indicate socket availability, but there are only two configured sockets in AN87216.

>> FLAG C in the default state machine is the watermark flag, not the DMA ready flag for thread 1. For more details on DMA flags, please refer to AN65974.

Regards,
Rashi

For reference, the four threads are needed because I have two threads set up to forward the bulk in/out lines from one device to the other, as in the example project, plus two additional threads that forward data to an interrupt endpoint on each side in response to a vendor request. Essentially, this lets me send an interrupt to the opposite device's host when I receive a Vendor/Class request, and vice versa.

DMA Channels configured as follows:

 

#define CY_FX_DMA_BUF_COUNT      (3)                       /* Master channel buffer count */
#define CY_FX_DMA_TX_SIZE        (0)                        /* DMA transfer size is set to infinite */
#define CY_FX_THREAD_STACK       (0x0400)                   /* Master application thread stack size */
#define CY_FX_THREAD_PRIORITY    (8)                        /* Master application thread priority */

#define CY_FX_INTR_BUF_COUNT      (16)                       /* Interrupt channel buffer count */
#define CY_FX_INTR_TX_SIZE        (64)                       /* DMA transfer size is set to 64 */

#define CY_FX_EP_PRODUCER               0x02    /* EP 2 OUT */

#define CY_FX_EP_CONSUMER               0x81    /* EP 1 IN */
#define CY_FX_EP_CONSUMER_INTR          0x83    /* EP 3 IN */

#define CY_FX_PRODUCER_USB_SOCKET        CY_U3P_UIB_SOCKET_PROD_2    /* USB socket 2 is producer */
#define CY_FX_CONSUMER_USB_SOCKET        CY_U3P_UIB_SOCKET_CONS_1    /* USB socket 1 is consumer */
#define CY_FX_INTERRUPT_USB_SOCKET       CY_U3P_UIB_SOCKET_CONS_3    /* USB socket 3 is consumer */

/* Used with FX3 Silicon. */
#define CY_FX_PRODUCER_PPORT_SOCKET    CY_U3P_PIB_SOCKET_0    /* P-port Socket 0 is producer */
#define CY_FX_CONSUMER_PPORT_SOCKET    CY_U3P_PIB_SOCKET_1    /* P-port Socket 1 is consumer */
#define CY_FX_INTERRUPT_PRODUCER_PPORT_SOCKET    CY_U3P_PIB_SOCKET_2    /* P-port Socket 2 is producer */
#define CY_FX_INTERRUPT_CONSUMER_PPORT_SOCKET    CY_U3P_PIB_SOCKET_3    /* P-port Socket 3 is consumer */

/* Burst length in 1 KB packets. Only applicable to USB 3.0. */
#define CY_FX_EP_BURST_LENGTH          (16)
/* Multiplication factor used when allocating DMA buffers to reduce DMA callback frequency. */
#define CY_FX_DMA_SIZE_MULTIPLIER      (2)

 

 

 

/* Create a DMA AUTO channel from the USB producer socket to the P-port consumer socket.
     * DMA size is set based on the USB speed. */
    dmaCfg.prodSckId = CY_FX_PRODUCER_USB_SOCKET;
    dmaCfg.consSckId = CY_FX_CONSUMER_PPORT_SOCKET;
    dmaCfg.dmaMode = CY_U3P_DMA_MODE_BYTE;
    dmaCfg.notification = 0;
    dmaCfg.cb = NULL;
    dmaCfg.prodHeader = 0;
    dmaCfg.prodFooter = 0;
    dmaCfg.consHeader = 0;
    dmaCfg.prodAvailCount = 0;

    apiRetStatus = CyU3PDmaChannelCreate (&glChHandleBulkLpUtoP,
    		CY_U3P_DMA_TYPE_AUTO, &dmaCfg);
    if (apiRetStatus != CY_U3P_SUCCESS)
    {
        DBGPRINT ("glChHandleBulkLpUtoP create failed, Error code = %d\n", apiRetStatus);
        CyFxAppErrorHandler(apiRetStatus);
    }

    /* Create a DMA AUTO channel from the P-port producer socket to the USB consumer socket.
     * DMA size is set based on the USB speed. */
    dmaCfg.prodSckId = CY_FX_PRODUCER_PPORT_SOCKET;
    dmaCfg.consSckId = CY_FX_CONSUMER_USB_SOCKET;
    dmaCfg.notification = 0;
    dmaCfg.cb = NULL;
    apiRetStatus = CyU3PDmaChannelCreate (&glChHandleBulkLpPtoU,
    		CY_U3P_DMA_TYPE_AUTO, &dmaCfg);
    if (apiRetStatus != CY_U3P_SUCCESS)
    {
         DBGPRINT ("glChHandleBulkLpPtoU create failed, Error code = %d\n", apiRetStatus);
         CyFxAppErrorHandler(apiRetStatus);
    }

    /* Create a DMA Manual Out Channel between CPU and Interrupt Socket */
    dmaCfg.size  = CY_FX_INTR_TX_SIZE;
    dmaCfg.count = CY_FX_INTR_BUF_COUNT;
    dmaCfg.prodSckId = CY_U3P_CPU_SOCKET_PROD;
    dmaCfg.consSckId = CY_FX_INTERRUPT_USB_SOCKET;
    dmaCfg.dmaMode = CY_U3P_DMA_MODE_BYTE;
    /* No callback is required. */
    dmaCfg.notification = 0;
    dmaCfg.cb = NULL;

    apiRetStatus = CyU3PDmaChannelCreate (&glChHandlePushToInt,
    		CY_U3P_DMA_TYPE_MANUAL_OUT, &dmaCfg);
    if (apiRetStatus != CY_U3P_SUCCESS)
    {
        DBGPRINT ("glChHandlePushToInt create failed, Error code = %d\n", apiRetStatus);
        CyFxAppErrorHandler(apiRetStatus);
    }

    /* Create a DMA MANUAL_IN channel from the Interrupt PPORT to the CPU */

    dmaCfg.prodSckId = CY_FX_INTERRUPT_PRODUCER_PPORT_SOCKET;
    dmaCfg.consSckId = CY_U3P_CPU_SOCKET_CONS;
    dmaCfg.dmaMode = CY_U3P_DMA_MODE_BYTE;
    /* Different Callback setup */
    dmaCfg.notification = CY_U3P_DMA_CB_PROD_EVENT;
    dmaCfg.cb = InterruptDmaPtoUCallback;

    apiRetStatus = CyU3PDmaChannelCreate (&glChHandleInterruptLpPtoU,
    		CY_U3P_DMA_TYPE_MANUAL_IN, &dmaCfg);
    if (apiRetStatus != CY_U3P_SUCCESS)
    {
        DBGPRINT ("glChHandleInterruptLpPtoU create failed, Error code = %d\n", apiRetStatus);
        CyFxAppErrorHandler(apiRetStatus);
    }

    /* Create a DMA MANUAL_OUT channel from the CPU to the Interrupt PPORT */
    dmaCfg.prodSckId = CY_U3P_CPU_SOCKET_PROD;
    dmaCfg.consSckId = CY_FX_INTERRUPT_CONSUMER_PPORT_SOCKET;
    /* No callback is required. */
    dmaCfg.notification = 0;
    dmaCfg.cb = NULL;
    apiRetStatus = CyU3PDmaChannelCreate (&glChHandleInterruptLpUtoP,
            CY_U3P_DMA_TYPE_MANUAL_OUT, &dmaCfg);
    if (apiRetStatus != CY_U3P_SUCCESS)
    {
        DBGPRINT ("glChHandleInterruptLpUtoP create failed, Error code = %d\n", apiRetStatus);
        CyFxAppErrorHandler(apiRetStatus);
    }

 

 

Rereading AN65974, I was able to create a Flag E with the following settings:

dmickels_0-1637187610887.png

This did reduce the amount of padding, but now I am getting some weird results that still seem to involve the padding. I can confirm that it is just excess padding by using the same code as before to strip the trailing zeros.

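/* input->buffer_p.count always returns 4096 due to GPIF padding, so find the
 * actual size by scanning back from the end for the last non-zero byte. The
 * scan stops at the first non-zero byte, so full buffers cost very little. */
i = input->buffer_p.count;
/* The i > 0 check keeps an all-zero buffer from indexing before the start. */
while ((i > 0) && (input->buffer_p.buffer[i - 1] == 0x00)) {
    i--;
}
CyU3PDmaChannelCommitBuffer (chHandle, i, 0);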

 

Using Wireshark to look at the packets, I can see that the zero padding still occurs in the trailing packet of my initial message. (These packets are all the same: 128000 bytes with the value 0x80, followed by 0x0A to signal the end of the data.)

dmickels_1-1637187707926.png

The data should have stopped at the 0x0A value. From the looks of things, the zeros then overflow into the next packet's buffer, because sending the same packet again produces a run of excess zeros before my data appears:

dmickels_2-1637187795213.png

This causes the end of that packet's contents to be shifted into the next packet's buffer, so the end of my second packet takes up the beginning of my third packet, and the shifting continues through every subsequent packet sent.

dmickels_3-1637187885766.png

I should also mention that this appears to happen on 1024-byte boundaries: the first packet is padded out to 1024 bytes, and the runs of zero padding that shift my actual data around all seem to be 1024 bytes long as well.

Additionally, when I said that !ADDR_CNT_HIT&!FLAG_A gave a result that almost worked, this is the same result that I got.

I have tried adjusting my DATA_COUNT and ADDR_COUNT values to match the formulas described in AN65974, but it doesn't seem to have any effect; the result is always the above.

I have reattached the updated state machines with the correct settings for Flag E.

Thank you for working through this with me.

 

Rashi_Vatsa

Hello,

Thank you for the details. 

I understand that only two channels are currently used for the test, i.e. the ones related to the BULK endpoints, and that the channels related to the interrupt endpoint are not used during this test. Is my understanding correct?

If yes, I understand that this is the test process you are following:

- Connecting two FX3 back to back

- Sending 84 bytes from Master > Slave  works as expected

- Sending 84 bytes from Slave > Master doesn't work as expected (i.e. zeroes are padded)

From this, it seems that the read from the Master doesn't work as expected. I understand that you have configured the watermark value as 0 using the CyU3PGpifSocketConfigure API.

This did reduce the amount of padding, but now I am getting some weird results that still seem to involve the padding.

>> To narrow down the problem, please do the following and let me know the results (Control Center snippets):

- Send small amount of data (for example 16 bytes - incrementing numbers) from Master to Slave

- Read the data from IN endpoint of Slave

- Send a small amount of data (for example 16 bytes of incrementing numbers) from Slave to Master. Check input->buffer_p.count in the DMA callback of the P to U channel on the Slave side. Please do not call CyU3PDebugPrint inside the DMA callback.

- Check whether CyFxApplnGPIFEventCB is called on the master. When CyU3PDmaChannelSetWrapUp is called, a producer event for the glChHandleBulkLpPtoU channel will be triggered. Copy input->buffer_p.count into a variable and print that variable outside the DMA callback.

From this test we can understand at which point the zeros are added. Also, please remove the zero-stripping code snippet for this test.
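One way to follow the "copy the count in the callback, print it outside" advice is a small log that the DMA callback fills and the thread loop drains. The sketch below builds on a host and uses no SDK calls; LogBufferCount, PopBufferCount and COUNT_LOG_DEPTH are hypothetical names, and the uint16_t width is chosen to match the count field of the SDK's DMA buffer structure.

```c
#include <stdint.h>

#define COUNT_LOG_DEPTH 8u   /* hypothetical depth; older entries are overwritten */

static volatile uint16_t glCountLog[COUNT_LOG_DEPTH];
static volatile uint32_t glCountLogHead = 0;   /* advanced only by the callback */
static uint32_t          glCountLogTail = 0;   /* advanced only by the thread */

/* Call from the DMA callback on a producer event,
 * passing input->buffer_p.count. No printing happens here. */
void LogBufferCount(uint16_t count)
{
    glCountLog[glCountLogHead % COUNT_LOG_DEPTH] = count;
    glCountLogHead++;
}

/* Call from the thread loop. Returns 1 and stores the oldest unread
 * count in *count, or returns 0 if nothing new was logged. */
int PopBufferCount(uint16_t *count)
{
    uint32_t head = glCountLogHead;   /* snapshot the writer index once */

    if (glCountLogTail == head) {
        return 0;
    }
    if (head - glCountLogTail > COUNT_LOG_DEPTH) {
        /* The reader was lapped: skip to the oldest entry still stored. */
        glCountLogTail = head - COUNT_LOG_DEPTH;
    }
    *count = glCountLog[glCountLogTail % COUNT_LOG_DEPTH];
    glCountLogTail++;
    return 1;
}
```

The thread loop can then pass each popped value to the debug print, which keeps the callback itself free of slow calls.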

Also, I have a question regarding the transition equations from state SELECT_1_OR_2: why does one transition use logic 1 while the other checks for a flag?

Regards,
Rashi

It looks like my suspicion about 1024-byte alignment was correct. I increased the data size, and for every size below 1024 bytes I receive exactly the amount of data that I sent. As an example, I sent 996 bytes:

dmickels_0-1637277659000.png

And read 996 bytes on the other side:

dmickels_1-1637277668192.png

However, something interesting happens exactly at 1024 bytes. When transmitting 1024 bytes, it appears to transfer successfully, but then the read fails on the other end as if no data was made available.

dmickels_2-1637277687554.png

Then, anything larger than 1024 bytes becomes fully padded with zeros. As an example, the picture below shows a transfer of 1040 bytes:

dmickels_3-1637277700720.png

But when read on the other end, it's fully padded out with zeros.

dmickels_4-1637277707685.png

When reading input->buffer_p.count, I get the exact size of the data sent when it is less than 1024 bytes, and 32768 when it is greater than 1024 bytes. At exactly 1024 bytes, however, I do not receive the packet at all.

For additional information, I have my sockets configured as follows.

Master side:

 

CyU3PGpifSocketConfigure(0, CY_U3P_PIB_SOCKET_0, 4, CyFalse, 7);

CyU3PGpifSocketConfigure(1, CY_U3P_PIB_SOCKET_1, 4, CyFalse, 7);

CyU3PGpifSocketConfigure(2, CY_U3P_PIB_SOCKET_2, 4, CyFalse, 7);

CyU3PGpifSocketConfigure(3, CY_U3P_PIB_SOCKET_3, 4, CyFalse, 7);

 

Slave side:

 

CyU3PGpifSocketConfigure(0, CY_U3P_PIB_SOCKET_0, 4, CyFalse, 7);

CyU3PGpifSocketConfigure(1, CY_U3P_PIB_SOCKET_1, 0, CyFalse, 1);

CyU3PGpifSocketConfigure(2, CY_U3P_PIB_SOCKET_2, 4, CyFalse, 7);

CyU3PGpifSocketConfigure(3, CY_U3P_PIB_SOCKET_3, 0, CyFalse, 1);

 

I have tried a variety of combinations in addition to this, including adjusting the watermark from 0 to 4 on all sockets and adjusting the burst values from 0 to 7. Additionally, the Master side is running with pibClock.clkDiv = 4, and the Slave side with it set to 2, as suggested in the example application.

 

EDIT:
By setting both LD_DATA_COUNT and LD_ADDR_COUNT to 16383, I was able to remove the zero padding for all sizes both below and above 1024 bytes, but I still don't see packets that are exactly 1024 bytes long coming through. The only thing that I can think of that is 1024 bytes is the bulk endpoints themselves, but I am not sure whether this would have any effect on GPIF.
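As a rough cross-check of the 16383 value, the relationship between a counter limit and the buffer size can be sketched as below. This rests on assumptions the thread does not state: that the counter counts bus words from 0 up to the limit, and that the GPIF bus is 16 bits wide. Under those assumptions a limit of 16383 corresponds to 32768 bytes, which matches the buffer count reported earlier. The function names are hypothetical.

```c
#include <stdint.h>

/* Counter limit that fires after one full DMA buffer, assuming the
 * counter counts bus words starting from 0 (an assumption, see above). */
uint32_t CounterLimitForBuffer(uint32_t bufferBytes, uint32_t busWidthBits)
{
    uint32_t bytesPerWord = busWidthBits / 8u;
    return (bufferBytes / bytesPerWord) - 1u;
}

/* Inverse: bytes moved before a counter with the given limit fires. */
uint32_t BytesForCounterLimit(uint32_t limit, uint32_t busWidthBits)
{
    return (limit + 1u) * (busWidthBits / 8u);
}
```

If the bus were 32 bits wide instead, the same limit would correspond to 65536 bytes, so the bus width needs to be confirmed against the actual GPIF configuration before relying on these numbers.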

 

Best,

Devon

Rashi_Vatsa

Hello,

Glad to hear that the zero padding problem is resolved when the data counters on the master and slave sides are the same.

For the 1024 bytes issue, please refer to this KBA https://community.infineon.com/t5/Knowledge-Base-Articles/Data-sent-from-Host-over-USB-is-not-Commit...

Please let me know if you have any questions about this.

Regards,
Rashi

Hi,

Thanks for linking that article. It seems to describe exactly the issue, which occurs for all sizes X where (X % 1024) == 0. Is there not a way to force a ZLP from the GPIF state machine, so that I do not have to modify the drivers I am using on either side?


I would think there is a solution where the DMA callback or the GPIF state machine is able to recognize that data was sent, at which point it can handle sending a ZLP.

Rashi_Vatsa

Hello,

Please see my comments below:

Is there not a way to force a ZLP from the GPIF state machine, so that I do not have to modify the drivers I am using on either side?

>> Because the DMA channel runs from UIB to PIB, the ZLP has to come from the USB side and not from PIB/GPIF. The data is committed from the USB side, not from the GPIF side.

The host application needs to be modified to send a ZLP when (X % 1024) == 0.
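On the host side, the check described above could be sketched as follows. NeedsZlp is a hypothetical helper, not part of any driver API; 1024 is the bulk packet size discussed in this thread. How the zero-length write itself is issued depends on the host driver in use and is not shown.

```c
#include <stddef.h>

/* Returns 1 when a transfer of len bytes ends exactly on a packet
 * boundary, so no short packet marks its end and a ZLP should follow. */
int NeedsZlp(size_t len, size_t maxPacketSize)
{
    if (maxPacketSize == 0) {
        return 0;   /* invalid packet size: nothing sensible to decide */
    }
    return (len > 0) && (len % maxPacketSize == 0);
}
```

For the sizes tested earlier in the thread, NeedsZlp(1024, 1024) is 1, while NeedsZlp(996, 1024) and NeedsZlp(1040, 1024) are both 0 because those transfers already end in a short packet.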

Regards,
Rashi