Getting Spurious Zero Length Packets


ThAl_4704151

Hello all, 

Most of the data I'm trying to transmit seems to be going between the FX3 and the other chip just fine, but I'm periodically getting zero length packets seemingly coming from nowhere. I've been trying to debug this, but everything I've tried has been a dead end so far.

I'm trying to use the FX3 to connect an 8051 on our board with a host application on a windows PC. My setup is as follows:

  • An 8051 that has the microcontroller business logic. 
    • It receives input from the host from GPIF-II addresses 1 (buffered) and 2 (immediate), sends responses on 3, and there's logging macros that use address 4.
    • It's mostly legacy code, except what I've changed to make it work with the FX3.
  • The FX3 that's mainly passing data between the host and the 8051
    • I'm using the GPIF-II interface. The state machine I'm using is basically the async slave fifo interface provided as a sample, except that I've changed some of the pin numbers.
    • USB addresses 2, 4, 6, and 8 are connected to GPIF-II addresses 1, 2, 3, and 4 respectively.
    • I have an additional USB endpoint 10 that I'm using for debugging. Every time a packet is added to one of the other buffers, I send its contents and a little metadata to the host. This way, I can debug problems between the FX3 and the 8051 independently of the production app running on the host.
  • On the Windows host PC, I have the production app that's trying to use endpoints 2, 4, 6, and 8 to talk to the 8051. There are some other issues there that I still need to investigate, but it's at least able to consume data from the buffers so they never get too full.
  • Also on the windows PC, there's a logger app that connects to endpoint 10, and formats and prints the packets that the FX3 has received from the 8051.

Now the problem I'm seeing is that, the 8051 sends a keepalive signal periodically to the host app, and every once in a while, we get a zero length packet as well.

Here's what I've tried in my investigation:

  • I've set breakpoints in the 8051 code to check if the actual data we tell it to send really is zero, and it never gets triggered, so I can be sure that we don't ever try to send a ZLP. I've verified that the pins I'm using to communicate with the 8051 are only touched in the 4 read and write functions that I expect them to, so the breakpoints I mentioned should cover every situation.
  • I've verified that the order in which the pins change to control the state machine is correct. That is to say to write to the buffer:
    1. SLWR = 0
    2. Write Data
    3. If this is not the last byte, set SLWR = 1 and go back to step 1.
    4. If this is the last byte:
    5. PKTEND = 0
    6. SLWR = 1
    7. PKTEND = 1
  • My coworker who designed the board suggested that timing issues could mean the state machine is in a different state than I thought it was, so I've done the following to check timing:
    • In the FX3 manual, it says the GPIF-II "Enables interface frequencies up to 100 MHz", which I assume means that it can change state at least once every 10ns. In the state machine, changing a single pin causes at most 3 state changes before it reaches a stable state, so I tried adding a 30ns delay after every time we change a pin. Then when that didn't make a difference, I added another 30ns just in case, but still no dice.
    • Then I tried adding a full 1ms delay for good measure after every time I changed a pin, which changed nothing either. I figured this would cover my bases if I was mistaken about how fast the state machine transitions.
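
The write sequence listed above can be sketched as a small host-side simulation. This is only an illustrative model, not the 8051 code or the actual GPIF-II state machine: `fifo_model_t`, `set_slwr`, `set_pktend`, and `write_packet` are hypothetical names, and the model assumes (as a simplification) that a byte is counted when SLWR is driven low and that a packet is committed when PKTEND is driven low.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the slave FIFO as seen from the master.
 * Both control lines are active low and idle high. */
typedef struct {
    int slwr;               /* 1 = deasserted, 0 = asserted */
    int pktend;             /* 1 = deasserted, 0 = asserted */
    uint8_t bus;            /* last value driven on the data bus */
    size_t bytes_latched;   /* bytes counted in the current packet */
    unsigned data_packets;  /* packets committed with at least one byte */
    unsigned zlps;          /* packets committed with zero bytes */
} fifo_model_t;

static void model_init(fifo_model_t *m)
{
    m->slwr = 1;
    m->pktend = 1;
    m->bus = 0;
    m->bytes_latched = 0;
    m->data_packets = 0;
    m->zlps = 0;
}

/* Assumption: one byte is counted on each falling edge of SLWR. */
static void set_slwr(fifo_model_t *m, int level)
{
    if (m->slwr == 1 && level == 0) {
        m->bytes_latched++;
    }
    m->slwr = level;
}

/* Assumption: a falling edge of PKTEND commits whatever has been counted.
 * With no bytes pending, the commit is a zero length packet. */
static void set_pktend(fifo_model_t *m, int level)
{
    if (m->pktend == 1 && level == 0) {
        if (m->bytes_latched > 0) {
            m->data_packets++;
        } else {
            m->zlps++;
        }
        m->bytes_latched = 0;
    }
    m->pktend = level;
}

/* Steps 1 to 7 of the write sequence described in the post. */
static void write_packet(fifo_model_t *m, const uint8_t *data, size_t len)
{
    assert(m != NULL && data != NULL && len > 0);
    for (size_t i = 0; i < len; i++) {
        set_slwr(m, 0);         /* 1. SLWR = 0 */
        m->bus = data[i];       /* 2. write data */
        if (i + 1 < len) {
            set_slwr(m, 1);     /* 3. not the last byte: SLWR = 1, repeat */
        } else {
            set_pktend(m, 0);   /* 5. PKTEND = 0 */
            set_slwr(m, 1);     /* 6. SLWR = 1 */
            set_pktend(m, 1);   /* 7. PKTEND = 1 */
        }
    }
}
```

In this model, a normal call to `write_packet` always produces one data packet and no ZLP, while a PKTEND pulse with no preceding SLWR activity is the only way to get a zero length commit. Whether the real state machine behaves the same way depends on its actual transitions.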

I suspected that maybe there's noise or something changing the PKTEND pin long enough to trick the state machine into sending a ZLP, but my coworker has already dismissed that notion. He's the hardware guy, so I can't really make him check it, and he didn't leave a way to attach a scope or anything to the pin, so it wouldn't be easy.

If anyone has any suggestions of what might be going wrong, or ideas for other things I could test, I'd love to hear them.

19 Replies
Rashi_Vatsa
Moderator

Hello,

Please confirm that the COMMIT action is not called alone (without IN_DATA) in any state of the GPIF state machine.

Please refer to errata 3 of the FX3 datasheet.

Also, please let me know whether the ZLP is seen when PKTEND is sent, or whether it appears irregularly. Please probe the signals on the GPIF interface when the ZLP is generated.

Regards,
Rashi

Thanks for responding so quickly. Commit is used in two spots in the state machine. One is alone in the ZLP state in case I really do want a ZLP, but the other is used with IN_DATA in the SHORT_PKT state.

I don't have access to the pins the GPIF interface is using. This is on our prototype board, and I don't have anywhere to attach probes. 

I can say that the ZLP doesn't happen every time. Messages seem to come about every 500ms, which is how often the 8051 sends a keepalive. Sometimes we get a ZLP instead of a keepalive. I'll export the log and attach it so you can see what I'm seeing.


Hello,

Please let me know if you are using DMA channels in MANUAL mode. If yes, please check if any DMA buffer with count =0 is received when the producer event is triggered.

Also, if possible please share the Wireshark (.pcap) debug traces for us to check
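
The count check suggested above could look roughly like the following. This is a host-side sketch with stand-in types, not FX3 SDK code: `mock_dma_buffer_t`, `prod_stats_t`, and `on_producer_event` are hypothetical names, and in real firmware the byte count would come from the buffer descriptor that the DMA callback receives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the buffer descriptor handed to a
 * producer-event callback. */
typedef struct {
    const uint8_t *buffer;  /* start of the DMA buffer */
    uint16_t count;         /* number of valid bytes in the buffer */
    uint16_t size;          /* total capacity of the buffer */
} mock_dma_buffer_t;

/* Running totals, cheap enough to update from inside a callback. */
typedef struct {
    unsigned with_data;     /* producer events that carried data */
    unsigned empty;         /* producer events with count == 0 */
} prod_stats_t;

/* Classify one producer event by its byte count only. */
static void on_producer_event(prod_stats_t *stats, const mock_dma_buffer_t *buf)
{
    assert(stats != NULL && buf != NULL);
    if (buf->count == 0) {
        stats->empty++;
    } else {
        stats->with_data++;
    }
}
```

Keeping only two counters avoids copying any payload in the callback, so the check itself is unlikely to disturb the timing of the transfer being observed.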

Regards,
Rashi

Hi,

I was using auto with signaling for endpoints 2, 4, 6, and 8. I use the callback from the signal to report everything over my debug endpoint, which is manual.

It took a little work to figure out wireshark, but I was able to find something interesting. I looked up the keepalive packets immediately before and after one of the zero length packets as reported by my logger program, and found their operation numbers were 96 and 98, so presumably the number of the zero length packet in my logger was supposed to be 97.

So then I looked up 97 in wireshark, and sure enough, there it was, and all the data was there as expected. So then I looked up the corresponding packets that were sent through the debug endpoint, and I found that 96 contained all the expected data and metadata, and so did 98. However, even though real packet 97 was just fine, debug packet 97 had no actual data, just the flags and such that I add.

The function that creates debug packets is pretty simple, but the data it reports on comes from the callback that responds when data is added on a producer. The fact that the real packet goes through as expected tells me that I'm writing it correctly, and my flags and checksums tell me that the data I add for the debug packets is correct. The thing that's incorrect is the buffer in the CyU3PDmaCBInput_t object provided by the callback.

I'm attaching the pcap file so you can see. Real packets 96, 97, and 98 are number 1374, 1502, 1600 respectively. Debug packets 96, 97, and 98 are 1380, 1507, and 1603 respectively. To make it easier to read, the debug packets start with 0xdeadbeef and have a bunch of bytes for flags and such, then the data from the packet they're reporting on, and finally a checksum.
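
For anyone reading the trace, the debug packet layout described above can be checked with a short parser. The exact layout is an assumption made for illustration: the post only says the packets start with 0xdeadbeef, carry some flag bytes, then the copied data, then a checksum. The sketch assumes 4 flag bytes, a 1-byte additive checksum over everything before it, and the magic stored most significant byte first. `debug_payload_len` and `sum8` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DBG_MAGIC_LEN 4u   /* 0xdeadbeef */
#define DBG_FLAGS_LEN 4u   /* assumed size of the flag/metadata field */
#define DBG_CSUM_LEN  1u   /* assumed 8-bit additive checksum */
#define DBG_OVERHEAD  (DBG_MAGIC_LEN + DBG_FLAGS_LEN + DBG_CSUM_LEN)

/* 8-bit sum of n bytes (assumed checksum algorithm). */
static uint8_t sum8(const uint8_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    uint8_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s = (uint8_t)(s + p[i]);
    }
    return s;
}

/* Returns the number of payload bytes carried by a debug packet,
 * or -1 if the packet is too short, has the wrong magic, or fails
 * the checksum. A return value of 0 is a well-formed debug packet
 * that reports an empty buffer. */
static long debug_payload_len(const uint8_t *pkt, size_t len)
{
    static const uint8_t magic[DBG_MAGIC_LEN] = {0xde, 0xad, 0xbe, 0xef};

    if (pkt == NULL || len < DBG_OVERHEAD) {
        return -1;
    }
    if (memcmp(pkt, magic, DBG_MAGIC_LEN) != 0) {
        return -1;
    }
    if (sum8(pkt, len - DBG_CSUM_LEN) != pkt[len - DBG_CSUM_LEN]) {
        return -1;
    }
    return (long)(len - DBG_OVERHEAD);
}
```

Under these assumptions, debug packets 96 and 98 would return a positive length, and debug packet 97 would return 0: the framing is valid, but no data was copied in.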

Anyway, I can use wireshark for my debugging now, so it's not as big a problem anymore, but I'm a little curious why the DMA callback was giving me bad data when I was trying to use that.


Hello,

From the Wireshark traces, we can see that the ZLP appears because the USBD status is Cancelled (USBD_STATUS_CANCELLED).

This happens because the host is sending an abort pipe/reset pipe request. Please check why the host is sending the abort pipe right after device enumeration. Can you please try removing or disabling the debug channel? I understand endpoint 10 is the debugging channel.

Please let me know why is the 

Regards,
Rashi

Hi Rashi,

Endpoint 10 is just for a logging tool that I wrote to try to debug some other issues. I connect to it with a logging app I wrote on the host PC. It doesn't explicitly send aborts or try to close connections unless we receive the event that it's been disconnected, which didn't happen while I was recording. If it's sending aborts, maybe it's doing it behind the scenes somewhere.

I have however had some issues in the past with the xferdata function in CyAPI locking up indefinitely if a device disconnects while we're listening for data. The guys here on the forum told me to call abort on the thread calling the function when that happens, which is normally a really bad idea. In fact it's such a bad idea that Microsoft doesn't allow you to do it with modern versions of C#, but it's just a debugging tool and I didn't have any better options, so I found an older version of C# that does allow it.

The other endpoints are serviced by the actual app I'm trying to work with, which is a legacy monstrosity that I don't fully understand. It looks like it only sends an abort right before closing the connection. It also doesn't explicitly try to do anything with endpoint 10 because it didn't exist when the program was written. I added it recently as a debugging tool.

I did build the FX3 firmware with a macro that allows me to easily toggle endpoint 10, so I've used that to remove the debug endpoint, and I'm attaching the log for your perusal. Again, our device is 2.6, so we see the periodic keepalive messages we expect from 2.6.6. No cancels though.

Regards,
Thomas


Hello Thomas,

Understood.

From the Wireshark traces, I could see that the transfers are done on endpoint 0x86 (Interrupt IN) and I didn't see any ZLPs in the traces. Please let me know at which packet is the issue seen.

Regards,
Rashi

I think we've determined pretty conclusively that the ZLPs were an artifact of the method I was using to log them. Namely, I was using the DMA callback when data is added to a DMA buffer entry, which receives an object with a pointer to the data buffer. I was using endpoint 10 to send that data to a logger app I wrote on the host PC. We can see from the wireshark traces that the real data is going through just fine, as are the debug packets.

Occasionally though, the buffer in the DMA callback is empty when it should contain the data to be sent to the host. I have the firmware sending that over endpoint 10, and the logger app interprets that information as a ZLP, since there was zero data in the buffer. The logging packets are the ones that start with 0xdeadbeef, so you can see the normal ones where there's a few flags that I set, a bunch of zeros for bytes I haven't used yet, and a copy of the data from the previous real packet. Some of them are missing the data from the previous real packet, which my logger app was interpreting as a ZLP.

So we know where they're coming from, but I don't know why the buffer passed to the callback is occasionally empty when there's real data it should be passing in instead.

In the last set of wireshark traces, you asked me to disable endpoint 10, so my logger app received nothing. If you want to see the deadbeef packets, they're in the first set of wireshark traces I sent you.


Hello,

Apologies for the delayed response.

From your response, I understand that you are seeing empty DMA buffers in DMA Callback itself i.e. from GPIF you are getting ZLP. Is my understanding correct?

If yes, can you please share your GPIF state machine so that we can debug it?

Also, please check whether the PCLK from the master is stable when the issue is seen.

Regards,
Rashi

In the callback, I'm getting empty DMA buffers, but the Windows host machine I'm trying to send them to is receiving the data regardless. I've attached the state machine project.

PCLK is a pin on the chip, right? The guy who made the board for me didn't include a way for me to probe the pins without breaking the board.


Hello,

From the state machine, only the ZLP state can send a DMA buffer with count = 0. Please confirm that the FPGA/master is behaving correctly. If PKTEND is asserted from the master, then the ZLP will be seen in the firmware (DMA callback).

Please confirm that CyU3PDmaSocketSetWrapUp is not called in the firmware.

Regards,
Rashi

Hi Rashi,

I have confirmed that the only time the master asserts packetend is from WRITE_START to go to SHORT_PKT. 

I found no call to CyU3PDmaSocketSetWrapUp.

I don't think it's a problem on the master because from the wireshark traces, we can see the data going through correctly over endpoint 6. It's just that the logging data going through endpoint 10 is incorrect. When I set breakpoints in the callback, I found that it really is giving me empty buffers, which is why the logging I do there reports ZLPs.

What I don't understand is why the callback is (occasionally) giving me empty buffers when the correct data is going through endpoint 6.


Hello,

It is strange that only the DMA buffer is empty when DMA callback is triggered but there is data in the buffer when it is sent to USB.

Is it possible for you to share the firmware with us for us to check?

Also, can you confirm whether empty DMA buffers are still seen when data logging is disabled on endpoint 10?

Regards,
Rashi

I've disabled endpoint 10 and set a breakpoint in the callback before the disabled logging code, and I was still able to catch an empty buffer.

I've zipped and attached the workspace for the FX3 firmware.

 


Hello,

It seems that the issue could be caused by the debugLog API.

We recommend not to call blocking APIs inside DMA callback. debugLog is using memcpy which copies large data into another buffer in the DMA callback itself. Please try commenting out debugLog from the DMA callback and use os events for tracking purpose.

Also, please register for pib/gpif errors using CyU3PPibRegisterCallback and let me know if any error is triggered.
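
One way to move the logging out of the DMA callback, as suggested above, is to have the callback record only a small fixed-size entry and let the main loop do the copying and sending. The following is a generic single-producer, single-consumer ring buffer sketch in plain C, not FX3 SDK code: `log_rec_t`, `log_ring_t`, `ring_push`, and `ring_pop` are hypothetical names, and real firmware would also need an RTOS event or similar to wake the consumer, plus whatever synchronization the execution context of the callback requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_SLOTS 8u   /* must be a power of two */

/* Small record written by the callback; no payload copy. */
typedef struct {
    uint16_t count;     /* byte count reported for the buffer */
    uint8_t endpoint;   /* which channel produced it */
} log_rec_t;

typedef struct {
    log_rec_t slot[LOG_SLOTS];
    volatile unsigned head;   /* advanced only by the producer */
    volatile unsigned tail;   /* advanced only by the consumer */
    unsigned dropped;         /* records lost because the ring was full */
} log_ring_t;

/* Callback side: constant-time, never waits. Returns 1 on success,
 * 0 if the ring was full and the record was dropped. */
static int ring_push(log_ring_t *r, log_rec_t rec)
{
    assert(r != NULL);
    if (r->head - r->tail == LOG_SLOTS) {
        r->dropped++;
        return 0;
    }
    r->slot[r->head % LOG_SLOTS] = rec;
    r->head++;
    return 1;
}

/* Main-loop side: returns 1 and fills *out if a record was pending,
 * 0 if the ring was empty. */
static int ring_pop(log_ring_t *r, log_rec_t *out)
{
    assert(r != NULL && out != NULL);
    if (r->head == r->tail) {
        return 0;
    }
    *out = r->slot[r->tail % LOG_SLOTS];
    r->tail++;
    return 1;
}
```

Dropping records when the ring is full, rather than waiting, keeps the callback path bounded in time. The `dropped` counter shows afterwards whether the log is complete.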

Regards,
Rashi

I can try moving the actual writing part to the main loop, but I used my macro to gate endpoint 10, and code in debug log is entirely gated by that macro as well. When endpoint 10 is disabled, debugLog is a noop, but I was still able to catch an empty buffer in the callback under those conditions.

Just to evaluate, CyU3PDmaChannelGetBuffer() should be not blocking because I gave it the NO_WAIT option. You said memcpy is blocking. Is CyU3PDmaChannelCommitBuffer() blocking? Is there a document that specifies which functions are blocking and which aren't, or am I just meant to guess? I don't see a mention of it in the comments in the header file.

I'll try registering for errors and update this comment with the results.

Edit: I've registered a pib callback for errors and set a breakpoint in it, but it never trips, so it's not reporting errors. I was still able to see an empty buffer in the callback. The debugLog function was disabled by commenting out the DEBUG_ENDPOINT10 macro definition.


Hello,

Apologies for late response.

Just to evaluate, CyU3PDmaChannelGetBuffer() should be not blocking because I gave it the NO_WAIT option. You said memcpy is blocking. Is CyU3PDmaChannelCommitBuffer() blocking? 

>> CyU3PDmaChannelCommitBuffer is non-blocking. memcpy is blocking when a large amount of data is copied and the rate at which data is sent to the FX3 via GPIF is high.

Is there a document that specifies which functions are blocking and which aren't, or am I just meant to guess?

>> Please refer to FX3 API Guide for the information.

 I've registered a pib callback for errors and set a breakpoint in it, but it never trips, so it's not reporting errors. I was still able to see an empty buffer in the callback.

>> Can you confirm this using UART debug prints as well (not using JTAG debugging)?

If there are no PIB errors after that and unexpected ZLPs are still seen, this can be due to some issue on the GPIF interface. It may be an unstable clock or an issue in the timing signals.

Is it possible to test the AN65974 firmware and FPGA code with your setup, just to make sure that the FX3 is not causing the issue? Also, I noticed that the state machine used in your application is similar to AN65974.

One more test that can be done is disabling all the code from DMA callback and just checking the data in DMA Buffer. If possible, we can just enable the DMA channel that is causing the issue currently (disabling all the other DMA channels)

Regards,
Rashi

The EE who designed my board didn't give me access to the UART pins, so I can't connect to that.

The state machine I'm using is just the async slave fifo example project, except with some of the pins changed. AN65974 is a synchronous project though, right? It seems like I would need to change the code on the microcontroller it's connected to quite a bit, even if the EE connected all the right pins, which I don't believe he did. That was why I needed to modify the sample project in the first place.


Hello,

 AN65974 is a synchronous project though, right?

>> Yes

 It seems like I would need to change the code on the microcontroller it's connected to quite a bit, even if the EE connected all the right pins, which I don't believe he did. That was why I needed to modify the sample project in the first place.

>>Please let me know if there are any updates here

Regards,
Rashi