Stale TCP ACK Problem in NetX Duo

Intermittently, all of our applications experience premature failure of TCP connections. We noticed that the likelihood of failure grows with bidirectional data transfer and with the amount of data transferred. After capturing many traces of this phenomenon, we have reduced it to a sequence involving a data packet loss in one direction followed by a data packet in the reverse direction.

See the attached trace file for Frame references.

The situation around this problem always takes the following form:

1) A data packet from NetX to Windows TCP is lost. [Between Frame 4894 and Frame 4895, packet Seq 279060 is lost.]

2) Meanwhile Windows sends a data packet to NetX. [Frame 4898 and then Frame 4900.]

3) NetX ACKs the Windows packet while sending more data, not yet aware of the loss. [Frame 4901.]

4) NetX later retransmits the lost data packet. [Frame 4903.] This packet has a fresh IP Identification number but a stale Acknowledgement number: it carries the same value that was sent with the original data packet, even though the current Acknowledgement number is now larger because at least one Microsoft data packet has subsequently been acknowledged.

5) Microsoft ignores the NetX retransmitted data packet.

6) (Optional) If NetX does not see a probe or other packet from Windows, it broadcasts an ARP request. [Frame 4904.] In these cases it goes on to the next step as usual. [Frame 4606.]

7) (Optional) Microsoft sometimes sends a zero window probe. [Frame 4907.] In these cases NetX is properly acknowledging the probe with fresh information. [Frame 4908.]

8) NetX times out and retransmits data again. [Frame 4909.] It always uses the stale Acknowledgement number [Ack=278529 versus Ack=280577 in Frame 4907] and keeps retransmitting until it times out the eighth time. [Frame 4921.]

9) NetX TCP send returns
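
To make the suspected behaviour in step 4 concrete, here is a minimal sketch in C. It is not NetX Duo source code: `segment_t`, `connection_t`, `seq_before`, `ack_is_stale`, and `prepare_retransmit` are hypothetical names invented for illustration, and the structures are heavily simplified. It assumes the stack keeps a copy of each sent segment (including its header) for retransmission, and shows how refreshing the Acknowledgement field from the connection's current receive state just before a retransmit would avoid sending a stale value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a queued TCP segment.
 * These are NOT NetX Duo's real data structures. */
typedef struct {
    uint32_t seq;   /* sequence number of the first payload byte        */
    uint32_t ack;   /* Acknowledgement number stored in the saved header */
    uint16_t len;   /* payload length in bytes                           */
} segment_t;

/* Hypothetical per-connection receive state. */
typedef struct {
    uint32_t rcv_nxt;   /* next sequence number expected from the peer */
} connection_t;

/* Wraparound-safe test: is sequence number a strictly before b? */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* True when the saved header acknowledges less than the connection
 * has already acknowledged, i.e. the condition seen in step 4. */
static bool ack_is_stale(const connection_t *conn, const segment_t *seg)
{
    assert(conn != NULL && seg != NULL);
    return seq_before(seg->ack, conn->rcv_nxt);
}

/* Overwrite the saved Acknowledgement number with the current value
 * before the segment is handed back to the IP layer. A real stack
 * would also have to recompute the TCP checksum at this point. */
static void prepare_retransmit(const connection_t *conn, segment_t *seg)
{
    assert(conn != NULL && seg != NULL);
    seg->ack = conn->rcv_nxt;
}
```

With made-up numbers (not taken from the trace): if a segment was first sent with `ack = 1000` and the connection has since advanced to `rcv_nxt = 1500`, `ack_is_stale` reports true until `prepare_retransmit` is called, after which the segment carries `ack = 1500`.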

2 Replies
Not applicable
Thank you for the information and detailed analysis.

We will investigate the NetX internals to confirm if your theory is correct.
Not applicable

We are also seeing intermittent TCP disconnects. Have you found a solution? Does using one of the other RTOSes help?