I am working on a design that uses the synchronous Slave FIFO interface on the FX3 SuperSpeed Explorer Kit, and I am running into latency problems.
My application sends 36-byte instructions to a 2048-byte FIFO on an FPGA. The Slave FIFO interface reads from the FX3 at 100 MHz and writes the data to the FIFO, while the FPGA reads one instruction on every cycle of a 0.5 MHz FPGA clock. After some processing, the FPGA sends results back with a "timestamp" of sorts based on that 0.5 MHz clock, so I can see when each instruction was received by the FPGA. I am using the Slave FIFO interface to communicate with my host computer, a Wandboard Quad-Core single-board computer running Ubuntu 14.04 over a USB 2.0 connection.
I'm trying to reduce the latency of sending instructions from my host computer to the FX3 because the FPGA isn't receiving data fast enough from the FX3. On my FX3 implementation, I have 45 2048-byte DMA buffers for the FPGA to read from. Because the instructions are small, I send them in chunks, filling each buffer with 2016 bytes of data. However, when testing, I see a significant delay of 100 milliseconds when switching between DMA buffers. The last instruction at the end of one DMA buffer and the first instruction at the beginning of the next were spaced 7 clock cycles apart, but the results show that they were received 1000 cycles apart.
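For reference, the figures above can be cross-checked with a few lines of C. This is only a sketch of the arithmetic, using the numbers quoted in this post (36-byte instructions, 2048-byte DMA buffers, a 0.5 MHz FPGA clock). The macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Values taken from the post; the names are hypothetical. */
#define INSTR_BYTES   36u      /* size of one instruction */
#define DMA_BUF_BYTES 2048u    /* size of one FX3 DMA buffer */
#define FPGA_CLK_HZ   500000u  /* 0.5 MHz instruction clock on the FPGA */

/* Number of whole instructions that fit in one DMA buffer. */
static unsigned instrs_per_buffer(void)
{
    return DMA_BUF_BYTES / INSTR_BYTES;
}

/* Bytes used when a buffer is filled with whole instructions only. */
static unsigned payload_bytes(void)
{
    return instrs_per_buffer() * INSTR_BYTES;
}

/* Convert a count of FPGA clock cycles to microseconds. */
static uint64_t cycles_to_us(uint64_t cycles)
{
    return cycles * 1000000u / FPGA_CLK_HZ;
}
```

With these values, 56 instructions fit in one buffer (2016 bytes), 7 cycles is 14 µs, and 1000 cycles is 2000 µs (2 ms). A 100 ms gap would be 50,000 cycles at 0.5 MHz, so the two delay figures quoted above appear to describe different intervals.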
I'm trying to get the delay down to microseconds. I tried switching to a DMA AUTO configuration and changing the endpoints from BULK to INTERRUPT, but I can't seem to get the latency down. I'm using LIBUSB to send data from my Wandboard with ASYNCHRONOUS transfers. Am I just running into the limitations of USB 2.0 transfers? I can't increase the burst length to more than 1, so what else can I do to reduce the latency?
The buffer switching delay should be at most around 5 µs. How exactly did you measure it to be 1000 cycles? Did you notice the delay in the flag state changes?
- Madhu Sudhan
Apologies for the late response. I put signal probes on my FPGA and measured the time between the end of one transfer and the start of the next, based on when the dedicated flag goes high for the DMA channel that is being read from by the FX3. I measured the latency in terms of the sampling clock. I was performing a separate LIBUSB transfer for each packet, so each LIBUSB transfer held only one 36-byte instruction.
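One thing that could be tried here is to group several instructions into a single host-side buffer before submitting it, instead of issuing one transfer per 36-byte packet. The following is a minimal sketch of such a batcher, assuming 56 instructions per 2016-byte batch as described earlier in the thread. `batch_t`, `batch_reset`, and `batch_add` are hypothetical names, and whether this actually removes the observed gap has not been confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INSTR_BYTES 36u    /* one instruction, as in the post */
#define BATCH_BYTES 2016u  /* 56 instructions, fits one 2048-byte DMA buffer */

/* Hypothetical host-side staging buffer for one batch of instructions. */
typedef struct {
    uint8_t data[BATCH_BYTES];
    size_t  used;          /* bytes currently staged */
} batch_t;

/* Start a new, empty batch. */
static void batch_reset(batch_t *b)
{
    assert(b != NULL);
    b->used = 0;
}

/* Append one instruction to the batch.
 * Returns 1 when the batch is full and ready to be sent, 0 otherwise.
 * The caller must reset the batch after sending it. */
static int batch_add(batch_t *b, const uint8_t instr[INSTR_BYTES])
{
    assert(b != NULL && instr != NULL);
    assert(b->used + INSTR_BYTES <= BATCH_BYTES);

    memcpy(b->data + b->used, instr, INSTR_BYTES);
    b->used += INSTR_BYTES;
    return b->used == BATCH_BYTES;
}
```

When `batch_add` returns 1, `b->data` and `b->used` would be handed to a single libusb transfer (for example one prepared with `libusb_fill_bulk_transfer` and queued with `libusb_submit_transfer`), and the batch reset afterwards.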