SuperSpeed Explorer Kit using Streamer Application with add-in USB3 PCIe card

frank9876

Hello.

I am using the Streamer Application from SDK 1.3.4 with the SuperSpeed Explorer Kit.  When I plug the setup into a USB3 port on the motherboard, everything works: I get transfers of approximately 4.3Gbps with no failures.  I am just using the default settings that the Streamer Application starts with.

However, when I use a RocketU 1244A USB3 add-in card with the default settings, I get approximately 200 successes and then every transfer after that fails.  If I let it run, the transfer rate decreases to 0.

The settings are BULK IN, packets per xfer=32, xfers to queue=16, which are the defaults.

After trying numerous different things, it appears that the 1244A card cannot queue up any more than one DeviceIoControl() buffer at a time?  Other cards, in addition to the motherboard usb ports, appear to allow queuing up multiple buffers simultaneously via DeviceIoControl().

If I make the packets per xfer=1 and xfers to queue=1 in the Streamer Application with the 1244A card, I can get no failures but the transfer rate is 0.043Gbps (much slower).  Are you aware of any limitations in using DeviceIoControl() with the 1244A USB3 PCIe card?
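
To put the two settings in perspective, the amount of data queued at once is packets per xfer x xfers to queue x packet size. A minimal C++ sketch of that arithmetic, assuming a 16 KB packet (the "16K packets" mentioned later in this thread; the size Streamer actually uses depends on the endpoint configuration) and a hypothetical helper name:

```cpp
#include <cstdint>

// Bytes outstanding when every queued transfer is still pending.
// packet_bytes is an assumption; it is not stated for Streamer in this thread.
constexpr std::uint64_t bytes_in_flight(std::uint64_t packets_per_xfer,
                                        std::uint64_t xfers_to_queue,
                                        std::uint64_t packet_bytes) {
    return packets_per_xfer * xfers_to_queue * packet_bytes;
}
```

Under these assumptions the default (32, 16) keeps 8388608 bytes (8 MiB) queued, while (1, 1) keeps a single 16384-byte buffer queued, which would be consistent with the much lower rate seen at (1, 1).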

Frank

1 Solution
Rashi_Vatsa

Hello Frank,

Please refer to a similar thread: Solved: Bulk In data streaming hangs on CYUSB3KIT-001 DevK... - Infineon Developer Community

The issue doesn't seem to be caused by the cyusb3 driver.

Regards,
Rashi

22 Replies
Rashi_Vatsa

Hello Frank,

As you already mentioned, the problem seems to be with the 1244A USB3 PCIe card. 

I understand that you have tried different cards (other than the 1244A USB3 PCIe card) with which the data transfers are as expected. Can you confirm whether the host controller on those cards is the same as the one used on the 1244A USB3 PCIe card?

Also, if possible can you please try with a different 1244A USB3 PCIe card (new card) just to confirm that the problem is not with that particular unit.

Regards,
Rashi

Hi Rashi,

I've used the StarTech USB3 card with UASP (PEXUSB3525).  When I use that PCIe plug-in card, I get about 374000KBps.  With a host controller on the motherboard (Intel) I am getting about 440000KBps.  So both are fast with the Streamer application and show no failures.  The StarTech card uses a host controller from Renesas. 

The 1244A card uses an ASMedia host controller.  I just bought the 1244A card so it is brand new.  I do not have more than one 1244A card.  The 1244A card is rather expensive for me so I really cannot get another card to try.

As far as I can tell, the card works with USB2 devices and mostly works with USB3 devices provided you only buffer one packet at a time.  It appears that the only issue is that I cannot transfer data fast, which was the reason I bought the card (four independent fast channels).  I do not know whether this card is supposed to work at USB3 speeds (my assumption is that it should) and I have a bad card, or whether this card just doesn't work fast because of other known issues.  I've tried the card in several PCs and get the same issue (slow transfers).

In another application I have, I use the CYUSB3 driver and use DeviceIoControl() to queue up multiple buffers and then wait for them to arrive.  By using multiple buffers, the transfers are fast and that application works with the Intel and Renesas controllers.  In that application, I've found that the 1244A card only works when I queue up one buffer.  When I queue up multiple buffers, I get data for the first buffer but then the other buffers time out because they get no data.  Whereas with all the other USB cards, the multiple buffers work fine.  I haven't traced through the Streamer Application but I suspect the same thing is happening in the Streamer application because when I set the parameters to 1, 1, the 1244A works with no failures (albeit slow).

The problem I have is that I don't have a fast USB3 device that does not use the CYUSB3 driver, so I cannot determine whether the card works with a different USB3 device at fast speeds.  Because of that, I do not know if the problem is 1) a bad card, 2) a limitation of the CYUSB3 driver, 3) a limitation of the ASMedia controller, or something else.  I was hoping to find out whether the Streamer Application at the default settings is known to work without failures with the 1244A card, because then I could conclude that my card is bad.  If the Streamer Application does not work at the default settings with any 1244A card, then I know the problem is not my particular card.  I just do not know if this is a known issue with CYUSB3.sys, or I am doing something wrong with the PC settings, or something else.

I will need to research how to find a USB3 device and perform a speed test with the 1244A card.  That may help me isolate whether it is a 1244A card problem.

Thanks for the help,

Frank

Hello Frank,

Thank you for the details.

You can refer to this app note https://www.cypress.com/file/125281/download to understand  the USB 3.0 throughput with FX3 and different host controllers.

From the details, I understand that USB transfers are successful with streamer (default settings) using other cards but not successful with the 1244A. In that case, as you mentioned checking with some other device will help to narrow down the problem.

Please let me know if you need more help

Regards,
Rashi

Hi Rashi,

Your understanding is correct with the 1244A card.

I've read that application note.  I suspect that the tests were run with host controllers confined to motherboards.  The heading on the table contains (built-in) which leads me to that conclusion.  There is no mention of tests with add-in usb cards.  It would be beneficial if there was an application note with the exact same tests with different usb3 add-in cards (with one being the 1244A).

I will continue to experiment over time and try to isolate what is happening.

Thanks,

Frank

Hi Frank,

Unfortunately, we do not have an app note covering tests with different USB3 add-in cards (with one being the 1244A).

"I will continue to experiment over time and try to isolate what is happening."

>> Can you please let me know if there are any updates here?

Regards,
Rashi

Hi Rashi,

My plan is to try to get an FX3 project that can work without the CYUSB3 driver and test the transfer speeds.  However, this is now a background task and I cannot spend full-time on it at this time.  From the testing I've done, I suspect that I will get fast speeds without using the CYUSB3 driver, but I need to test that hypothesis for verification.  I'm hoping I can create a test project with an FX3 by the end of the year (things may slow down for me in mid-December, so I can spend some more time on this).  I will certainly post the results after I can gather more information.

Frank

Hi Rashi,

I ran a few more experiments.  I haven't reached a conclusion yet, but I have more details.

I realized that the FX3 SDK can also be run on Linux.  So, I downloaded FX3 SDK 1.3.4 and spent time getting it to work on Linux, because it doesn't run out of the box.  After getting all the way to the Linux Control Center and clicking on "Streamer", it didn't work.  I found out from this link that Streamer does not work on Linux.

https://community.infineon.com/t5/USB-Superspeed-Peripherals/FX3-Streamer-application-for-Linux/td-p...

I tried the bulkloop test on Linux and that could transfer files but gave no speed data.

I then tried my other application that does not work on Windows with the 1244A card and found that it also doesn't work on Linux with the 1244A card.

Then I tracked down a hard disk dock that uses USB3.  I put in an old HDD and was getting about 190MB/s with all three different USB ports (motherboard, 1244A card, and StarTech card).  I upgraded to an SSD and could get about 400MB/s with the 1244A card.  All three USB ports were pretty close in speed.  I ran the tests on both Windows and Linux.  My conclusion after these tests was that the 1244A card itself seems to be working, because 400MB/s is 3.2Gbps, which is well over what I could get from the Streamer Application with that card on Win 10.  The difference now is mass storage enumeration.
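
The unit arithmetic in this post can be checked with a small helper. This is a minimal C++ sketch, assuming decimal units (1 MB = 10^6 bytes, 1 KB = 10^3 bytes) and counting payload bits only; the function names are hypothetical.

```cpp
// Convert a rate in megabytes per second to gigabits per second.
// Decimal units are an assumption: 1 MB = 1e6 bytes, 1 Gb = 1e9 bits.
constexpr double megabytes_per_s_to_gbps(double megabytes_per_second) {
    return megabytes_per_second * 8.0 / 1000.0;
}

// Convert a rate in kilobytes per second (the unit quoted for Streamer
// earlier in the thread) to gigabits per second, again in decimal units.
constexpr double kilobytes_per_s_to_gbps(double kilobytes_per_second) {
    return kilobytes_per_second * 8.0 / 1.0e6;
}
```

With these assumptions, 400 MB/s comes out at 3.2 Gbps, matching the figure above, and the 190 MB/s measured with the HDD is about 1.52 Gbps.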

Then I noticed that the star tech card has UASP in the name.  After doing some searching, here is what I am wondering.  Is it possible that the SuperSpeed Developer Kit (either FX3 enumeration or CYUSB3.sys) does not force UASP when the device is connected to some usb add-in cards?  Maybe some usb add-in cards use UASP as default and others do not.  Is there a way to force UASP with a setting to determine if this is the issue with the 1244A card and the Streamer Application?  Is there a way to make the Streamer Application run with the FX3 as a mass storage device so that the CYUSB3 driver is not running for a speed comparison?

Frank

Hi Rashi,

I spent some more time investigating this issue.  After looking at the supplied MSC project, it works in BOT only.  I know my hard drive transfers are fast and they use UASP (I examined the usb descriptor interfaces).  I figured that if I modified the MSC project for UASP, I'd end up not gaining any useful information.  I should get fast transfer speeds without UASP.  I then started experimenting with other devices using different drivers and other superspeed projects but didn't find any useful information other than to convince myself that the RocketU card appears to be working.

By random chance, I happened to stumble upon a fix for the streamer application.  The streamer application does not process any received data resulting in fast buffer turnaround times.  There are two loops in the streamer application XferLoop: 1) queue all buffers and 2) receive buffer and re-queue.  Both loops are fast.  I found that if I slowed down both loops with a delay,  the streamer application now works with the RocketU card.  I used - for (volatile int zzz = 0; zzz < 100000; zzz++); - in both loops.  I did not investigate for an optimal delay.  Depending on the packet transfer size chosen, this delay can be masked out and fast transfers result (>400000KB/s).  Apparently, there must not be any feedback at a low level indicating adding another buffer to the queue is acceptable.  I suspect that any application using the CYUSB3 driver that rapidly queues multiple buffers will have trouble with this RocketU card.
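
The shape of that change can be sketched as follows. This is a simplified, portable C++ model and not the Streamer source: `TransferOps`, `begin_transfer`, `finish_transfer` and `run_xfer_loop` are hypothetical stand-ins for the real driver calls, and only the placement of the busy-wait delay in the two loops is taken from the description above.

```cpp
#include <cstddef>
#include <functional>

// Busy-wait delay equivalent to the one-line loop quoted in the post.
// The volatile counter keeps the compiler from optimizing the loop away.
inline void spin_delay(int iterations) {
    for (volatile int i = 0; i < iterations; i = i + 1) {
    }
}

// Hypothetical callbacks standing in for the real driver calls.
struct TransferOps {
    std::function<void(std::size_t)> begin_transfer;   // queue buffer `slot`
    std::function<bool(std::size_t)> finish_transfer;  // wait for buffer `slot`
};

// Model of the two loops with the added delays.
// Returns the number of transfers that completed successfully.
inline std::size_t run_xfer_loop(const TransferOps& ops,
                                 std::size_t queue_depth,
                                 std::size_t total_transfers,
                                 int delay_iterations) {
    if (queue_depth == 0) {
        return 0;
    }
    // Loop 1: queue all buffers, pausing after each submission.
    for (std::size_t slot = 0; slot < queue_depth; ++slot) {
        ops.begin_transfer(slot);
        spin_delay(delay_iterations);
    }
    // Loop 2: receive the oldest buffer, re-queue it, pause, move on.
    std::size_t successes = 0;
    std::size_t slot = 0;
    for (std::size_t n = 0; n < total_transfers; ++n) {
        if (ops.finish_transfer(slot)) {
            ++successes;
        }
        ops.begin_transfer(slot);
        spin_delay(delay_iterations);
        slot = (slot + 1) % queue_depth;
    }
    return successes;
}
```

A busy-wait like this depends on CPU speed and compiler settings, so the iteration count is not a portable time value; that may be worth keeping in mind when comparing builds.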

Although this fixes the streamer application, I suspect that there are other subtle issues with the queue buffering with the RocketU card.  I have another application which I can now get to work with >1 transfers to queue (with the added delay above).  However, this other application does not work all the time.  I have not been able to figure out what exactly is wrong.  It appears as if there is another subtle dependency when queuing buffers on whether data is immediately available or not.  If the endpoint is primed and ready when the buffers are queued, the application pretty much works with any number of queued buffers.  However, if the endpoint is not ready with data, the application only works if the number of queued buffers is less than or equal to 14.  I was thinking that in the case when data is ready, the number of outstanding buffers is most likely always smaller than 14.  However, I cannot determine any root cause that explains the behavior I see.  My current thinking is that there is something being overlooked at a low level.  I'm hoping that I may stumble onto some kind of workaround for this issue as well.

I'm going to take another break from working on this.

Frank

Hello Frank,

Thank you for the update.

Glad to hear that the problem with Streamer application with RocketU card was resolved by adding delay to the loops in streamer application.

For the other issue with custom host application, please let me know if USB requests are sent to device over the bus. The reason I am asking this is the endpoint not being ready could be the case with streamer application also, but it works as expected after adding delay. 

Please let us know the functionality of the custom host application so that we can help you with this issue.

Regards,
Rashi

Hi Rashi,

Thank you for your reply.

My application is slightly more complicated than the streamer application.  My application uses two endpoints that need to work simultaneously (one endpoint streams at the fastest GPIF rate and the other handles asynchronous communications).  The first part of the problem was just getting one endpoint to work and that's how I found  that the streamer application was not working (similar to my application) and it was easy to duplicate in streamer.  That led me to the delay fixes above which work  in both the streamer application and my application.

In my application, there appear to be two further ongoing issues (I am classifying them as two issues but am not entirely sure).  First, if I queue more than 14 transfers before data starts streaming, it does not work.  Second, if the other endpoint starts operating while the first endpoint is streaming data, the data stops streaming for some reason.  As best as I can tell, they are related to queueing buffers as well.

For the first issue, I was trying to duplicate the issue in streamer.  I modified the FX3 project that streamer uses to not immediately send data (it would only start after changing the LED blink rate).  In this case, I can start streamer first and then verify that data streams after changing the LED blink rate.  That appears to work with my limited tests.  Although I have yet to explore the differences  in the API calls between streamer and my application.  In my mind,  I should be able to duplicate the issue using streamer or at least narrow down why one works and the other doesn't.

For the second issue, I'm hoping that fixing the first issue fixes this second issue.  I was starting to think that a DeviceIoControl() call into CYUSB3 is not thread-safe.  It is possible that two threads in my application simultaneously call into DeviceIoControl() for the two different endpoints.  I think I prevented that but it didn't seem to change anything.  What I am concerned about is that if the DeviceIoControl() call returns before the buffer is actually queued, a second call into DeviceIoControl() may mess something up with the first call.  There is much more to explore on this issue as well.
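
One way to rule out simultaneous calls is to route every submission through a single lock. The sketch below is a portable C++ illustration under that assumption; `SerializedSubmitter` is a hypothetical name, and the callable passed to `submit` stands in for whatever code issues the real DeviceIoControl() call. Whether concurrent calls into CYUSB3 are actually unsafe is not established in this thread.

```cpp
#include <mutex>
#include <utility>

// Hypothetical wrapper: submissions from any thread pass through one mutex,
// so two endpoint threads can never be inside the submit call at once.
class SerializedSubmitter {
public:
    template <typename SubmitFn>
    auto submit(SubmitFn&& submit_fn) -> decltype(submit_fn()) {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::forward<SubmitFn>(submit_fn)();
    }

private:
    std::mutex mutex_;
};
```

Note that this only serializes the calls themselves. It does not cover the concern raised above, where the call returns before the buffer is actually queued at a lower level.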

I do not have a usb analyzer.  I'm not sure if I can detect if an endpoint is ready using Wireshark since it is up a few layers from the hardware.  I've noticed that some USB commands that go to the device don't show up in Wireshark, so some analysis is limited using Wireshark.

I never realized when I purchased this usb card that it wouldn't work out of the box.  I'm still hopeful I can make it work with my product.

I will return to working on these issues in January.

Frank

Hello Frank,

As you are using both synchronous and asynchronous USB transfers, please refer to this link: Synchronization and Overlapped Input and Output - Win32 apps | Microsoft Docs, which explains the synchronization between the two types of transfers.

You can then refer to the implementation in Streamer example (Path: \Cypress\EZ-USB FX3 SDK\1.3\application\c_sharp\streamer) which uses asynchronous transferring of data.

Regards,
Rashi

Hi Rashi,

Thank you for the link.  I am using all asynchronous USB transfers.  It is quite possible that I am doing something subtly wrong with the asynchronous transfers.  That is my current suspicion.  The exact same code works with the motherboard USB host connection (Intel) and a different USB card; it just doesn't work with this 1244A USB card.  I am going to try to further narrow down the issue this week.

Frank

Hi Rashi,

I stumbled on a peculiarity.  I'm not sure what it means but it improves things slightly because now rather than completely hanging in my application I can get my application to limp along.

I found that with the 1244A card, the DeviceIoControl()/WaitForSingleObject() sequence used to retrieve a USB buffer times out on a buffer when there are more than 4 buffers.  It's peculiar because it'll receive a bunch of buffers and then all of a sudden a buffer times out (and my application would just continue waiting on that buffer forever).  When I examine the overlapped structures for all the buffers at the timeout, I can see that a few buffers have a pending state and 0 bytes transferred and all the others are filled with correct data.  For example, if I am queueing up 512 buffers, I can see that several buffers (which happen to be 14 buffers apart for some odd reason) are stuck with a pending status.  All the other buffers have been filled correctly.  I was under the impression that if a timeout occurs when waiting for the USB buffer, the buffer remains in the queue and the wait can continue (no requeue necessary).  That is how it works with the Intel host controller and my other USB card.  When I saw that, I thought I'd try to see if I could get past the stuck pending buffer because the data in the next buffer is correct.

What I found was that if I cancel the IO that timed out and requeue that buffer and move to the next buffer, it starts working until it reaches the next buffer that is stuck at pending.  I can then cancel that IO and continue.  For some reason, it works for several seconds and then all of a sudden, the same problem occurs but at a different buffer location.  Suffice it to say, cancelling that IO after a timeout slows down the transfers but I can get data to transfer now.

From my observations, I know that the buffer that is stuck on pending is not causing lost data.  I know that because I number all the incoming 16K packets.  I can see the packet numbers increasing by 1 until I get to the stuck pending buffer.  When I skip that buffer, the next buffer has the correct packet number.  It's almost as if something (in the driver?) just randomly skips a buffer.  The skipped buffer still holds old data from the previous time it was filled, but its byte count (InternalHigh) is 0, so something reset the count but never put data into the buffer.  As a result, that buffer is left in a permanent pending state, although cancelling and requeueing makes it work just fine again.

I also notice that the stuck pending buffers usually appear in groups of 3 with approximately 14 valid buffers between them.  Changing the total number of buffers does not seem to have an effect on these groupings.  If I use 1 total buffer, all the issues disappear.  I'll continue to investigate some more.  At least I know why my application is hanging: a buffer that stays pending forever hangs it.  What I do not know is why the Intel controller does not need a cancel IO but the ASMedia controller does.
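
The recovery logic described in this post (skip a buffer that timed out, cancel and requeue it, and use the packet numbers to confirm no data was lost) can be modelled as below. This is a portable C++ simulation under stated assumptions, not the actual application code: `Slot`, `WaitResult`, `Stats` and `drain_ring` are hypothetical names, and in a real Win32 program the cancel and requeue steps would be calls such as CancelIoEx() followed by a new DeviceIoControl() submission. Which calls were actually used is not stated in the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class WaitResult { Completed, TimedOut };

// One queued buffer as seen after waiting on it (simulated).
struct Slot {
    WaitResult result;            // outcome of the wait on this buffer
    std::uint32_t packet_number;  // sequence number in the data, if completed
};

struct Stats {
    std::size_t received = 0;       // buffers that completed with data
    std::size_t cancelled = 0;      // stuck buffers cancelled and requeued
    std::size_t sequence_gaps = 0;  // completed buffers with an unexpected number
};

// Walk the ring once. A timed-out slot is counted as "cancel and requeue"
// and skipped; a completed slot must carry the next expected packet number.
inline Stats drain_ring(const std::vector<Slot>& ring,
                        std::uint32_t first_expected) {
    Stats stats;
    std::uint32_t expected = first_expected;
    for (const Slot& slot : ring) {
        if (slot.result == WaitResult::TimedOut) {
            ++stats.cancelled;  // real code: cancel the I/O, then resubmit
            continue;           // move on to the next buffer without waiting
        }
        ++stats.received;
        if (slot.packet_number != expected) {
            ++stats.sequence_gaps;  // data was lost or reordered
        }
        expected = slot.packet_number + 1;
    }
    return stats;
}
```

If the observation above holds, a run over real buffers would report cancelled buffers but zero sequence gaps, since the stuck buffers never consumed a packet.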

Frank

Hi Rashi,

This is a summary of my findings.

There is a low-level issue with submitting a buffer to a queue via DeviceIoControl() while data is streaming using the 1244A RocketU USB card. I do not know whether the issue is in CYUSB3 or at a lower level.

All my issues go away when I apply a workaround to this problem. I was chasing my tail for quite some time because I thought I had multiple problems but I have concluded there is a single problem.

Description -

The USBBulkSourceSinkLED project saturates the FX3 buffers such that when the PC queues a buffer, the PC buffer gets immediately filled by the FX3. As such, adding the delay in streamer after queueing a PC buffer prevents any overlap from occurring (i.e. queueing and data streaming are isolated in time). This delay fixes the streamer application, although the fundamental problem is masked.

My application does not keep the FX3 buffers saturated. I was suspicious when I noticed that XMODE_DIRECT and XMODE_BUFFERED IO generated errors differently in my application. The problem is more prevalent in XMODE_BUFFERED. I modified streamer to run in both XMODE_DIRECT and XMODE_BUFFERED and found that XMODE_BUFFERED needs a longer delay in streamer.

I then modified the USBBulkSourceSinkLED project to slow down the FX3 buffer filling. When the buffer filling is slowed, the possibility of a PC requeue overlapping data streaming is increased. The slower buffer filling shows up as failures in streamer. This is similar to what happens in my application.

Conclusion -

For my application, I found a workaround. The workaround is more efficient in XMODE_DIRECT.

The problem can be detected in XMODE_DIRECT without waiting for a timeout making the workaround more efficient. The problem is also less likely to occur in XMODE_DIRECT but the problem does occur. In XMODE_BUFFERED, a timeout must occur and the workaround is less efficient. Unfortunately, the problem is more prevalent in XMODE_BUFFERED and with the increased frequency of occurrence and the less efficient workaround, the data streaming is much slower in XMODE_BUFFERED. But in both instances my application can continue running now.

I have attached the modified file for the USBBulkSourceSinkLED project. By uncommenting the define for SLOW_STREAMING_BY_USING_DELAY, a delay is added in the FX3 after transferring many USB packets, which causes the issue in streamer. This can be used to make the issue occur after all the initial buffer setup in streamer is complete.

I have also attached the modified file for the streamer application. The two defines are set up to use XMODE_DIRECT with the added delays to make streamer work. The different delays for XMODE_DIRECT and XMODE_BUFFERED are shown. I did not apply the workaround to streamer, so streamer may or may not recover automatically.

Someday someone with the correct tools and equipment can debug this issue and find the root cause.

Cypress Driver = 1.2.3.23 dated 11/12/2018
ASmedia driver version = 10.0.19041.1320 dated 10/13/2021

Frank

Hello Frank,

Thank you for the updates.

Please let me know whether you are using the latest cyusb3 driver (v1.2.3.20) from the FX3 SDK 1.3.4. If not, can you please let me know where you obtained cyusb3 v1.2.3.23?

Regards,
Rashi

Hi Rashi,

Well, I'm using a cyusb3.sys file that shows as 1.2.3.23 as the version in properties.  I don't remember how I obtained that file.  I have 3 drivers installed that I can switch between.  1.2.3.14, 1.2.3.20 and 1.2.3.23.  The problem happens on all the drivers.

As a further update, I have been running with my workaround in my Visual Studio C++ application.  It is working.  However, just the other day, I changed to a Release build (I was running in Debug builds the whole time) and it is failing again.  I can add delays and make it work in a Release build, but I still don't know what's actually wrong yet.  But it's another indication that it's related to timing somehow.

It sure seems to me that there is some kind of timing conflict (DMA or otherwise) down at the driver level or below that causes the IO to lock up.  The best I can intuitively determine is that if a bulk IN transfer is occurring when the driver is attempting to queue another buffer, a timing issue arises that causes the lock-up.  If "packets per xfer" is increased above 1, the likelihood of occurrence gets very small (because the buffer queueing is shifted away from the packet completion).  If "packets per xfer" is 1 and "xfers to queue" is 1, the problem goes away (because there is no overlap).

Frank

Hello Frank,

Thank you for sharing your observations

We recommend using cyusb3 1.2.3.20, which is the latest official version.

"However, just the other day, I changed to a Release build (I was running in Debug builds the whole time) and it is failing again"

>> It seems that timing is the issue as per your observations, and a release build is usually more sensitive to timing issues.

Can you please help me with the NTStatus and USBDStatus values when the issue is seen?

Regards,
Rashi

Hi Rashi,

When the failure occurs, NtStatus=0xC0000001, UsbdStatus=0xC0000012, and GetLastError() returns 31 (ERROR_GEN_FAILURE).  GetOverlappedResult() returns FALSE.  The object is signaled when this error occurs.  I verified that the same error occurs in streamer as well as my application.

If you do not recover from that error and keep trying different things (like closing streamer and reopening or pressing stop and then start again), you can get any number of different errors after that until you reset the usb connection.  I typically reset the usb quickly.

I am most interested in fixing the case when "Packets per Xfer = 1" and "Xfers to Queue > 1", so I primarily look at this case.  I mentioned earlier that when "Packets per Xfer > 1" and "Xfers to Queue > 1" the likelihood of occurrence is very small.  If I open streamer and leave it at the default of (32, 16), streamer can run for a longer period of time (sometimes over 10 minutes but mostly less), but it eventually locks up in those cases too.  The difference seems to be that those lock-ups are generally more severe and diverse.  I've seen streamer report an NtStatus=0xC000000E error.  I've also had several BSODs with stop code KERNEL_SECURITY_CHECK_FAILURE or DPC_WATCHDOG_VIOLATION.  I've seen the streamer app just disappear.  Sometimes streamer just reports failures continuously.  The errors in that case differ from run to run.  However, I suspect it's all related to the same problem.  It's almost as if a memory pointer goes astray, and it's not too bad in the "Packets per Xfer = 1" case but can be disastrous in the "Packets per Xfer > 1" case.
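
For reference, the status values quoted in this post can be mapped to symbolic names with a small lookup. This is a C++ sketch with hypothetical function names; ERROR_GEN_FAILURE (31) is taken from the post, while the other names are given as I recall them from the Windows headers (ntstatus.h, usb.h) and are worth double-checking there. Only values that appear in this thread are listed.

```cpp
#include <cstdint>
#include <string>

// NTSTATUS values seen in this thread (names assumed from ntstatus.h).
inline std::string describe_ntstatus(std::uint32_t code) {
    switch (code) {
        case 0xC0000001u: return "STATUS_UNSUCCESSFUL";
        case 0xC000000Eu: return "STATUS_NO_SUCH_DEVICE";
        default:          return "not in this table";
    }
}

// USBD status value seen in this thread (name assumed from usb.h).
inline std::string describe_usbd_status(std::uint32_t code) {
    switch (code) {
        case 0xC0000012u: return "USBD_STATUS_BABBLE_DETECTED";
        default:          return "not in this table";
    }
}

// Win32 error value seen in this thread (name given in the post itself).
inline std::string describe_win32_error(std::uint32_t code) {
    switch (code) {
        case 31u: return "ERROR_GEN_FAILURE";
        default:  return "not in this table";
    }
}
```

The names alone do not identify a root cause; they only make logs from different runs easier to compare.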

Frank

Hi Frank,

Thank you for your response.

Regarding the XMODE change - The XferMode property controls how data is passed to / from the cyusb3.sys driver, and the default value is XMODE_DIRECT.

As you mentioned that the errors are different for different cases, it's difficult to understand the reason behind the issue.

If possible, can you please share the complete project with the changes in bulksrcsink and the streamer application so that it will be useful to other community members? It would also help us to understand how and where adding the delay resolves the issue.

Also, can you confirm that after making the changes streamer works with any packets per xfer and xfers to queue values supported by streamer?

Regards,
Rashi

Hi Rashi,

I previously shared the changes to the projects in the attached file - USBBulkSourceSinkLED_Streamer_SourceModifications.zip.

Since the issue appears to be that you cannot call DeviceIoControl() to queue a packet while a packet is in the process of being transferred, there's no way to guarantee that no further issues occur.  In the project modifications I made, the delay times need to be changed depending on XMODE and the "Packets per Xfer" setting to minimize the likelihood of occurrence.  The only setting that can work without failure is "Packets per Xfer = 1" and "Transfers to Queue = 1".  Effectively, the added delays cause the transfers to asymptotically approach the (1, 1) working case.

I suspect that all the different problems have this issue as the single root cause.

Frank

Hello Frank,

Thank you for the details.

From the details shared by you, it seems that the issue is seen when there is overlap while submitting the buffers.

Can you please confirm that the issue is not seen when XferData API (i.e. synchronous IO) is used multiple times to transfer the data instead of asynchronous IO. Is my understanding correct?

I understand that calling XferData is equivalent to the setting "Packets per Xfer = 1" and "Transfers to Queue = 1" . Is that correct?

If yes, the problem is seen only when the host uses the asynchronous IO method to transfer data.

Regards,
Rashi