This is a repeat of a question that was previously asked, incorrectly answered, and locked, so I'll try again. I'm looking to understand how many clock cycles are required to move a block of memory from Flash to RAM, both the deterministic way (DMA) and the non-deterministic way (software). The optimum block size for the best efficiency would also be a good data point. I understand that the SYSCLK speed will affect the transfer time, but the SYSCLK period multiplied by the number of clock cycles will give me a good estimate of how long it will take.
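As a quick illustration of that conversion: the transfer time is the cycle count divided by the clock frequency (equivalently, cycles multiplied by the clock period). The following is a minimal C sketch. `cycles_to_ns` is a hypothetical helper name, and the clock frequencies used with it are only example values, not recommended SYSCLK settings.

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a clock running at clk_hz.
 * t = cycles / f_clk. 64-bit math keeps cycles * 1e9 from overflowing
 * for cycle counts below roughly 1.8e10. The result is truncated. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t clk_hz)
{
    return (cycles * 1000000000ULL) / clk_hz;
}
```

For example, `cycles_to_ns(1000, 100000000)` gives 10000 ns (10 µs) at an assumed 100 MHz clock.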
Thank you for your time.
We have contacted our internal team to check whether we have any calculations of the number of clock cycles required for this transfer. We will let you know as soon as possible.
Thank you. This seems like a fairly basic characteristic that needs to be understood to establish a performance benchmark, since moving data to and from Flash is a common operation. Isn't this needed for any OTA update or bootloading operation in and out of the device?
I look forward to learning more.
I'll try my best to explain.
The Flash operates at 33 MHz, so when your CPU runs at a faster clock than that, wait states have to be added to the read operations. To compensate for these wait states:
- each Flash read operation is 128 bits wide
- for repeated access to the Flash (running the program), there are two 8 KiB caches (one per core). Thus the program loop that copies the data should only be affected by these wait states during the first pass, when it is loaded into the cache. After that, it executes with zero wait states from the cache instead of the Flash.
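Because each Flash read is 128 bits wide, a software copy pays the Flash wait states roughly once per 16-byte line it touches rather than once per word, which also means block alignment matters. A minimal C sketch, assuming each line fetch pays the HF_CLK-dependent wait states once and ignoring any prefetch or buffering on the data path; `flash_lines_touched` and `FLASH_LINE_BYTES` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

/* Each Flash read fetches one 128-bit (16-byte) line, as described above. */
#define FLASH_LINE_BYTES 16u

/* Number of distinct 128-bit Flash lines touched by [addr, addr + len).
 * A block that is not 16-byte aligned can straddle one extra line. */
size_t flash_lines_touched(uintptr_t addr, size_t len)
{
    if (len == 0) {
        return 0;
    }
    uintptr_t first = addr / FLASH_LINE_BYTES;
    uintptr_t last  = (addr + len - 1) / FLASH_LINE_BYTES;
    return (size_t)(last - first + 1);
}
```

For example, a 256-byte block starting on a 16-byte boundary touches 16 lines, while the same block starting 4 bytes later touches 17, so aligning source blocks to 16 bytes avoids one extra fetch.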
The Flash wait states for different HF_CLK speeds are listed in the Architecture TRM, page 101 (https://www.cypress.com/file/399201/download).
For the DMA, it depends on your setup, but basically there is a setup time for loading the descriptors (12 or 13 cycles) and then 3 cycles for each data package within the DMA loop. This assumes you don't use single transfers, which take 14 cycles per transfer.
In addition, there may be wait states caused by the CPU accessing the Flash in parallel (if the data is not cached). The CPU has the higher priority, so that the DMA can't stall the CPU by blocking the bus.
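The figures above can be turned into a rough best-case estimate. This is a minimal C sketch, assuming the worst-case 13-cycle descriptor setup, 3 cycles per element in burst mode, and 14 cycles per element (taken as the total cost) when using single transfers. It does not model the extra wait states from CPU contention. `dma_cycles_estimate` and the macro names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Cycle figures quoted above (see the DMA chapter of the Architecture TRM).
 * Treat them as assumptions and check them against your device's TRM. */
#define DMA_DESCR_SETUP_CYCLES 13u /* 12 or 13; worst case used here */
#define DMA_CYCLES_PER_ELEMENT  3u /* per data element in the DMA loop */
#define DMA_SINGLE_XFER_CYCLES 14u /* per element with single transfers (assumed total) */

/* Best-case DMA cycle estimate for n_elements data elements.
 * Ignores wait states caused by the higher-priority CPU accessing Flash. */
uint32_t dma_cycles_estimate(uint32_t n_elements, bool single_transfers)
{
    if (single_transfers) {
        return DMA_SINGLE_XFER_CYCLES * n_elements;
    }
    return DMA_DESCR_SETUP_CYCLES + DMA_CYCLES_PER_ELEMENT * n_elements;
}
```

Under these assumptions, `dma_cycles_estimate(64, false)` gives 205 cycles, while `dma_cycles_estimate(64, true)` gives 896, which shows why burst transfers are preferred.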
Regarding DMA performance, there is a good chapter in the Architecture TRM on page 85 (https://www.cypress.com/file/399201/download).
Thank you Achim, this is the level of answer we were looking for.
So DMA looks like the fastest way to go. Assuming a single-thread focus and limited interrupts, we should be able to use the DMA to do a burst transfer from Flash to RAM with 13 cycles for setup and 3 cycles per data package, in our case a 32-bit word. A 64-word transfer could then be as fast as 13 + 3 × 64 = 205 SYSCLK cycles. Is this correct?
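This arithmetic also bears on the block-size question: the fixed setup cost is amortized over the block, so the average cost is 3 + 13/n cycles per word. This is a minimal C sketch under the same assumptions (13 setup cycles, 3 cycles per 32-bit word, no extra wait states from CPU contention). `burst_cycles` and `cycles_per_word` are hypothetical helper names.

```c
#include <stdint.h>

#define SETUP_CYCLES    13u /* descriptor setup, worst case quoted above */
#define CYCLES_PER_WORD  3u /* per 32-bit word in the DMA loop */

/* Total best-case cycles for a burst of n_words 32-bit words. */
uint32_t burst_cycles(uint32_t n_words)
{
    return SETUP_CYCLES + CYCLES_PER_WORD * n_words;
}

/* Average cycles per word. This approaches CYCLES_PER_WORD as n_words
 * grows because the fixed setup cost is amortized. Requires n_words > 0. */
double cycles_per_word(uint32_t n_words)
{
    return (double)burst_cycles(n_words) / (double)n_words;
}
```

With these numbers, 64 words comes out at 205 cycles, or about 3.2 cycles per word, and 1024 words averages about 3.01. Under this simple model, larger blocks only get more efficient. Any practical optimum would therefore come from effects the model leaves out, such as the wait states caused by the CPU accessing Flash in parallel.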