Hi,
Attached is a simple project that tests DMA inter-spoke performance between SRAM and a UDB (SPI) block.
With MASTER_CLK = BUS_CLK = 78 MHz, the measured DMA data transfer times (t_DMA_DONE - t_DMA_TRIGGER) versus the expected values are:
- transfer 1 * 16 bit: 146 ns measured; calculation: Nbursts = 1, inter-spoke transfer = Nbursts + 7 = 8 cycles (102 ns)
- transfer 2 * 16 bit: 250 ns measured; calculation: Nbursts = 2, inter-spoke transfer = Nbursts + 7 = 9 cycles (115 ns)
- transfer 4 * 16 bit: 457 ns measured; calculation: Nbursts = 4, inter-spoke transfer = Nbursts + 7 = 11 cycles (141 ns)
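For reference, the expected figures in the list can be reproduced with a few lines of C. This is only a sketch of the arithmetic used in this post (Nbursts + 7 bus clock cycles, converted to nanoseconds); the function names are made up for illustration and are not part of any PSoC Creator API.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed overhead in bus clock cycles, taken from the "Nbursts + 7"
   inter-spoke figure quoted in this post. */
#define INTER_SPOKE_OVERHEAD_CYCLES 7u

/* Expected number of bus clock cycles for one inter-spoke DMA transfer
   (hypothetical helper, for illustration only). */
uint32_t inter_spoke_cycles(uint32_t n_bursts)
{
    return n_bursts + INTER_SPOKE_OVERHEAD_CYCLES;
}

/* Expected transfer time in nanoseconds at the given bus clock in Hz. */
double inter_spoke_time_ns(uint32_t n_bursts, double bus_clk_hz)
{
    assert(bus_clk_hz > 0.0);
    return (double)inter_spoke_cycles(n_bursts) * 1e9 / bus_clk_hz;
}
```

Calling `inter_spoke_time_ns(1, 78e6)` gives roughly 102.6 ns, and 2 and 4 bursts give roughly 115.4 ns and 141.0 ns, which matches the rounded values above.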
Can anybody advise how to reach the speed specified in the data sheet?
This particular program has a main loop that does nothing. In the real application the DMA arbiter priority was raised above that of the CPU, but no significant effect was observed.
Kind regards,
Uros
Labels: PSOC5 LP MCU
Tags: dma
Uros,
I ran your project with the function arguments listed below.
I do not get the calculated values either, but my transfers are faster than the ones you measured.
DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 2, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);
Your measured value | My measured value | Calculated value |
146 ns | 118 ns | 102 ns |
DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 4, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);
Your measured value | My measured value | Calculated value |
250 ns | 145 ns | 115 ns |
DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 8, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);
Your measured value | My measured value | Calculated value |
457 ns | 195 ns | 141 ns |
"Engineering is an Art. The Art of Compromise."
Uros,
I'm studying your specific request.
In the meantime, here is a link to another project that does SRAM-to-SRAM moves: DMA-driven-memset-memcpy-and-memmove-functions
The project lets you compare bulk memory operations done by the CPU and by DMA. The general result: DMA is significantly faster, provided the number of bytes transferred in one DMA operation is large enough.
Maybe this project can help shed some light on your issue.
"Engineering is an Art. The Art of Compromise."
Hi Len,
It is a nice example (the memcpy); however, my project is about UDB <-> memory and UDB <-> UDB transfers, and the measured timings do not match the data sheet.
It is also unclear how to write 32 bits into a 32-bit UDB over a 16-bit spoke. The data sheet says DMA will correctly handle different bus widths, but it only worked with two 16-bit transfers.
BR Uros
BR Uros,
I've searched the PSoC5LP datasheet and couldn't find a stated DMA transfer rate anywhere. Can you point me to the section where this transfer rate is quoted?
I provided the link to the DMA-based mem functions to illustrate that, to maximize the DMA transfer rate, you need to increase the number of bytes per transfer.
This is because the following factors determine the maximum transfer rate:
- CPU clocking speed. Higher frequency => faster.
- Size of PHUB spoke transfer. 32b to 32b is fastest. 32b to 16b needs two DMA cycles.
- Type of PHUB spoke transfer. RAM is fastest. Peripheral is slower.
- DMA priority. This is important when you have multiple DMA channels that could request at the same time.
Note: Every DMA event needs a certain number of CPU cycles for the DMA state machine to determine the SRC and DEST addresses and the size of the transfer. This "overhead" is fixed, which is why FIFOs and buffers are available to help improve transfer rates.
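To illustrate why a fixed overhead favors larger transfers, here is a rough sketch. It assumes the simple model quoted earlier in the thread (one bus clock cycle per burst plus a fixed overhead, 7 cycles in Uros's calculation); real hardware may behave differently, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective payload throughput in bytes per second under the assumed
   model: total cycles = n_bursts + overhead_cycles.
   Hypothetical helper, not a PSoC Creator API. */
double dma_effective_bytes_per_s(uint32_t n_bursts,
                                 uint32_t bytes_per_burst,
                                 uint32_t overhead_cycles,
                                 double bus_clk_hz)
{
    assert(n_bursts > 0u);
    assert(bus_clk_hz > 0.0);

    double total_cycles  = (double)n_bursts + (double)overhead_cycles;
    double payload_bytes = (double)n_bursts * (double)bytes_per_burst;

    /* bytes per cycle, scaled by cycles per second */
    return payload_bytes * bus_clk_hz / total_cycles;
}
```

Under this model, at 78 MHz with 2-byte bursts and 7 overhead cycles, a single burst yields 19.5 MB/s, while 64 bursts yield about 140.6 MB/s, approaching the 156 MB/s ceiling as the overhead is amortized.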
"Engineering is an Art. The Art of Compromise."
In this document, page 9
Your other 4 bullets -> of course.
Uros,
I tried another experiment.
All the operating parameters are identical to those in my previous post.
The only difference is that I reduced the SPI input clock from 40 MHz to 1 MHz.
Here's my new data:
DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 2, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);
Your measured value | My measured value | Calculated value |
146 ns | 92.4 ns | 102 ns |
DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 4, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);
Your measured value | My measured value | Calculated value |
250 ns | 115 ns | 115 ns |
DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 8, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);
Your measured value | My measured value | Calculated value |
457 ns | 167 ns | 141 ns |
You can see that by slowing down the SPIM input clock, I get DMA transfer times close to the calculated values.
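To compare the measurements with the calculation in terms of clock cycles rather than nanoseconds, a small helper like the following can be used. It is only a sketch: it assumes the 78 MHz bus clock from Uros's project and the Nbursts + 7 cycle estimate from his first post, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a measured duration in nanoseconds to bus clock cycles. */
double measured_bus_cycles(double measured_ns, double bus_clk_hz)
{
    assert(bus_clk_hz > 0.0);
    return measured_ns * bus_clk_hz / 1e9;
}

/* Cycles above (positive) or below (negative) the Nbursts + 7 estimate.
   The constant 7 is the assumed fixed overhead quoted in this thread. */
double excess_bus_cycles(double measured_ns, uint32_t n_bursts,
                         double bus_clk_hz)
{
    double expected = (double)(n_bursts + 7u);
    return measured_bus_cycles(measured_ns, bus_clk_hz) - expected;
}
```

For example, 118 ns at 78 MHz is about 9.2 cycles, roughly 1.2 cycles above the 8-cycle estimate, while 115 ns for two bursts is within a small fraction of a cycle of the 9-cycle estimate.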
"Engineering is an Art. The Art of Compromise."
Hi Len,
Thank you for your effort.
- Changing burstCount to zero (if this value is zero, each transfer is done as a single burst) and requestPerBurst to 1 helps: I now get the same results as you did (118 ns for a 16-bit transfer...).
- Changing the SPI input clock from 40 MHz to 1 MHz has no effect on DMA speed on my side.
I will redo the memory tests shortly.
Kind regards,
Uros