PSoC5 DMA Performance, cannot reach the specified performance

UrPl_1236626

Hi,

Attached is a simple project to test the DMA inter-spoke performance between SRAM and a UDB (SPI) block.

For MASTER_CLK = BUS_CLK = 78 MHz, the measured DMA data transfer times (t_DMA_DONE - t_DMA_TRIGGER) versus the expected values are:

  • transfer 1 * 16-bit: measured 146 ns; expected: Nbursts = 1, inter-spoke transfer = Nbursts + 7 = 8 cycles (102 ns)
  • transfer 2 * 16-bit: measured 250 ns; expected: Nbursts = 2, inter-spoke transfer = Nbursts + 7 = 9 cycles (115 ns)
  • transfer 4 * 16-bit: measured 457 ns; expected: Nbursts = 4, inter-spoke transfer = Nbursts + 7 = 11 cycles (141 ns)
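
The expected values in the list above can be reproduced with a few lines of host-side C. This is only a sketch of the arithmetic: the Nbursts + 7 cycle model and the 78 MHz bus clock are taken from this post, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Model quoted in this post: an inter-spoke transfer of n bursts
 * is expected to take n + 7 bus clock cycles. */
static uint32_t expected_cycles(uint32_t n_bursts)
{
    return n_bursts + 7u;
}

/* Convert a bus clock cycle count to nanoseconds for a bus clock given in Hz. */
static double cycles_to_ns(uint32_t cycles, double bus_clk_hz)
{
    return (double)cycles * 1e9 / bus_clk_hz;
}
```

Calling cycles_to_ns(expected_cycles(1), 78e6) gives roughly 102.6 ns, which matches the 102 ns figure for a single 16-bit transfer.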

Can anybody advise how to reach the speed as specified in the data-sheet?

In this test program the main loop does nothing. In the real application the DMA arbiter priority was raised above the CPU's, but no significant effect was observed.

Kind regards,
Uros

Len_CONSULTRON

Uros,

I'm studying your specific request.

In the meantime, here is a link to another project that does SRAM-to-SRAM moves: DMA-driven-memset-memcpy-and-memmove-functions

The project lets you compare bulk memory operations using the CPU and DMA. The general result: DMA is significantly faster, provided the number of bytes transferred in one DMA operation is large.

Maybe this project can help shed some light on your issue.

Len
"Engineering is an Art. The Art of Compromise."

Hi Len,

The memcpy project is a nice example. However, my project is about UDB <-> memory and UDB <-> UDB transfers, and the timings I measure do not match the data-sheet.

It is also unclear how to write 32 bits into a 32-bit UDB over the 16-bit spoke. The data-sheet says the DMA will correctly handle different bus widths, but it only worked with two 16-bit transfers.

BR Uros


BR Uros,

I've searched the PSoC5LP datasheet and couldn't find a stated DMA transfer rate anywhere. Can you point me to the section where this transfer rate is quoted?

I linked the DMA-based mem functions to illustrate that, to maximize the DMA transfer rate, you need to increase the number of bytes per transfer.

This is because the following factors determine the maximum transfer rate:

  • CPU clocking speed. Higher frequency => faster.
  • Size of the PHUB spoke transfer. 32b to 32b is fastest. 32b to 16b needs two DMA cycles.
  • Type of PHUB spoke transfer. RAM is fastest. Peripherals are slower.
  • DMA priority. This is important when you have multiple DMA channels that could be active at the same time.

Note: Every DMA event requires a specific number of CPU cycles for the DMA state machine to determine the SRC and DEST addresses and the size of the transfer. This "overhead" is fixed. This is why FIFOs and buffers are available to help improve transfer rates.
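
To illustrate how a fixed overhead is amortized over larger transfers, here is a rough host-side sketch in C. It assumes the Nbursts + 7 cycle model from the first post in this thread (7 cycles of fixed overhead, one cycle per burst), which may not match the real hardware exactly. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* Assumption: fixed overhead taken from the "Nbursts + 7" model in the first post. */
#define FIXED_OVERHEAD_CYCLES 7u

/* Fraction of the modelled transfer time spent moving data rather than on overhead. */
static double bus_efficiency(uint32_t n_bursts)
{
    return (double)n_bursts / (double)(n_bursts + FIXED_OVERHEAD_CYCLES);
}

/* Modelled throughput in bytes per second for a bus clock given in Hz. */
static double throughput_bytes_per_s(uint32_t n_bursts,
                                     uint32_t bytes_per_burst,
                                     double bus_clk_hz)
{
    double seconds = (double)(n_bursts + FIXED_OVERHEAD_CYCLES) / bus_clk_hz;
    return (double)(n_bursts * bytes_per_burst) / seconds;
}
```

Under this model a single 16-bit burst at 78 MHz uses only 1 of 8 cycles for data (about 19.5 MB/s), while 64 bursts use 64 of 71 cycles, so longer transfers approach the bus limit.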

Len
Len_CONSULTRON

Uros,

I ran your project with the function values listed below.

I don't get the calculated values either, but my results are faster than yours.

DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 2, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);

Your measured value | My measured value | Calculated value
146 ns              | 118 ns            | 102 ns

PRINT_01.BMP

DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 4, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);

Your measured value | My measured value | Calculated value
250 ns              | 145 ns            | 115 ns

PRINT_02.BMP

DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 8, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);

Your measured value | My measured value | Calculated value
457 ns              | 195 ns            | 141 ns

PRINT_03.BMP

Len
Len_CONSULTRON

Uros,

I tried another experiment.

All operating parameters are identical to those in my previous post.

The only difference in this experiment is that I reduced the SPI input clock from 40 MHz to 1 MHz.

Here's my new data:

DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 2, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);

Your measured value | My measured value | Calculated value
146 ns              | 92.4 ns           | 102 ns

 

DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 4, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);

Your measured value | My measured value | Calculated value
250 ns              | 115 ns            | 115 ns

 

DMA_Chan = DMA_SPI_DmaInitialize(0, 1, HI16(CYDEV_SRAM_BASE), HI16(CYDEV_PERIPH_BASE) );
CyDmaTdSetConfiguration(DMA_TD, 8, DMA_TD, DMA_SPI__TD_TERMOUT_EN | TD_INC_SRC_ADR);

Your measured value | My measured value | Calculated value
457 ns              | 167 ns            | 141 ns

 

You can see that by slowing down the SPIM input clock, I can get nearly the same values for DMA transfer as calculated.
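
One way to compare measurements against the calculation is to convert each measured time into bus clock cycles and subtract the modelled count. The sketch below assumes the 78 MHz bus clock and the Nbursts + 7 model from the first post also apply to these measurements. The helper names are hypothetical.

```c
/* Convert a measured time in nanoseconds to bus clock cycles. */
static double ns_to_cycles(double t_ns, double bus_clk_hz)
{
    return t_ns * 1e-9 * bus_clk_hz;
}

/* Cycles by which a measurement exceeds the assumed n + 7 model
 * (negative if the measurement is faster than the model). */
static double excess_cycles(double t_ns, unsigned n_bursts, double bus_clk_hz)
{
    return ns_to_cycles(t_ns, bus_clk_hz) - (double)(n_bursts + 7u);
}
```

With these assumptions, 115 ns for two bursts is about 8.97 cycles, essentially the modelled 9, while 146 ns for one burst is about 3.4 cycles over the modelled 8.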

Len
UrPl_1236626

Hi Len,

Thank you for your effort.

  • Changing burstCount to zero ("If this value is zero, each transfer is done as a single burst.") and requestPerBurst to 1 helps. I now get the same results as you did (118 ns for a 16-bit transfer...).
  • Changing the SPI input clock from 40 MHz to 1 MHz has no effect on DMA speed on my side.

I will redo the tests with memory shortly.

Kind regards,
Uros
