32-bit DMA vs ISR for datapath reads.


greg_duckworth

Hi All,

I have been on a very deep dive into the PSoC 5LP, Verilog, datapaths and all.  I believe I finally have a working example, but I would appreciate some sanity checking:

I want to implement a 32-bit shift register capable of reading bursts at up to 10 MHz.

I know this is possible (I can demonstrate it working up to 20+ MHz) but I have run into several complexities along the way.

I implemented a shift register using a datapath within the UDB Editor.  It uses the A0 register as the working buffer and the A1 register to hold the state captured every 32 clocks.  This allows a small (and variable) delay between the hardware capture and the DMA/software read without missing data.
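For anyone following along, the A0/A1 arrangement described above can be sketched as a small host-side C model. This is only an illustration, not the UDB configuration itself: the names sr_model_t, sr_init, sr_clock and sr_shift_word are hypothetical, and MSB-first shifting is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model only (hypothetical names): a0 shifts on every clock,
   a1 latches a0 once every 32 clocks so the reader has time to fetch it. */
typedef struct {
    uint32_t a0;        /* working shift register */
    uint32_t a1;        /* last completed 32-bit word */
    unsigned bit_count; /* bits shifted into a0 since the last capture */
    unsigned captures;  /* number of completed words so far */
} sr_model_t;

static void sr_init(sr_model_t *sr)
{
    assert(sr != 0);
    sr->a0 = 0u;
    sr->a1 = 0u;
    sr->bit_count = 0u;
    sr->captures = 0u;
}

/* Shift one bit in (MSB first, an assumption).
   Returns 1 when a new word has just been captured into a1, else 0. */
static int sr_clock(sr_model_t *sr, unsigned bit)
{
    sr->a0 = (sr->a0 << 1) | (bit & 1u);
    if (++sr->bit_count == 32u) {
        sr->a1 = sr->a0;
        sr->bit_count = 0u;
        sr->captures++;
        return 1;
    }
    return 0;
}

/* Helper: clock a whole 32-bit word in, MSB first. */
static void sr_shift_word(sr_model_t *sr, uint32_t word)
{
    for (int i = 31; i >= 0; --i) {
        (void)sr_clock(sr, (unsigned)((word >> i) & 1u));
    }
}
```

In this model a1 stays valid for the 31 clocks that follow a capture, which is the window the DMA or ISR has to read it.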

The issue I ran into is the difference between DMA and interrupt-driven reads of the registers:

When using software to read the shift register, the only line that is needed is:

 

 

sample = CY_GET_REG32(Buffered_SR_32b_udb_1_Datapath_1_A1_PTR);

 

 

This reads the entire 32-bit A1 register in 1 line (likely not 1 operation though).  

If instead I try to do the same thing with DMA, using the same pointer, the resultant 4 bytes are 4 copies of the first byte of the A1 register.

To be able to read the appropriate registers, I have to set up 4 DMA transaction descriptors (TDs) as follows:

 

 

    CyDmaTdSetConfiguration(DMA_1_TD[0], 1, DMA_1_TD[1], CY_DMA_TD_INC_DST_ADR | CY_DMA_TD_AUTO_EXEC_NEXT);
    CyDmaTdSetConfiguration(DMA_1_TD[1], 1, DMA_1_TD[2], CY_DMA_TD_INC_DST_ADR | CY_DMA_TD_AUTO_EXEC_NEXT);
    CyDmaTdSetConfiguration(DMA_1_TD[2], 1, DMA_1_TD[3], CY_DMA_TD_INC_DST_ADR | CY_DMA_TD_AUTO_EXEC_NEXT);
    CyDmaTdSetConfiguration(DMA_1_TD[3], 1, DMA_1_TD[0], DMA_1__TD_TERMOUT_EN | CY_DMA_TD_INC_DST_ADR);
    CyDmaTdSetAddress(DMA_1_TD[0], LO16((uint32)Buffered_SR_32b_udb_1_BYTE0_A1_REG), LO16((uint32)&USB_bufferA[3]));
    CyDmaTdSetAddress(DMA_1_TD[1], LO16((uint32)Buffered_SR_32b_udb_1_BYTE1_A1_REG), LO16((uint32)&USB_bufferA[2]));
    CyDmaTdSetAddress(DMA_1_TD[2], LO16((uint32)Buffered_SR_32b_udb_1_BYTE2_A1_REG), LO16((uint32)&USB_bufferA[1]));
    CyDmaTdSetAddress(DMA_1_TD[3], LO16((uint32)Buffered_SR_32b_udb_1_BYTE3_A1_REG), LO16((uint32)&USB_bufferA[0]));

 

 

which references the 4 A1 registers of the component separately.  On triggering, the DMA reads 4 bytes, one-by-one, into the USB_bufferA array.
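Note that the destinations run backwards (BYTE0 goes to USB_bufferA[3], BYTE3 to USB_bufferA[0]). A minimal host-side sketch of what that chain does to the byte order, assuming BYTE0 is the least significant byte of the word (an assumption) and using a hypothetical helper name copy_word_reversed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the four chained 1-byte TDs: source byte i lands in dst[3 - i].
   Assumption: BYTE0 is the least significant byte of the 32-bit word. */
static void copy_word_reversed(uint32_t word, uint8_t *dst, size_t dst_len)
{
    assert(dst != NULL && dst_len >= 4u);
    for (unsigned i = 0u; i < 4u; ++i) {
        uint8_t byte_i = (uint8_t)(word >> (8u * i)); /* BYTEi of the word */
        dst[3u - i] = byte_i;                         /* destination of TD[i] */
    }
}
```

Under that assumption the buffer ends up with the most significant byte first.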

Is there a better way to do this?  Is it possible to read all 32-bits with one DMA request? 

 

Also: I would be able to spend more time between reads if I could use the F0 and F1 FIFOs to buffer four 32-bit words before needing to read them back.  I am currently unable to make this work, as the FIFOs appear to be only 1 word deep instead of 4.  Does anyone have examples of reading back from the output FIFO of a datapath?

With any luck I will be putting out a few shift register examples soon.

 

Thanks all,

Greg

1 Solution
Len_CONSULTRON

Greg,

Welcome to the fascinating world of component creation!

From my own experience this can be rewarding and frustrating at the same time!

The frustrating part is that the information about the issues you are seeing is usually published ... somewhere.

Cypress created a few documents to aid in component design but due to the complexity of this endeavor, no one document is fully comprehensive.

Enough extemporizing.    Now to try to answer your question...

The UDB registers reside in a memory space that is accessed through a 16-bit PHUB.

When you use an ISR to read the data as a single 32-bit UDB access, the ARM core internally performs four one-byte operations on the register you are trying to access.  This is not exactly atomic, but the ARM appears to compensate for it.

Using DMA, you have to be aware of the width of the PHUB to the resource you're trying to access.  

The UDB registers  are in the 'Peripheral' resource memory section which has at most a 16-bit PHUB.

This is where it gets a bit complicated.

If you look at the "<component_name>_defs.h" file created by the UDB Editor, it has many defines that eventually point to multiple HW addresses for what appear to be the same registers ... but they are not.

Here's an example from one of my UDB Editor 16-bit components:

 

 

#define UDBramp_Ramp_dp__16BIT_A0_REG CYREG_B1_UDB04_05_A0
#define UDBramp_Ramp_dp__16BIT_A1_REG CYREG_B1_UDB04_05_A1
#define UDBramp_Ramp_dp__16BIT_D0_REG CYREG_B1_UDB04_05_D0
#define UDBramp_Ramp_dp__16BIT_D1_REG CYREG_B1_UDB04_05_D1
#define UDBramp_Ramp_dp__16BIT_DP_AUX_CTL_REG CYREG_B1_UDB04_05_ACTL
#define UDBramp_Ramp_dp__16BIT_F0_REG CYREG_B1_UDB04_05_F0
#define UDBramp_Ramp_dp__16BIT_F1_REG CYREG_B1_UDB04_05_F1
#define UDBramp_Ramp_dp__A0_A1_REG CYREG_B1_UDB04_A0_A1
#define UDBramp_Ramp_dp__A0_REG CYREG_B1_UDB04_A0
#define UDBramp_Ramp_dp__A1_REG CYREG_B1_UDB04_A1
#define UDBramp_Ramp_dp__D0_D1_REG CYREG_B1_UDB04_D0_D1
#define UDBramp_Ramp_dp__D0_REG CYREG_B1_UDB04_D0
#define UDBramp_Ramp_dp__D1_REG CYREG_B1_UDB04_D1
#define UDBramp_Ramp_dp__DP_AUX_CTL_REG CYREG_B1_UDB04_ACTL
#define UDBramp_Ramp_dp__F0_F1_REG CYREG_B1_UDB04_F0_F1
#define UDBramp_Ramp_dp__F0_REG CYREG_B1_UDB04_F0
#define UDBramp_Ramp_dp__F1_REG CYREG_B1_UDB04_F1
#define UDBramp_Ramp_dp__MSK_DP_AUX_CTL_REG CYREG_B1_UDB04_MSK_ACTL
#define UDBramp_Ramp_dp__PER_DP_AUX_CTL_REG CYREG_B1_UDB04_MSK_ACTL
#define UDBramp_Ramp_dp_MSB__16BIT_A0_REG CYREG_B1_UDB05_06_A0
#define UDBramp_Ramp_dp_MSB__16BIT_A1_REG CYREG_B1_UDB05_06_A1
#define UDBramp_Ramp_dp_MSB__16BIT_D0_REG CYREG_B1_UDB05_06_D0
#define UDBramp_Ramp_dp_MSB__16BIT_D1_REG CYREG_B1_UDB05_06_D1
#define UDBramp_Ramp_dp_MSB__16BIT_DP_AUX_CTL_REG CYREG_B1_UDB05_06_ACTL
#define UDBramp_Ramp_dp_MSB__16BIT_F0_REG CYREG_B1_UDB05_06_F0
#define UDBramp_Ramp_dp_MSB__16BIT_F1_REG CYREG_B1_UDB05_06_F1
#define UDBramp_Ramp_dp_MSB__A0_A1_REG CYREG_B1_UDB05_A0_A1
#define UDBramp_Ramp_dp_MSB__A0_REG CYREG_B1_UDB05_A0
#define UDBramp_Ramp_dp_MSB__A1_REG CYREG_B1_UDB05_A1
#define UDBramp_Ramp_dp_MSB__D0_D1_REG CYREG_B1_UDB05_D0_D1
#define UDBramp_Ramp_dp_MSB__D0_REG CYREG_B1_UDB05_D0
#define UDBramp_Ramp_dp_MSB__D1_REG CYREG_B1_UDB05_D1
#define UDBramp_Ramp_dp_MSB__DP_AUX_CTL_REG CYREG_B1_UDB05_ACTL
#define UDBramp_Ramp_dp_MSB__F0_F1_REG CYREG_B1_UDB05_F0_F1
#define UDBramp_Ramp_dp_MSB__F0_REG CYREG_B1_UDB05_F0
#define UDBramp_Ramp_dp_MSB__F1_REG CYREG_B1_UDB05_F1
#define UDBramp_Ramp_dp_MSB__MSK_DP_AUX_CTL_REG CYREG_B1_UDB05_MSK_ACTL
#define UDBramp_Ramp_dp_MSB__PER_DP_AUX_CTL_REG CYREG_B1_UDB05_MSK_ACTL

 

 

 You'll notice there are two references to what appear to be the same A0 register.

  • UDBramp_Ramp_dp__16BIT_A0_REG CYREG_B1_UDB04_05_A0
  •  UDBramp_Ramp_dp__A0_REG CYREG_B1_UDB04_A0

They are and they're not.   The register address of UDBramp_Ramp_dp__16BIT_A0_REG CYREG_B1_UDB04_05_A0 points to 0x40006a08u  and UDBramp_Ramp_dp__A0_REG CYREG_B1_UDB04_A0 points to 0x40006504u.

As it turns out, the UDBramp_Ramp_dp__A0_REG (CYREG_B1_UDB04_A0) register is accessible as 8-bit.   This is commonly used by the CPU in an ISR to pull register data of the full Datapath width semi-atomically.

However, using DMA with this address requires four single-byte operations to consecutive addresses to acquire the full Datapath data.   This is not as atomic as the CPU access.

Using the UDBramp_Ramp_dp__16BIT_A0_REG (CYREG_B1_UDB04_05_A0) register allows 16-bit atomic access to the Datapath register data, which is the full PHUB width for this resource.  This allows a single DMA TD operation if the Datapath is <= 16 bits, or two DMA TDs if it is > 16 bits.  This is your most efficient method for DMA (i.e. 2 TD passes for 32 bits).
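As a rough host-side illustration of the arithmetic above (this is not PSoC code), the sketch below counts the accesses needed for a given Datapath width and rebuilds a 32-bit word from two 16-bit reads. The function names are hypothetical, and reading the low half first is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bus accesses (one TD each, in the scheme described above)
   needed to move a datapath of the given width at the given access size. */
static unsigned accesses_needed(unsigned datapath_bits, unsigned access_bits)
{
    assert(access_bits == 8u || access_bits == 16u);
    assert(datapath_bits > 0u && datapath_bits <= 32u);
    return (datapath_bits + access_bits - 1u) / access_bits; /* round up */
}

/* Rebuild a 32-bit word from two 16-bit reads.
   Assumption: the first read returns the low half, the second the high half. */
static uint32_t combine_halves(uint16_t low, uint16_t high)
{
    return ((uint32_t)high << 16) | (uint32_t)low;
}
```

With these definitions a 32-bit Datapath needs four accesses through the 8-bit aliases but only two through the 16-bit aliases.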

I had to delve into the Architecture and the Register TRMs to understand this dual-register relationship.

I ran into this very issue when creating my DCmp component.   DCmp-component-Very-fast-Digital-Comparisons-with-additional-features 

I learned by poring over the available documents and piecing together the information.  I also performed many controlled experiments to reach what I believe is the optimum solution.

You'll notice in my Demo project for the component, that my DMA initialization has multiple conditional compiles depending on the allocated size of the Datapath for the component.
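For readers who have not seen this pattern, a conditional compile of this kind could look roughly like the sketch below. It is not taken from the DCmp demo project: DP_WIDTH_BITS, DP_DMA_BYTES_PER_ACCESS, DP_DMA_TD_COUNT and dp_dma_total_bytes are hypothetical names, and rounding widths above 16 bits up to two 16-bit accesses is an assumption.

```c
#include <assert.h>

/* Hypothetical build-time setting: the datapath width chosen for the component. */
#ifndef DP_WIDTH_BITS
#define DP_WIDTH_BITS 32
#endif

/* Pick the access size and TD count from the width at compile time. */
#if (DP_WIDTH_BITS <= 8)
    #define DP_DMA_BYTES_PER_ACCESS 1u  /* one 8-bit access */
    #define DP_DMA_TD_COUNT         1u
#elif (DP_WIDTH_BITS <= 16)
    #define DP_DMA_BYTES_PER_ACCESS 2u  /* one 16-bit access */
    #define DP_DMA_TD_COUNT         1u
#elif (DP_WIDTH_BITS <= 32)
    #define DP_DMA_BYTES_PER_ACCESS 2u  /* two 16-bit accesses (assumed) */
    #define DP_DMA_TD_COUNT         2u
#else
    #error "DP_WIDTH_BITS must be between 1 and 32"
#endif

/* Total bytes moved per trigger with the selected configuration. */
static unsigned dp_dma_total_bytes(void)
{
    assert(DP_DMA_TD_COUNT >= 1u);
    return DP_DMA_BYTES_PER_ACCESS * DP_DMA_TD_COUNT;
}
```

The benefit of doing this at compile time is that the DMA setup code only ever contains the branch that matches the placed Datapath.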

As I indicated earlier:  Rewarding and frustrating simultaneously.

I have to applaud Cypress for their insight in creating such a sophisticated part and tool (PSoC Creator).

Coding components is not for the 'faint of heart'.   

Happy coding!!!

Len
"Engineering is an Art. The Art of Compromise."


4 Replies
greg_duckworth

Thank you for such a detailed reply. I completely agree with you about component creation being both rewarding and frustrating at the same time!  I am fortunate that my employer is happy for me to spend several weeks on this project, as it was only meant to take 5 days in total (whoops).  Still, I feel I have learned enough for it to be useful in the future. 

Thank you for the links to the demo projects, I had started to suspect there was some dual-register shenanigans but it was difficult to find any documentation on the topic.  

It's good to know that there are reasons for all these weird occurrences, and I absolutely agree that Cypress did a hell of a job at making something so special.

 

Cheers

 

Greg

Len_CONSULTRON

Greg,

About custom components and DMA requirements ...

Custom components can include a DMA capability descriptor file.

[Image: Len_CONSULTRON_0-1650377733107.png]

The file can be useful to the DMA wizard tool for aiding in constructing a useable DMA initialization configuration.

The field description is found in the "Component Authoring Guide" Section 7.4.

I think I might use it going forward.   I believe you still have to use the info I provided earlier but this might aid the user (other than yourself) in using the component.

 

Len
"Engineering is an Art. The Art of Compromise."

Greg,

Did you try any standard components to capture the bitstream, e.g. ShiftRegister or SPI?  I believe that the bit rate is quite low (1 Mb/s), so even an 8-bit Datapath should be able to handle this.

It is better to write the data to a FIFO, which is 4 bytes deep (instead of A1, which is only one byte for an 8-bit DP). Then an 8-bit DP is sufficient to hold 4 bytes and transfer them to RAM in a single burst of 4 bytes. Setting up the DMA consumes clocks, while sending a train of 4 bytes adds only 3 extra clocks (unlike creating 4 consecutive TDs). Also, the DMA can be configured for level (not rising-edge) triggering so that it keeps transferring until the FIFO is empty. I recommend looking into the FIFOin demo component by Brad Budlong for details

PSoC5 16 Bit DMA from GPIO to memory.
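To make the FIFO suggestion above a little more concrete, here is a minimal host-side model of a 4-byte-deep FIFO drained by a level-style transfer that keeps going while the FIFO reports "not empty". It only illustrates the idea: the names fifo_model_t, fifo_push, fifo_pop and fifo_drain are hypothetical, and none of this is the FIFOin component or the PSoC DMA API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4u /* datapath FIFO depth, as stated above */

typedef struct {
    uint8_t  data[FIFO_DEPTH];
    unsigned head;  /* index of the oldest entry */
    unsigned count; /* number of valid entries */
} fifo_model_t;

static void fifo_init(fifo_model_t *f)
{
    assert(f != NULL);
    f->head = 0u;
    f->count = 0u;
}

/* Hardware side: push one byte; returns 0 if the FIFO is full (data lost). */
static int fifo_push(fifo_model_t *f, uint8_t value)
{
    if (f->count == FIFO_DEPTH) {
        return 0;
    }
    f->data[(f->head + f->count) % FIFO_DEPTH] = value;
    f->count++;
    return 1;
}

/* Status line a level-sensitive DMA request would follow. */
static int fifo_not_empty(const fifo_model_t *f)
{
    return f->count != 0u;
}

static uint8_t fifo_pop(fifo_model_t *f)
{
    assert(f->count > 0u);
    uint8_t value = f->data[f->head];
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count--;
    return value;
}

/* Level-style drain: keep transferring while "not empty" is asserted.
   Returns the number of bytes written to dst. */
static size_t fifo_drain(fifo_model_t *f, uint8_t *dst, size_t dst_len)
{
    size_t n = 0u;
    while (fifo_not_empty(f) && n < dst_len) {
        dst[n++] = fifo_pop(f);
    }
    return n;
}
```

In this model the reader can wait for up to four bytes to accumulate before draining them in one pass; a fifth push before the drain is rejected, which corresponds to lost data.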