PSoC5 16 Bit DMA from GPIO to memory.

Mikers
Level 1

Hi,

I'm trying to create a PSoC 58LP888 project with a parallel asynchronous input.

The host will signal on CS* and transfer 9 data bits on the rising edge of WE*.

I need to transfer these 9 data bits to memory using a DMA.

It seems to be pretty easy if you only want to use 8 bits. Just configure component cy_pins_v2_20 as an 8 bit wide input pin called "Parallel_Input". Under mapping, ensure "Contiguous" is selected and use the following in the DMA config:

CyDmaTdSetAddress(dmaTD[0], LO16((uint32)&Parallel_Input_PS), ...

 

Parallel_Input_PS is defined in Generated_Source/PSoC5/Parallel_Input/Parallel_Input.c

 

However, you can't use this technique with an input port wider than 8 bits. As soon as you go beyond 8 bits, the "Contiguous" box must be unchecked, and the .c and .h files disappear from the generated source.

 

I can't find any other 16 bit component that I can connect to GPIO pins and use to DMA into memory.

How can I get over this simple hurdle? I know the DMA can handle 16 bit wide transfers, it should be "just" a matter of getting the data from the IO pins into a 16 bit wide *thing* that I can point the DMA at.

 

Thing is, I'm "just" stumped!

Any ideas please?

1 Solution
Mikers

Here's the full solution:

Schem.png

Yeah, I fibbed. The display has 2 Chip selects. It looks like each CS drives 1/2 of the display. I need to capture the two streams of data from the bus.

UDBClkEn_1 passes WE_L as a clock to SR1/SR2 whenever CS1_L or CS2_L are active.

SR1 and SR2 are both set to sticky.

There are two DMAs. Both point to the same source address (the 16 bit SR), but each DMA feeds a different destination address buffer.

The DMAs are set to trigger on a rising edge. The case I want to trigger on is WE_L transition from low to high, whilst CS is low.

The DMA will also trigger if CS_L transitions from low to high whilst WE_L is low or if both CS_L and WE_L transition low to high (in the same BUS_CLK). But since I've already ascertained that this doesn't happen, I can ignore it.
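
The three trigger cases listed above are what you would get if the DMA request net were simply WE_L OR CS_L. The schematic image is not reproduced here, so that gating is an assumption, and the function names below are made up for illustration. A minimal host-side sketch of the edge condition:

```c
#include <stdbool.h>

/* Assumed trigger net: WE_L OR CS_L (both signals are active low).
   This gating is inferred from the text, not read off the schematic. */
bool trigger_level(bool we_l, bool cs_l)
{
    return we_l || cs_l;
}

/* True when the assumed net goes low -> high between two BUS_CLK samples. */
bool dma_would_trigger(bool prev_we_l, bool prev_cs_l, bool we_l, bool cs_l)
{
    return !trigger_level(prev_we_l, prev_cs_l) && trigger_level(we_l, cs_l);
}
```

With this model, a WE_L edge while CS_L is high (chip not selected) produces no request, because the net is already high.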

 

In directives, I fixed the UDB locations for SR_1 & SR_2 as @RodolfoGL suggested:

directives.png

I built and checked cy_fitter.h:

 

/* SR_1 */
#define SR_1_sts_sts_reg__STATUS_REG CYREG_B0_UDB00_ST

/* SR_2 */
#define SR_2_sts_sts_reg__STATUS_REG CYREG_B0_UDB01_ST

 

& sure enough in the catchily named cydevice_trm.h:

 

#define CYREG_B0_UDB00_ST 0x40006460u
#define CYREG_B0_UDB01_ST 0x40006461u
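
For a single 16-bit read, the two status registers need to sit at an even address and the byte immediately after it. A small sketch of that check, runnable on a PC (the function name is hypothetical, not part of any Cypress header):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when low_addr is 16-bit aligned and high_addr is the next byte,
   i.e. the two 8-bit registers can be read as one little-endian word. */
bool regs_form_16bit_word(uint32_t low_addr, uint32_t high_addr)
{
    return (low_addr % 2u == 0u) && (high_addr == low_addr + 1u);
}
```

The two addresses above (0x40006460 and 0x40006461) pass this check. A placement at UDB01/UDB02 would not, because the low byte would land on an odd address.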

 

Finally, here is the config for DMA_CS1:

    /* 100 samples of 16 bits each (an array of values, not of pointers) */
    uint16 Buffer_CS1[100];
    /* Defines for DMA_CS1 */
    #define DMA_CS1_BYTES_PER_BURST 2
    #define DMA_CS1_REQUEST_PER_BURST 1
    /* Full 32-bit base addresses; HI16() is applied once, in DmaInitialize below */
    #define DMA_CS1_SRC_BASE ((uint32)CYREG_B0_UDB00_ST)
    #define DMA_CS1_DST_BASE (CYDEV_SRAM_BASE)

    /* Variable declarations for DMA_CS1 */
    /* Move these variable declarations to the top of the function */
    uint8 DMA_CS1_Chan;
    uint8 DMA_CS1_TD[1];

    /* DMA Configuration for DMA_CS1 */
    DMA_CS1_Chan = DMA_CS1_DmaInitialize(DMA_CS1_BYTES_PER_BURST, DMA_CS1_REQUEST_PER_BURST, HI16(DMA_CS1_SRC_BASE), HI16(DMA_CS1_DST_BASE));
    DMA_CS1_TD[0] = CyDmaTdAllocate();
    /* Transfer count is in bytes: sizeof(Buffer_CS1) is 200 bytes for 100 words */
    CyDmaTdSetConfiguration(DMA_CS1_TD[0], sizeof(Buffer_CS1), CY_DMA_DISABLE_TD, CY_DMA_TD_INC_DST_ADR);
    /* Source address stays fixed on the status register pair; only the destination increments */
    CyDmaTdSetAddress(DMA_CS1_TD[0], LO16((uint32)CYREG_B0_UDB00_ST), LO16((uint32)Buffer_CS1));
    CyDmaChSetInitialTd(DMA_CS1_Chan, DMA_CS1_TD[0]);
    CyDmaChEnable(DMA_CS1_Chan, 1);

I think that's all I need to do to get a 16bit DMA. The source address is word aligned, the addresses used have >16 bit spokes & the bytes per transfer is set to 2.
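
Two details in a config like this are easy to get wrong: how the 32-bit addresses are split between the channel and the TD, and the fact that the TD transfer count is given in bytes, not words. The sketch below uses local stand-ins for HI16/LO16 (the EXAMPLE_ names and helper functions are hypothetical) so it can be compiled on a PC:

```c
#include <stdint.h>

/* Local stand-ins for the HI16/LO16 macros, so this builds without PSoC headers. */
#define EXAMPLE_HI16(x) ((uint16_t)((uint32_t)(x) >> 16))
#define EXAMPLE_LO16(x) ((uint16_t)((uint32_t)(x) & 0xFFFFu))

/* Upper half of the address: what the channel initialisation expects. */
uint16_t example_upper_half(uint32_t addr)
{
    return EXAMPLE_HI16(addr);
}

/* Lower half of the address: what the TD address setup expects. */
uint16_t example_lower_half(uint32_t addr)
{
    return EXAMPLE_LO16(addr);
}

/* Bytes a TD has to move to fill a buffer of word_count 16-bit samples. */
uint32_t example_td_bytes(uint32_t word_count)
{
    return word_count * (uint32_t)sizeof(uint16_t);
}
```

For 0x40006460 the halves are 0x4000 and 0x6460. Taking the upper half of an already split value gives 0, so each address should be split only once. A 100-sample buffer needs a transfer count of 200.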

I'll try it out when I'm next in the office & update the thread.

12 Replies
Len_CONSULTRON
Level 9

Mikers,

The simple solution to your question is on the rising edge of WE*, perform TWO port reads.  The ports don't even have to be contiguous ports in register memory.

This is totally and easily possible because the PSoC5 DMA allows for chained TDs.  Therefore, if you trigger the WE*, you can move first the LSB port and then the MSB port to RAM.  (Note: The ARM is little-endian)

The only issue is that assuming the 9-th bit of data is in the lsb of the MSB, what is the value of bits 1 thru 7 of the MSB?

They can be anything as long as you mask out only the lowest 9 bits of the int16 you DMA into RAM. Or you can assign bits 1 thru 7 of the MSB as inputs with pulldowns. There are other solutions as well.
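
As a rough illustration of the two-read approach: the LSB port byte and the MSB port byte land in consecutive RAM bytes, which a little-endian CPU reads as one 16-bit value, and the unused upper bits are then masked off. The names here are hypothetical and the sketch runs on a PC:

```c
#include <stdint.h>

#define NINE_BIT_MASK 0x01FFu

/* Combine the two port bytes the way a little-endian 16-bit read would. */
uint16_t combine_ports(uint8_t lsb_port, uint8_t msb_port)
{
    return (uint16_t)(lsb_port | ((uint16_t)msb_port << 8));
}

/* Keep only bits 0..8; bits 9..15 are treated as don't-care. */
uint16_t keep_nine_bits(uint16_t raw)
{
    return (uint16_t)(raw & NINE_BIT_MASK);
}
```

For example, an LSB of 0x34 and an MSB of 0xFF (floating upper pins reading high) combine to 0xFF34 and mask down to 0x0134.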

The other potential issue is that the data on the 9-bits of port data must be stable BEFORE the rising edge of WE* (setup time) and long enough to allow for the DMA to move BOTH port bytes to RAM.

Is the rising edge of WE* the leading edge of the signal or trailing edge?

Len
"Engineering is an Art. The Art of Compromise."
Mikers

That's useful, thanks. You're forcing me to rethink stuff I tried and abandoned! 🙂

WE* is the trailing edge, I'm reverse engineering a display protocol, so I don't know what the typical hold time for data is after WE*.

I did consider a chained TD, but I would prefer a simple atomic transfer, if it's possible.

Better than using a chained TD to grab from two GPIO registers would be to latch the data (on rising WE*) in two 8 bit *latches* (D-types or status registers, or, or...). Then use a chained TD, but I can't seem to work out how those may be memory mapped either. This is still an atomic grab of the bus data, it's just the DMA is slower.

 

A Chained TD is probably fast enough, looks like the bus is running at about 4 MHz. If a single 8 bit DMA takes 5(?) cycles, then 10 cycles at 68MHz means I can deal with incoming data at 6.8MHz.
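
The estimate above is just bus clock divided by total cycles per captured word. A tiny sketch of the arithmetic, using the guessed figure of 5 cycles per 8-bit transfer from the post (not a measured or datasheet value; the function name is made up):

```c
#include <stdint.h>

/* Maximum captured words per second, given the bus clock, the cycles one
   DMA transfer takes, and how many transfers are needed per word.
   Returns 0 if either cycle figure is 0, to avoid dividing by zero. */
uint32_t max_word_rate_hz(uint32_t bus_clk_hz,
                          uint32_t cycles_per_transfer,
                          uint32_t transfers_per_word)
{
    uint32_t cycles_per_word = cycles_per_transfer * transfers_per_word;
    if (cycles_per_word == 0u) {
        return 0u;
    }
    return bus_clk_hz / cycles_per_word;
}
```

With 68 MHz, 5 cycles and 2 transfers this gives 6.8 MHz, which leaves some margin over a roughly 4 MHz bus as long as the DMA is not held off by other traffic.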

 

Any pointers on DMAing from a latch please? I did try to work out how to use a status register set to sticky and clocked from WE*, but couldn't see how the data was mapped into the address space.

 

I don't care about bits 8-15. I'm actually trying to grab D[7..0] and a register select pin. When I get this data into memory, I need to interpret it and eventually translate the stream so that I can replace the old display with something that works. But that's not really relevant to the problem right now!
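
Once the words are in RAM, splitting each one into the data byte and the register select flag is a mask and a shift. This sketch assumes register select is wired to bit 8, directly above D[7..0]; the type and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* One captured bus word, split into its two fields. */
typedef struct {
    uint8_t data;       /* D[7..0] */
    bool    reg_select; /* assumed to be bit 8 of the captured word */
} bus_word_t;

/* Decode a raw 16-bit capture; bits 9..15 are ignored. */
bus_word_t decode_bus_word(uint16_t raw)
{
    bus_word_t word;
    word.data = (uint8_t)(raw & 0x00FFu);
    word.reg_select = ((raw >> 8) & 0x1u) != 0u;
    return word;
}
```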


Mikers,

Here's a schematic of your design goals using Digital input pins routed to a couple of Status registers.  The status register is set to 'Sticky' mode to allow the hold time needed.

Len_CONSULTRON_0-1645478837308.png

Len
"Engineering is an Art. The Art of Compromise."

Mikers,

The TopDesign schematic in the previous post won't build (Oops!)  I have to add a Sync component between nWE and the Status Register clocks.  Here's a new one.

Len_CONSULTRON_1-1645535764311.png

 

The advantage of this design over the ones my esteemed colleagues are proposing to allow for one 16-bit DMA read is that the Status registers (SRs) are configured as 'Sticky'. This means the data is latched in the SRs on the clocking of the trailing edge of nWE (WE*).

The setup time of the data is just the propagation delay through the Pin_9_bit GPIO pins (<5ns assuming you're using 'Transparent' input mode and not the 'Sync' modes.)

The hold time of the data is just one GPIO propagation delay (for nWE)  plus BUS_CLK period due to the Sync component.

Since the data is latched into the SRs with minimum hold time the DMA operation can take its "sweet ol' time".
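
To make the latching behaviour concrete, here is a small software model of a sticky status register. It assumes the component's sticky mode is clear-on-read (a bit that is high at a clock edge stays set until the register is read), which is my reading of the Status Register datasheet and worth confirming there. The type and function names are made up:

```c
#include <stdint.h>

/* Software model of one 8-bit status register in sticky mode. */
typedef struct {
    uint8_t bits;
} sticky_sr_t;

/* Clock edge (here: the trailing edge of nWE): inputs that are high are captured. */
void sticky_sr_clock(sticky_sr_t *sr, uint8_t inputs)
{
    sr->bits |= inputs;
}

/* A read (CPU or DMA) returns the captured bits and clears them. */
uint8_t sticky_sr_read(sticky_sr_t *sr)
{
    uint8_t value = sr->bits;
    sr->bits = 0u;
    return value;
}
```

One consequence of this model: if the DMA has not read the register before the next nWE edge, the two bus words are ORed together, so the read still has to finish within one write cycle.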

In the implementations without latching the data on nWE, the data needs to be held static until the DMA is completed, which will be at best about 5 to 8 BUS_CLK cycles. (Assuming the DMA is not preempted by something of higher priority.)

You indicated that you are emulating an old display interface but you don't have the specs and don't know the old display's hold-time requirement.  It's possible the device talking to the display holds the data for a significant amount of time after releasing the nWE signal ... or not.

A quick check of the PSoC5LP TRM shows that the Status registers can be allocated contiguously for 16-bit reads. Also, I believe the FIFOin that /odissey1 mentions uses the UDB block FIFO functions, which are latching. I think one 16-bit FIFOin can replace the SRs I mentioned. (Probably more resource efficient too.)

 

Len
"Engineering is an Art. The Art of Compromise."

@Len,

Yes, your schematic is exactly what I was describing, latching the data in Status registers using "Sticky". I knew how to do this, the trick is constraining placement to allow a 16 bit DMA.

With the info from @RodolfoGL on how to ensure these two SRs occupy adjacent memory locations, it should be possible to get the DMA transferring from them in a single 16 bit operation.

You have supplied half the solution each!

 

@odissey1 Thanks for the link to the component. I think it's overkill for what I need, but I may have a peek inside for interest!

Thanks everyone!

 

Mikers

I have found this:

https://community.infineon.com/t5/PSoC-5-3-1/Difficulties-using-the-parallel-input-bus-PI-in-UDBs/m-...

Which looks like it does exactly what I need. However, the link to the component appears to have been broken in the move from Cypress to Infineon.


Mikers,

The link works for me. ???

Len
"Engineering is an Art. The Art of Compromise."

Mikers,

Brad's blog no longer exists, but an updated version of the component can be found here:

ADC_SAR - Filter - VDAC streaming demo using DMA 

odissey1_0-1645503088159.png

 

The FIFOin has advanced options like data ready output pin on FIFO half-full, which allows for block-reading of 4x16-bit FIFO in a single swipe. The FIFOin_ex library is attached.

/odissey1

RodolfoGL
Employee

If you want to do a single transfer (a 16-bit read) that contains the 9 bits, I can think of two ways to do it:

1) Use a 16-bit datapath configured as parallel input. 

2) Use two status registers and force their placement to be side by side in memory.

For item (2), you can refer to this post to see how this is possible:

https://community.infineon.com/t5/PSoC-5-3-1/Can-be-expanded-control-register-component-to-16bit-or-...


Mikers,

Attached is a demo project showing a 16-bit bus sampled by FIFOin and transferred to a RAM buffer by DMA. It uses FIFOin_ex, an updated version of the component by Brad Budlong.

The FIFOin_ex library is attached. The difference from the original component is that it combines two 8-bit inputs into a single 16-bit bus. The FIFOin_ex library must be added to Project->Dependencies->Add User dependency...

/odissey1

Figure 1. Project schematic.

Creg32-FIFOIn_ex-DMA-RAM_01a_A.png

Figure 2. RAM content output using UART and Terminal

Creg32-FIFOIn_ex-DMA-RAM_01a_UART.png


Mikers,

Sorry for the late reply. I posted a custom 16-bit Status and Control Register components library, which allows for up to 16-bit access to digital ports. It includes an example project showing a 16-bit DMA transfer from StatusReg16 to the RAM buffer.

ControlReg16, StatusReg16: 16-bit control and status register components

I believe that a single 16-bit DMA transfer should be more efficient than two chained 8-bit due to the setup time saving. 

 

Figure 1. Example of the 16-bit DMA transfer from the StatusReg16 to the RAM Buffer. ControlReg_1 serves as a "data source" for the StatusReg_1.

Sreg16-DMA-RAM_01b_A.png