PSoC 5, 3 & 1

RiAs_1660756
New Contributor II

I have two questions:

1. Where can I find out timings for reading and writing to ports and registers including DMA transfers?  I have a time-critical application and want to know the numbers.

I ran a simple test to see how fast I could write to a pin using the API write function, a direct write to the port address, and bit-banding, and then by writing to a control register connected to a pin.  The results were surprising in that bit-banding took more clock cycles than a direct write; also, the number of clock cycles jumped at 48MHz, typically from 6 to 14. 

Clearly, this isn't a simple issue. 

2. Regarding warnings about asynchronous paths: if I get a warning about an asynchronous input that is used as a clock on a D-type, this means that the D-type is actually clocked on the internal clock edge, not my external clock.  Otherwise it wouldn't matter, no? If I use a Sync component, I'll lose another clock cycle: one to get through the Sync and the next to clock the D? 

I think the Sync component is basically a D flip-flop anyway so, despite the warning, I get no benefit from using it and delay my signal unnecessarily.

Am I wrong here?

13 Replies
odissey1
Honored Contributor II

RiAs,

1. Pin I/O timing considerations were discussed before:

Macros for bit band access? 

And the fastest way to set GPIO on PSoC5 is:

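while (1)
{
    /* Each line writes the whole port data register, not just one bit. */
    Testpin_DR = 1 << Testpin_SHIFT;  /* drive the pin high */
    Testpin_DR = 0 << Testpin_SHIFT;  /* drive the pin low */
}

Setting the pin High or Low takes only one clock, and 2 clocks go to the "while" loop. (Attached screenshot at 24MHz: blue is BUS_CLK, yellow is the pin toggle.)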

IMG_20180407_134859.jpg

 

Using an oscilloscope and a spare pin, you can check the duration of a DMA transfer by measuring the delay between the "drq" and "nrq" signals. Typically, it takes about 8-12 BUS_CLK cycles to complete a single 8-bit DMA transfer. The DMA datasheet has more detailed info on that.

See also:

C++ vs Assembly vs Verilog. on Vimeo

Time to process a command in PSoc 5LP 

PSoC 4 - Study of pin toggle max speed,  comparison of several ways of toggle. 

/odissey1

P.S. I suggest placing the second (unrelated) question into a separate thread

RiAs_1660756
New Contributor II

Hi, odissey1,

Thank you for taking the time to reply and for the links to resources. 

Prior to my question, I tried: direct pin access and pin driven by control register.  For each case, API, register access (same as your code snippet) and bit-banding.

What I found was that above 48MHz, more clock cycles were needed.  This must be a pipelining effect as, using your own example, above 48MHz, we see 3+3 instead of 1+3.  This means that a faster clock might actually slow the critical bits of my code. 
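
To put numbers on that last point, here is a minimal sketch that converts cycle counts into wall-clock time. The 4-cycle (1+3) and 6-cycle (3+3) loop figures are the ones quoted in this thread; the 60MHz clock is only an assumed example of a setting above 48MHz, and `cycles_to_ps` and `is_faster` are hypothetical helper names, not PSoC Creator APIs.

```c
#include <stdint.h>

/* Hypothetical helper: time taken by `cycles` clock cycles at `clock_hz`,
 * in picoseconds. Integer arithmetic only, so the result truncates.
 * Valid for cycle counts up to about 18 million (64-bit overflow limit). */
uint64_t cycles_to_ps(uint32_t cycles, uint32_t clock_hz)
{
    return ((uint64_t)cycles * 1000000000000ULL) / clock_hz;
}

/* Returns 1 if option A completes sooner than option B, else 0. */
int is_faster(uint32_t cycles_a, uint32_t hz_a,
              uint32_t cycles_b, uint32_t hz_b)
{
    return cycles_to_ps(cycles_a, hz_a) < cycles_to_ps(cycles_b, hz_b);
}
```

Under these assumptions, 4 cycles at 48MHz take about 83.3 ns while 6 cycles at 60MHz take 100 ns, so the faster clock loses on this particular loop. Compared with 4 cycles at 24MHz (about 166.7 ns), it still wins.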

Richard.

 

Len_CONSULTRON
Honored Contributor II

Richard,

Question 1.

All CPU resources have their own access timing inside the internal bus structure.

The fastest tend to be FLASH and SRAM.  This is because they do nothing but allow for Reads or Writes.

The Register resources may look like SRAM from the CPU's perspective but may require additional CPU cycles to perform the required function.

EEPROM is the worst for extended timing.

Once the CPU clock period becomes shorter than a resource's access time, additional CPU cycles are needed to complete the operation.  This is why you saw more CPU cycles when going above 48MHz with the register access.  Because each access then costs more CPU cycles, it can actually take a little longer to access the resource.

Performing a "smarter" more efficient register access like odissey1 provided can help.

The API calls to certain component resources are great and have a certain level of code abstraction.  However, the price of code abstraction can burden the execution time to make sure the API call can operate across many platforms.

DMA access to resources is the same as CPU access.  The DMA HW is designed to know each resource's timing limitation based on the DMA input clock and to adjust accordingly.

Suggestion:

You are using a PSoC5 which has what I think is a great resource: UDBs.

Is it possible in your application to create a UDB-based HW state machine to alter the registers?

I found that if you can create a HW state machine to do much of the time-critical work, you can unburden the CPU and the DMA from spending clock cycles on it and perform the needed operations with the lowest latency.

Question 2.

The warning about the "Asynchronous path" is just that:  A warning.

Creator doesn't prevent an "Application Build" phase.  It just warns you of a potential timing violation that might be an issue with your design intent.

It is up to you to determine if the timing violation (usually an input setup time relative to the input clock) will be a problem.   For example, the UART component never produces an asynchronous warning although the Rx input is guaranteed to be asynchronous to the UART clock.

The major problem with inputs that are asynchronous to the clock of a latching component is called a "meta-stable" condition.  If the input switches within the setup or hold time of the latch relative to the clock, then the output could actually oscillate for a period of time (usually for a few nsecs).   If the downstream logic being fed by this output would be adversely affected by an oscillation, then that would be a design problem.

I experienced a "meta-stable" timing condition only once in my early engineering career.  The event occurred once every one to two days.  It took two weeks to isolate and capture the issue in testing.   Luckily, this occurred before production where it could be corrected before perpetuating the issue to the customer.

The usual fix is to have the input double clocked through two serial DFFs.  The first DFF allows a potential output oscillation.  By the time the second DFF samples the first DFF's output, the oscillation should have settled before the setup time.  Therefore the second DFF output should be free of the oscillation.   Sadly, this adds at least one clock of latency.
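
The latency cost of the two-DFF arrangement can be seen in a small software model. This is only a sketch of the clocked behaviour, with hypothetical names (`sync2_t`, `sync2_clock`); it does not model the metastable oscillation itself, only how long a change takes to reach the output.

```c
#include <stdint.h>

/* Software model of two D flip-flops in series on one shared clock. */
typedef struct {
    uint8_t ff1;  /* first stage: samples the asynchronous input */
    uint8_t ff2;  /* second stage: the output seen by downstream logic */
} sync2_t;

/* Model one rising clock edge. Both flops capture at the same instant,
 * so ff2 takes the value ff1 held before this edge.
 * Returns the synchronizer output after the edge. */
uint8_t sync2_clock(sync2_t *s, uint8_t async_in)
{
    s->ff2 = s->ff1;
    s->ff1 = (uint8_t)(async_in & 1u);
    return s->ff2;
}
```

Starting from a cleared `sync2_t`, an input that goes high before the first edge still reads 0 at the output after that edge and only reads 1 after the second edge.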

Len
"Engineering is an Art. The Art of Compromise."
RiAs_1660756
New Contributor II

Hi, Len,

I am very grateful to you and to odissey1 for taking the time and trouble to reply to my query. 

I have a box elevated to 60,000V containing a PSoC5LP that incorporates a fast SPI implemented in UDBs that takes four 24-bit codes (8 command, 16 ADC data) and squirts them through a fibre-optic link (3 fibres) to a receiver which is an AD5676 8-channel high-speed DAC.  This then forms the control signal to several analogue control loops that send signals through transformers up to the HV end.  The whole thing works great.

At the bottom end, there is a second PSoC5LP that squats on the three signals and, on a cue, monitors the signals received until it has the full set which it then stores in SRAM and transmits to the user through an external interface.  This uses the FIFO in the datapath and can deliver four blocks at a time.  This information is relayed on the screen.  However, the digits frequently blink incorrect values.

This was terrible at first and I took it to be that the data wasn't being retrieved before being overwritten.  So I put two on the schematic and toggled between them and this worked well enough to use.  I am sure the problems are either: 1) related to the latency in getting the data out or 2) clocking issues with the incoming asynchronous SPI clock.

What I really would like is to be able to DMA to a SRAM address based on the code part of the received data but I think that might be tricky if not impossible.  I might be better off using a chain of D-types, LUTs and status registers.  The LUT could trigger one of several independent DMAs per the control code received.
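
As a CPU-side model of that idea, the sketch below splits a 24-bit code into its command and data parts and stores the data in a slot chosen by the command. The 8+16 split is from the description above; everything else is an assumption for illustration: that the command occupies the upper 8 bits, that its low 3 bits select one of 8 slots, and the names `frame_t`, `frame_unpack` and `frame_route`. It is not a DMA configuration.

```c
#include <stdint.h>

#define NUM_SLOTS 8u  /* assumed: one destination slot per channel */

typedef struct {
    uint8_t  command;  /* assumed to be the upper 8 bits of the code */
    uint16_t data;     /* lower 16 bits */
} frame_t;

/* Split a 24-bit code held in the low bits of a 32-bit word. */
frame_t frame_unpack(uint32_t raw24)
{
    frame_t f;
    f.command = (uint8_t)((raw24 >> 16) & 0xFFu);
    f.data    = (uint16_t)(raw24 & 0xFFFFu);
    return f;
}

/* Store the data word in the slot selected by the command byte.
 * ASSUMPTION: the low 3 bits of the command pick the slot.
 * Returns the slot index that was written. */
unsigned frame_route(uint16_t slots[NUM_SLOTS], uint32_t raw24)
{
    frame_t f = frame_unpack(raw24);
    unsigned slot = f.command & (NUM_SLOTS - 1u);
    slots[slot] = f.data;
    return slot;
}
```

With several fixed-destination DMAs, the hardware equivalent of `frame_route` would be the LUT decoding the command bits into one request line per destination.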

In experimenting (per odissey1's suggestion) with DMA timing, I notice that 8 and 16-bit transfers from datapath components take the same time.  Is it therefore possible to constrain two adjacent status registers and read them as a 16-bit half-word?  The PHUB spoke width is 16 bits, is it not?

I just love PSoC - it's like having a bottomless toybox!

Thanks again,

Richard.

 

odissey1
Honored Contributor II

Richard,

The 60 kV  sounds like an X-ray tube power supply..

If you need a 16-bit Status Register for reading a digital bus with DMA, then using the FIFOin custom component (by Brad Budlong) is a solution. Brad's blog no longer exists, but an updated version of the component can be found here:

ADC_SAR - Filter - VDAC streaming demo using DMA 

SAR-Filter-VDAC_signed_FIFO_02a_A.png

You only need a FIFOin_ex and maybe DMA configuration example. The FIFOin has advanced options like data ready output pin on FIFO half-full, which allows for block-reading of 4x16-bit FIFO in a single swipe. The FIFOin_ex library is attached.

Hope that using FIFOin can help, but my bet is on the asynchronous SPI errors. Is there any way to distribute the SPI clock? 

/odissey1

PS. I believe that the PHUB spoke for peripheral-to-RAM DMA is 16-bit.

PPS. Though it is possible to modify the DMA address using another (chained/nested?) DMA, it is cumbersome. Since you have only a few destinations, individual DMAs with fixed destinations are more straightforward. 

RiAs_1660756
New Contributor II

Hi, odissey1,

Once again, thank you for your interest and taking the time and trouble to assist.  I appreciate it greatly.

I'll chew my way through the information you provided and let you know if the problem is resolved. 

You are nearly correct in that it is an application similar to x-ray with floating supplies that need regulating from the earthy end.

Regards,

Richard.

odissey1
Honored Contributor II

Richard,

Attached is an example project showing usage of FIFOIn_ex component for DMA transfer from the 16-bit hardware bus to the RAM buffer.

In this demo, the FIFOIn_ex is configured for 16-bit wide input, with the HalfFullTrigger parameter set to TRUE. When data fills at least 1/2 of the FIFO, the "drq" output goes HIGH, forcing DMA transfers into the RAM buffer until the FIFO is empty. Once finished, the result is printed to a UART terminal.

Note that this FIFOIn_ex usage differs from the above ADC_SAR-FIFO demo, where it was used as a simple wide status register (HalfFullTrigger=FALSE). 

/odissey1

Project uses several custom components:

* FIFOIn_ex - library is attached.  It must be added to the Project Dependencies.

* ControlReg32 - included in the project.

 

Figure 1. Project schematic. FIFOIn_ex reads 16-bit hardware bus into 4x deep FIFO. DMA transfers data to RAM until FIFO is empty. ControlReg32 provides 16-bit test data on the Clock_STRB. 

Creg32-FIFOIn_ex-DMA-RAM_01a_A.png

Figure 2. RAM Buffer content once all test data has been transferred.

Creg32-FIFOIn_ex-DMA-RAM_01a_UART.png

 

Len_CONSULTRON
Honored Contributor II

Richard,

It sounds like you have a very sophisticated system!

I'm wondering if most of the problem you're having is that you're waiting for a "full set" to arrive before emptying the 4-element SPI FIFO.  It is quite possible that your incoming data rate is fast enough that occasionally you're not able to pull the last data of the set out before the FIFO gets overrun.  Hence the possible "wrong value".

Suggestion:  Offload the data from the SPI as you get it.  Don't wait for the entire set to come in. Get the data in ASAP and sort out data set alignment later.   Using DMA for this is a good idea.

Let the FIFO do its job, which is to be a safeguard in case the data rate is fast and the CPU is not able to keep up. 
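
The overrun risk can be illustrated with a host-side model of a 4-deep FIFO. This is a simplified sketch, not the datapath FIFO's real behaviour: the names (`fifo_t`, `fifo_push`, `fifo_pop`, `simulate`) are hypothetical, and the consumer is modelled as emptying the FIFO after every `drain_every`-th arriving word.

```c
#include <stdint.h>

#define FIFO_DEPTH 4u  /* matches the 4-element FIFO discussed above */

typedef struct {
    uint16_t slot[FIFO_DEPTH];
    unsigned head;      /* index of the oldest stored word */
    unsigned count;     /* number of words currently stored */
    unsigned overruns;  /* words dropped because the FIFO was full */
} fifo_t;

/* Store a word, or count an overrun if there is no room. */
void fifo_push(fifo_t *f, uint16_t w)
{
    if (f->count == FIFO_DEPTH) { f->overruns++; return; }
    f->slot[(f->head + f->count) % FIFO_DEPTH] = w;
    f->count++;
}

/* Returns 1 and writes *out if a word was available, else returns 0. */
int fifo_pop(fifo_t *f, uint16_t *out)
{
    if (f->count == 0) return 0;
    *out = f->slot[f->head];
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Feed n_words into the FIFO; the consumer empties it after every
 * drain_every-th arrival. Returns the number of words lost. */
unsigned simulate(unsigned n_words, unsigned drain_every)
{
    fifo_t f = {{0}, 0, 0, 0};
    uint16_t w;
    for (unsigned i = 1; i <= n_words; i++) {
        fifo_push(&f, (uint16_t)i);
        if (drain_every != 0 && i % drain_every == 0) {
            while (fifo_pop(&f, &w)) { /* discard: only counting losses */ }
        }
    }
    return f.overruns;
}
```

Draining after every word, or after exactly four, loses nothing in this model, but waiting for the full set leaves no margin: if the consumer is late by a single word (`drain_every` of 5), one word in every five is dropped.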

There are multiple ways to create a 16-bit aligned SR. 

  • Creating a correct directive for the UDB assignment in the DWR/Directives tab.
  • Creating your own custom component using the UDB editor.
  • Creating your own custom component using Verilog syntax.

There might be other ways as well.

Len
"Engineering is an Art. The Art of Compromise."
RiAs_1660756
New Contributor II

Hi, Len,

It isn't really very sophisticated!  It wouldn't be interesting if it were not for the 60kV aspect.  Then it would just be a mixed analogue/digital with a fast SPI link.

Thank you for replying.  Your insights and suggestions are very welcome.  odissey1 pointed me toward a Verilog based FIFO component but my understanding of Verilog is rudimentary at best.  However, I'll do my best to digest it.  I'll have a go at the assignment/directive approach as I'm on firmer ground there and on your second suggestion, I have already created a 'breadboard' UDB/datapath counter in 8,16,24,32 to see the DMA timings. 

I seem to have drifted away from the original question.  As to using the Control Registers in pairs to make a 16-bit register, this works:

#include "project.h"

/* 16-bit view starting at CR0's 8-bit control register address. */
#define CR0_Control16        (* (reg16 *) CR0_Sync_ctrl_reg__CONTROL_REG )

#define REDLED      0x8000  /* bit 15: lands in the upper byte of the pair */
#define AMBERLED    0x0080  /* bit 7: lands in the lower byte of the pair */

int main(void)
{
    CyGlobalIntEnable; /* Enable global interrupts. */

    for(;;)
    {
        CR0_Control16 = REDLED;             // Red LED on
        CyDelay(200);
        CR0_Control16 |= AMBERLED;          // Amber LED on
        CyDelay(500);
        CR0_Control16 &= ~AMBERLED;         // Amber LED off
        CyDelay(200);
        CR0_Control16 = 0x0000;             // Red LED off
        CyDelay(500);
    }
}

Untitled.jpg

I put CR0 at U(0,0) and CR1 at U(0,1) using directives.
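
Why the two LED masks end up in different registers can be shown with a host-side stand-in for the register pair. The Cortex-M3 in the PSoC 5LP is little-endian, so a 16-bit write puts the low byte at the lower address. That the next address is CR1's register is an assumption based on the placement just described, and `cr_pair_t`, `cr_write16` and `cr_read16` are hypothetical names.

```c
#include <stdint.h>

/* Stand-in for two 8-bit control registers at consecutive addresses:
 * regs[0] models CR0 and regs[1] models CR1 (assumed mapping). */
typedef struct {
    uint8_t regs[2];
} cr_pair_t;

/* 16-bit little-endian write: low byte goes to the lower address. */
void cr_write16(cr_pair_t *p, uint16_t v)
{
    p->regs[0] = (uint8_t)(v & 0xFFu);
    p->regs[1] = (uint8_t)(v >> 8);
}

/* 16-bit little-endian read of the same pair. */
uint16_t cr_read16(const cr_pair_t *p)
{
    return (uint16_t)(p->regs[0] | ((uint16_t)p->regs[1] << 8));
}
```

In this model, writing 0x8000 sets bit 7 of `regs[1]` only, and OR-ing in 0x0080 then sets bit 7 of `regs[0]`, matching the red and amber LED masks in the listing above.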

I should have proper hardware available next week to try out several of the ideas you and odissey1 have shared.  Right now, this is promising:

Untitled2.jpg

If I can do the same trick with SR1&SR2, maybe I can do a 16-bit DMA transfer.  Maybe that's a completely pointless exercise but it is intriguing.

Regards,

Richard.

odissey1
Honored Contributor II

Richard,

Looking at the schematic, isn't it an SPI implemented using discrete DFFs? I wonder if a standard SPI can be utilized for better timing instead? The DFF chain may have large latency; do you receive a timing violation warning? What is the CK clock frequency?

Not sure what the role of Sync_1 and UDBClkEn_1 is. The UDBClkEn_1 here works like a Sync component. I believe that it is sufficient to put pins SDI, SS and CK into Sync mode instead, and the Sync/UDBClkEn can be omitted altogether. 

I believe that Status Register SR2 represents the address, and SR0 and SR1 the 16-bit value. If so, SR0+SR1 can be replaced with a single 16-bit FIFOIn component for 16-bit DMA transfer (unbuffered mode). The FIFOIn can perform the same task as a 16-bit Status Register. If SR2 is also replaced with an 8-bit FIFOIn, then you get a 4x deep hardware buffer for both the address and the value, which can resolve synchronization issues. 

RiAs_1660756
New Contributor II

Hello, Len & odissey1,

I found that the underlying problem was related to the synchronising between looking for the data and the data arriving.  The 16-bit serial DAC is being updated at 20-30ksps but the user doesn't need to know that.  So the DMA is enabled during that routine and it waits until it has the full set.

I've tried different ways of solving the problem of the 'glitches' and in the end used the inelegant method of throwing away the first complete set and then collecting another lot.  This second set doesn't glitch. 

This will do for now.  I have the next revision of PCB to check over and I'll incorporate some of the above ideas at the same time. 

One thing I tried with zero success is to use an 8-bit variable and to DMA to each bit for each received channel.  Once the variable reaches 0xFF, I know all 8 words have arrived.  But the bit-band region is above 0x20000000 so I tried using the example in 001-89610_AN89610_PSoC_ARM_Cortex_Code_Optimization.pdf per page 33.  My variable was placed at 0x2000000 but I could not get the bit-banding to work.  The architecture TRM says (p97): "SRAM mapped to the code region is also accessible by DMA in the SRAM bitband region."  I think this is getting a bit beyond the immediate issue: "Does it work?" "Yes" "Then stop messing with it!"

I suppose that if a variable in the code region maps to an address 0x00010000 above it in the SRAM region, I can use that in the macros to generate the bit-band addresses and happily DMA to those.
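
For reference, the standard Cortex-M3 SRAM bit-band mapping can be written as a small address calculation. This is a sketch of the arithmetic only: `bitband_alias` is a hypothetical name, and the function just computes the alias word address; it does not touch memory or set up a DMA.

```c
#include <stdint.h>
#include <assert.h>

#define SRAM_BB_BASE   0x20000000u  /* start of the SRAM bit-band region */
#define SRAM_BB_LIMIT  0x200FFFFFu  /* last byte of the 1 MB region */
#define SRAM_BB_ALIAS  0x22000000u  /* start of the bit-band alias region */

/* Alias word address for bit `bit` (0-7) of the byte at `byte_addr`.
 * Each bit in the region maps to one 32-bit word in the alias region:
 * every byte offset spans 32 alias bytes, every bit spans 4. */
uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= SRAM_BB_BASE && byte_addr <= SRAM_BB_LIMIT);
    assert(bit < 8u);
    return SRAM_BB_ALIAS + (byte_addr - SRAM_BB_BASE) * 32u + bit * 4u;
}
```

Writing 1 or 0 to the returned word address sets or clears that single bit. A variable whose address is below 0x20000000 falls outside the bit-band region and trips the first assert, which is why the address in the SRAM region would have to be used in the macros.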

Another day, perhaps.

Thanks, guys, for your help.  These insights are fascinating.

- Richard.

 

 

Len_CONSULTRON
Honored Contributor II

Richard,

Based on your last post, the problem is that you are waiting for the "buffer" to be full before emptying it.  This is generally not a good idea, especially if the data rate is fairly fast.  The issue is the risk of buffer overrun.

There are multiple solutions that don't require throwing away data.  

  • Double-buffering (using ping-pong DMA).
  • Pulling and processing the data as it comes in.
  • Oversizing your buffer depth and having a "watermark" interrupt when (MAX_BUFFER - WATERMARK_OFFSET) is reached.  At the watermark interrupt, gather and process the data.
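
The watermark option in the list above can be sketched as follows. The buffer size of 16 and offset of 4 are arbitrary assumed values, and `wm_buffer_t`, `wm_put` and `wm_drain` are hypothetical names; in a real design the return value of `wm_put` would correspond to the interrupt request.

```c
#include <stdint.h>

#define MAX_BUFFER        16u  /* assumed oversized buffer depth */
#define WATERMARK_OFFSET   4u  /* assumed headroom left above the watermark */
#define WATERMARK_LEVEL   (MAX_BUFFER - WATERMARK_OFFSET)

typedef struct {
    uint16_t data[MAX_BUFFER];
    unsigned count;  /* number of words currently stored */
} wm_buffer_t;

/* Append one word. Returns 1 when the fill level has just reached the
 * watermark (where a real design would raise its interrupt), else 0.
 * Words arriving while the buffer is completely full are dropped. */
int wm_put(wm_buffer_t *b, uint16_t w)
{
    if (b->count >= MAX_BUFFER) return 0;
    b->data[b->count++] = w;
    return b->count == WATERMARK_LEVEL;
}

/* Copy out everything stored so far and empty the buffer.
 * `out` must have room for MAX_BUFFER words. Returns the word count. */
unsigned wm_drain(wm_buffer_t *b, uint16_t *out)
{
    unsigned n = b->count;
    for (unsigned i = 0; i < n; i++) {
        out[i] = b->data[i];
    }
    b->count = 0;
    return n;
}
```

The headroom is the point of the technique: after the watermark fires, up to WATERMARK_OFFSET further words can still arrive before anything is lost, which gives the handler time to respond.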

 

Len
"Engineering is an Art. The Art of Compromise."
RiAs_1660756
New Contributor II

Hi, odissey1,

The clock is 10MHz.  The built-in SPI says a max of 5Mbps.

The circuit above is just a 'breadboard' replacement for the state machine/datapath job that I am using at the moment.  I have had to shave it down a bit to fit in the design along with all the other stuff. 

I put the UDB clock enable in because the compiler said: "Routing of asynchronous signal HJKCS(0):iocell.fb as a clock to UDB component "\SR1:sts:sts_reg\" is not supported unless a UDB Clock/Enable component is used." so I did as I was told!

However, deleting the sync and UDB ck en and syncing the pins does build OK so I'll try it out.

At the moment, I have to solve problems related to sparking.

Regards,

Richard.

 

 

 

 
