
PSoC 4

Anonymous
Not applicable

Ok so I'm on the CY8CKIT-042 pioneer kit. 

   

I have 5 output pins in the schematic set to Port0[4..0], and I have a Control Reg(toCpldController) connected to the pins. The http://www.cypress.com/?docID=49248 document shows that I shouldn't need it, but if I don't have it the project won't build.

   

They are set to strong drive, as the pin is connected to an input pin on a CPLD.

   

Trying to control the pins in software like this

   

toCpldController_Write(8u);
toCpldController_Write(0u);

   

Gives me a nice up then down pulse that is about 780~800ns long.

   

If I do

   

CY_SET_REG32(CYREG_PRT0_DR,8);
asm("nop");
CY_SET_REG32(CYREG_PRT0_DR,0);

   

I get a pulse for 20ns, which seems impossibly short for a 48MHz CPU, especially given a nop op, but then it holds low for 5~8us and then starts to glitch, as if it is floating. With the Controller methods above I don't see that happen.

   

What am I missing?

0 Likes
28 Replies
Bob_Marlowe
Expert II

When you connect a control register to a pins component it won't work properly when you try to set the outputs with a direct write to the port. Can you post your complete project, so that we all can have a look at all of your settings? To do so, use
Creator->File->Create Workspace Bundle (minimal)
and attach the resulting file.



Bob
 

0 Likes
ETRO_SSN583
Esteemed Contributor

Here is an ap note that discusses fast pin toggle issues, amongst other GPIO stuff.

   

 

   

    

   

          

   

http://www.cypress.com/?rID=93401     AN86439 - PSoC® 4 - Using GPIO Pins

   

 

   

 

   

Regards, Dana.

0 Likes
Anonymous
Not applicable

Hi Bob

   

I have attached the bundle, thanks for having a look.

   

 

   

Hi Dana

   

That is the mythical document that shows I don't need to connect things to pins for it to build.. I hope Bob finds what I have done wrong.

   

 

   

Is it documented on how fast the ARM can pump data to the internal FIFO buffers? Can it push data there at 48Mhz, and then let the hardware push it out at 48Mhz?

0 Likes
Bob_Marlowe
Expert II

OK, back to the drawing-board:

   

You do not need to use a status register to read the state of a pin.

   

You do not need to use a control register to set the state of a pin.

   

You cannot connect a pin to an internal signal and also write to the pin by software and expect a predictable result (who will win, the signal level or the written value?)

   

 

   

The discussion on how fast a pin can be toggled by software has been started often (started with PSoC1) and always ended up in the following conclusion:

   

You can toggle a pin at a comparably low speed such as 4MHz (more or less, it does not matter), but you cannot control the pin, since you are using up the whole CPU performance to toggle.

   

Instead, with a simple component, which is real hardware inside a PSoC, you may toggle a pin at 48MHz and still be able to control start/stop and pulse width, maintain a serial interface and some ADCs etc., while using only a small amount of CPU power.

   

 

   

Yes, I admit, it is tempting to write a pin-toggle program to test the performance of an MCU. But you do not normally compare the performance of an Indy car to a pickup by comparing their load capacity.

   

 

   

I would suggest you to try solving some real-world problems as a reflex light barrier, sensing or regulating and other projects you usually find in the world of embedded microsystems. Then you will see what a PSoC can perform.

   

 

   

Happy designing

   

Bob

0 Likes
Bob_Marlowe
Expert II

Some more info:

   

When you delete your control- and status-registers and the connecting wires

   

and check off the "Hardware Connection" for your remaining two pin-components you may use

   

fromCPLD_Read() to get both input pins values at the same time in the lower two bits of the result

   

toCPLD_Write() to simultaneously write a (different) value to all 5 pins.
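A minimal host-side sketch of what such a software-driven pins write and read amount to, assuming the five outputs sit on bits 0..4 of one port and the two inputs on bits 0..1 of another. The `pins_write` and `pins_read` helpers and both masks are hypothetical stand-ins, not the generated toCPLD_Write() / fromCPLD_Read() code, and the registers are passed in as plain pointers so the logic can be tried off-target.

```c
#include <stdint.h>

#define TO_CPLD_MASK   0x1Fu  /* assumed: 5 output pins on bits 0..4 */
#define FROM_CPLD_MASK 0x03u  /* assumed: 2 input pins on bits 0..1 */

/* Update only the masked bits of a data register, leaving other pins alone. */
static void pins_write(volatile uint32_t *dr, uint32_t mask, uint32_t value)
{
    *dr = (*dr & ~mask) | (value & mask);
}

/* Return the state of the masked pins from a pin-state register. */
static uint32_t pins_read(const volatile uint32_t *ps, uint32_t mask)
{
    return *ps & mask;
}
```

The read-modify-write is what keeps other pins on the same port untouched; a raw write of the whole register, as in the CY_SET_REG32 experiment earlier in the thread, overwrites every bit of the port.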

   

 

   

There is no DMA on your PSoC4

   

When you need DMA, you will have to purchase a CY8CKIT-044 with a PSoC4 -M. That chip has got DMA

   

 

   

Bob

0 Likes
ETRO_SSN583
Esteemed Contributor

That is the mythical document that shows I don't need to connect things to pins for it to build.. I hope Bob finds what I have done wrong.

   

 

   

Each pin has an internal register, a bit in its respective port, the xxx_DR register, associated with it.

   

So no you do not need a control reg to drive it.

   

 

   

   

 

   

Is it documented on how fast the ARM can pump data to the internal FIFO buffers? Can it push data there at 48Mhz, and then let the hardware push it out at 48Mhz?

   

 

   

In the ap note direct write to the register maxes out at a pin toggle rate of ~ 300 Khz.

   

You can always do a quick check on any device by looking at the number of cycles needed for two ASM move-immediates coupled with a near jump; that will reflect code-driven toggle rates.
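As a rough sketch of that quick check, the ideal toggle rate is simply the CPU clock divided by the cycles spent per high+low period. The `toggle_rate_hz` helper is a hypothetical name, and the cycle count you feed it is an assumption you have to take from the core's instruction timings or the .lst file; bus wait states and API call overhead are not modelled, so a measured rate can be well below this bound.

```c
#include <stdint.h>

/* Ideal toggle frequency in Hz for a loop that costs `cycles_per_period`
 * CPU cycles per full high+low period. Returns 0 for a zero cycle count. */
static uint32_t toggle_rate_hz(uint32_t cpu_hz, uint32_t cycles_per_period)
{
    return cycles_per_period ? cpu_hz / cycles_per_period : 0u;
}
```

For example, assuming two 2-cycle stores plus a 3-cycle taken branch (7 cycles) at 48 MHz, `toggle_rate_hz(48000000u, 7u)` gives a little under 7 MHz as the upper limit.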

   

 

   

Regards, Dana.

0 Likes
Anonymous
Not applicable

check off the "Hardware Connection"

   

That is what I was missing 🙂 take that off and I don't need the other things.

   

I would suggest you to try solving some real-world problems

   

I am working on a real world problem. The code is not just toggle some pins to blink a led super fast, it is only pin toggles for now, as baby steps.

   

Step 1. Since the ARM cannot trap an 8MHz pulse, does my async set/reset latch in the CPLD work? Hence the wait for a pin to go high from the CPLD (the call to action), then the toggle of another pin (to tell the CPLD I got it, and hence to clear its SR).

   

Step 2. Can I react in time and calc an address, push it to the arm, in 1000ns?

   

Step 3. Can I do that 5 times in a row and inc the address each time

   

Step 4. Can I react in time and calc an address, and a data value, push it to the arm, in 1000ns?

   

Step 5. Can I do that 11 times in a row and inc the address and pull a different byte each time

   

Step 6. Or can I react in time, calc an address, then wait for the Done, then pull the data value from the CPLD, and react to it in time to give the next address?

   

..... I like to take small measured steps

   

By tweaking the optimisation levels I think I have managed to win the fight with the compiler. (Why, in release with optimisation set to high, and with the variable declared 'register rg32', does it save the value on the stack only to read it again on the next clock?) I have been able to get it to react to the CPLD_request line in ~240ns and then do a pulse for ~100ns.

   

The LDR/STR instruction is 2 clocks; at 48MHz that is ~41ns, so are there wait states on the bus that delay it to 100ns? Do you have to wait that long, or if you put other instructions that are reg/reg or reg/# in between, will they execute instead of the CPU stalling?

   

When you need DMA, you will have to purchase a CY8CKIT-044 with a PSoC4 -M. That chip has got DMA

   

I can't find a tech spec for the DMA module, I searched for PSoC4-M TRM and the site only returned the normal PSoC4 one, I did find a family guide which showed the DMA on the block diagram, but no tech specs. I searched for app notes on DMA got nothing, do you know of a document that details its workings?

   

 

   

In the ap note direct write to the register maxes out at a pin toggle rate of ~ 300 Khz.

   

Is that what the oscilloscope pictures show? Not having one I'm not very good at reading them.

   

 

   

Thanks for taking time to help me, I really appreciate it, especially on your weekends.

0 Likes
Bob_Marlowe
Expert II

The PSoC 4 might not be as fast as you need it to be...

   

But wait..

   

There are 4 UDBs within your PSoC, each with a DataPath object containing two FIFOs and a programmable ALU, a counter and some PLD, and all of it runs easily at 24MHz. So the question still remains: can you solve your problem in hardware?

   

What do you mean by "calculating an address"?

   

All your requirements seem not yet quite clear to me.

   

 

   

Bob

0 Likes
Anonymous
Not applicable

so given the following steps

   

put lower 8 bits of address onto bus

   

tell cpld it is there

   

tell cpld that the upper 8bits of address are on the bus

   

put upper 8 bits of address onto bus

   

tell cpld that the 8 bits of data are on the bus

   

put 8bits of data onto bus

   

tell cpld done
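The steps above can be sketched as a fixed four-phase sequence per write. This is only a host-side model: the `phase` names, the `bus_event` struct and `build_write_sequence` are hypothetical, and it assumes each byte is placed on the bus together with the strobe that announces it, which the list above does not pin down.

```c
#include <stddef.h>
#include <stdint.h>

/* One handshake step: which strobe is raised and what is on the 8-bit bus. */
enum phase { PHASE_ADDR_LO, PHASE_ADDR_HI, PHASE_DATA, PHASE_DONE };

struct bus_event {
    enum phase phase;
    uint8_t    value;
};

/* Fill `out` with the steps for one write of `data` to `addr`.
 * Returns the number of events written (always 4). */
static size_t build_write_sequence(uint16_t addr, uint8_t data,
                                   struct bus_event out[4])
{
    out[0] = (struct bus_event){ PHASE_ADDR_LO, (uint8_t)(addr & 0xFFu) };
    out[1] = (struct bus_event){ PHASE_ADDR_HI, (uint8_t)(addr >> 8) };
    out[2] = (struct bus_event){ PHASE_DATA,    data };
    out[3] = (struct bus_event){ PHASE_DONE,    0u };
    return 4u;
}
```

On the real part each event would become one port write plus a strobe pulse, which is where the per-step cost discussed in this thread comes from.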

   

Putting the "tell CPLD that X is there" strobes into a counter on the PLD parts could be done and would save ~300ns, I think. This is why I wanted to know if the CPU-to-FIFO path was faster: the CPU could put the data into the FIFO and let the CPLD pulse the ARM FIFO to get the data, leaving the CPU free.

   

 

   

What does this need to do...

   

There are 3 components a Computer a CPLD ( to handle the bus marshaling ) and the PSoC

   

1.) 8 way screen scrolling

   

so you have a 40x25 char screen = 1000 bytes

   

to scroll it left you need to do

   

screenBase = screenBase + 1

   

screenBase +1 = screenBase + 2

   

...

   

screenBase + 38 = screenBase + 39

   

screenBase + 40 = screenBase + 41

   

....

   

to scroll down you need to do

   

screenBase + 960 = screenBase + 920

   

screenBase + 961 = screenBase + 921

   

....

   

screenBase + 920 = screenBase + 880

   

...

   

to scroll up left

   

screenBase = screenBase + 41

   

screenBase + 41 = screenBase + 82

   

... being diagonal, the number of shifts goes from 0 to 46 and back to 0

   

So while that can be done with hardware counters, covering all 8 directions takes a lot of silicon; in a CPU it is trivial.
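To show how little CPU code this takes, here is a minimal sketch of two of the eight directions on a local copy of the 40x25 screen. The `fill` byte for the vacated column or row is an assumption (the description above does not say what scrolls in), and on the real system every byte moved would be a bus read plus a bus write through the CPLD rather than a memmove.

```c
#include <stdint.h>
#include <string.h>

#define COLS 40
#define ROWS 25

/* Scroll left by one column, row by row; the right-hand column gets `fill`. */
static void scroll_left(uint8_t screen[COLS * ROWS], uint8_t fill)
{
    for (int row = 0; row < ROWS; ++row) {
        uint8_t *line = &screen[row * COLS];
        memmove(line, line + 1, COLS - 1);
        line[COLS - 1] = fill;
    }
}

/* Scroll down by one row; memmove handles the overlapping copy, which is
 * equivalent to copying from the bottom row upwards. The top row gets `fill`. */
static void scroll_down(uint8_t screen[COLS * ROWS], uint8_t fill)
{
    memmove(screen + COLS, screen, (size_t)COLS * (ROWS - 1));
    memset(screen, fill, COLS);
}
```

The diagonal cases are the same idea with both offsets applied, and colour memory can reuse the same routines on its own 1000-byte buffer.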

   

Then you have to do the same with colour memory.

   

You also have to be able to tell the ARM where screenBase is

   

so you make a packet in memory like so

   

c000 : 00 04

   

Then the computer tells the ARM that it has something at C000 for it, and that it wants it to do task 1 ( where 1 is set Screen Base )

   

The CPLD will load C000 into its address buffer and then pass the task number to the ARM. The ARM sees that it is task 1 and instructs the CPLD to read the first byte at that address and hand it over. The ARM stores it, increments the address buffer in the CPLD and asks for the next byte; the CPLD fetches that byte and gives it to the ARM, which stores it in its RAM and tells the CPLD to give the computer control again.
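A small sketch of how the two packet bytes could be turned into screenBase on the ARM side. It assumes the low byte comes first, so that the packet `00 04` means 0x0400; the byte order is my assumption, and `decode_address_le` is a hypothetical helper name.

```c
#include <stdint.h>

/* Combine two consecutive packet bytes into a 16-bit address, low byte first. */
static uint16_t decode_address_le(uint8_t first, uint8_t second)
{
    return (uint16_t)(first | ((uint16_t)second << 8));
}
```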

   

Then later the computer will signal the CPLD with another task 2 ( scroll screen left 1 char ). The CPLD tells the ARM task 2, the ARM tells the CPLD to take control, and then gives it the first address from above, and to read that byte. Then the 2nd address, tells it to write the byte it just read... until all 1000 are done, then changes the address to CRAM, does the 1000, then tells the CPLD to give control back to the host.

   

 

   

Once the ARM decides to take control of the computer it has to wait ~3000ns until it can take over, so there is plenty of setup time for its internal state, switching on task etc.

   

 

   

Sprite Multiplexing

   

each sprite needs x,y,colour,image and then there is 1 byte for all 8 on enable, x expand, y expand, priority and xmsb.

   

Say you want 32 sprites. So the ARM needs to pull in 32 xs, 32 ys, 32 colours, 32 images and 32 misc bits. Of course you will need a task that tells the ARM where the data is kept in RAM, and another task that tells the ARM you have changed it and to grab it again. Once it has grabbed the data it will give control back to the computer.

   

The ARM will then Y sort the 32 sprites and work out the grouping needed to set the computer's registers; it can have 8 on a line and only has 8 sets of registers for sprites. It will then arrange the data into the needed format and build an internal raster line table of which raster lines it needs to write that data on.
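A minimal sketch of the Y sort, assuming a simple per-sprite struct and an insertion sort, which is cheap for 32 entries that are usually nearly sorted from the previous frame. The `sprite` layout and the round-robin `hardware_slot` mapping are illustrative assumptions, not a worked-out multiplexer.

```c
#include <stddef.h>
#include <stdint.h>

struct sprite {
    uint8_t x, y, colour, image;
};

/* Stable insertion sort by ascending y. */
static void sort_sprites_by_y(struct sprite *s, size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        struct sprite key = s[i];
        size_t j = i;
        while (j > 0 && s[j - 1].y > key.y) {
            s[j] = s[j - 1];
            --j;
        }
        s[j] = key;
    }
}

/* Assumed policy: the sprite at sorted position i reuses hardware slot i % 8. */
static unsigned hardware_slot(size_t sorted_index)
{
    return (unsigned)(sorted_index % 8u);
}
```

A real multiplexer would also have to check that sprites sharing a slot are far enough apart vertically; that check is left out here.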

   

The CPLD will keep track of the cycles the computer has run, and thus be able to guess when the raster points are going to occur. Some time 4~5 computer clocks before the raster line is about to hit, the ARM will take control via the CPLD, tell it to get the value of $d012 and poll it until it hits the right value. Once it hits the right value it will write the 8 sprites' data to the right places in memory, tell the CPLD to let the host have control and then tell the CPLD the next raster line it wants (well, raster line minus a few cycles).

   

Some models of the computer have 65 cycles per line, some 64, others 63, so the ARM will need a task that lets it query, or time the machine to work out which one it is.

   

Flip sprite data.

   

Sprites are 63 bytes of data and the colours are 2 bits wide, so flipping them is not trivial. Tell the ARM where the sprite is in RAM; it can pull in the 63 bytes, flip the bits the right way as it reads, then write back the 63 bytes.
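Here is a sketch of a horizontal flip under one assumed layout: 21 rows of 3 bytes, each byte holding four 2-bit pixels with the leftmost pixel in the top bits. The row layout and both function names are assumptions for illustration; the point is that pixels are reversed as 2-bit pairs, not as single bits.

```c
#include <stdint.h>

#define SPRITE_BYTES 63
#define ROW_BYTES    3   /* assumed layout: 21 rows of 3 bytes */

/* Reverse the order of the four 2-bit pixels inside one byte. */
static uint8_t flip_pixel_pairs(uint8_t b)
{
    return (uint8_t)(((b & 0x03u) << 6) | ((b & 0x0Cu) << 2) |
                     ((b & 0x30u) >> 2) | ((b & 0xC0u) >> 6));
}

/* Mirror every row: swap the outer bytes and reverse the pixels in all three. */
static void flip_sprite_horizontal(uint8_t sprite[SPRITE_BYTES])
{
    for (int row = 0; row < SPRITE_BYTES; row += ROW_BYTES) {
        uint8_t left = flip_pixel_pairs(sprite[row]);
        sprite[row]     = flip_pixel_pairs(sprite[row + 2]);
        sprite[row + 1] = flip_pixel_pairs(sprite[row + 1]);
        sprite[row + 2] = left;
    }
}
```

If speed matters, `flip_pixel_pairs` could be replaced by a 256-entry lookup table, trading flash for cycles.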

   

Block moves, decompression algorithms, writing sample data to the sound chip... lots of other things

   

 

   

The computer lets you have half of a 1MHz bus. So while the clock is 1000ns you can only have 500ns, so effectively 2MHz. (Well, PAL machines give you 1015ns, NTSC machines give you 977ns.)

   

At the start of the 500ns you need to set the address valid, then at the end you either set the data or read that data depending on how you set the R/W line. So this gives a 500ns window to get the Address to the CPLD, I then have 70ns after that or so to get the R/W line state as well then I have 400ns to get the data to the CPLD if I need to write data. Once the data is to the CPLD I have until the 1000ns is up to prepare for the next round.
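To put those windows into CPU terms, this small sketch converts a window in nanoseconds into whole CPU cycles. `cycles_in_window` is a hypothetical helper; it counts raw clock cycles only and says nothing about wait states or how many cycles each instruction needs.

```c
#include <stdint.h>

/* Whole CPU cycles that fit into a window of `window_ns` nanoseconds. */
static uint32_t cycles_in_window(uint32_t cpu_hz, uint32_t window_ns)
{
    return (uint32_t)(((uint64_t)cpu_hz * window_ns) / 1000000000u);
}
```

At 48 MHz the 500ns address window is 24 cycles, the 70ns R/W slot is 3 cycles and the 400ns data window is 19 cycles, which is why every load and store counts here.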

   

 

   

So basically I need the full power of a CPU, I need at least 2K RAM, and flash/prom to store the program in. Having extra PLD and the ALU units is a nice bonus, I hope I can put the Cycles counters to track the raster line and position into them.

0 Likes
Bob_Marlowe
Expert II

Quite a complex project, I wish you good luck!

   

 

   

Bob

0 Likes
ETRO_SSN583
Esteemed Contributor

In the ap note direct write to the register maxes out at a pin toggle rate of ~ 300 Khz.

   

Is that what the oscilloscope pictures show? Not having one I'm not very good at reading them.

   

 

   

Indeed that is what the scope traces show.

   

 

   

Looking at the pin architecture one can see that the only thing direct to the pin is the analog bus... hmmm, makes one wonder if a mux out to the pin with two states on its inputs, a 0 or a 1, would toggle the pin faster. An experiment I will try.

   

 

   

Or this -

   

 

   

   

 

   

Regards, Dana.

0 Likes
ETRO_SSN583
Esteemed Contributor

Using the below I got a pretty clean 24 MHz pin toggling out of the part. Use a control reg to disable toggling, or connect HW to the D reset to effect control.

   

 

   

Regards, Dana.

   

 

   

0 Likes
Anonymous
Not applicable

Quite a complex project, I wish you good luck!

   

Not really, once the system is set up to DMA to and from the computer it's academic. Writing C code to do the sorting of sprites is 1 hr tops. Even a bubble sort will easily outperform what the computer would do.

   

The computer has a 7.9~8.1Mhz dot clock which the CPLD uses to time its parts, i.e. it breaks the 500ns active time into 4 smaller clock cycles, so controlling the bus is just a simple FSM.

   

Turns out I didn't pick my battle difficulty well, though. So far the CPLD stuff has been mostly easy compared to the PSoC stuff. I figured with a 48MHz CPU I would be drowning in clock cycles, that I would be wasting them in long nop loops, and that I would be writing C in any old fashion, not really caring what the compiler did with it. Not reading the disassembly, tweaking every compile option and coding in a certain style to fox it into cycle-exact code.

   

Do cypress have any barebones higher clock CPU+RAM+ROM combos that are cheap?

   

For those who come after

   

if( CY_GET_REG32(CYREG_PRT1_PS) != 0 ) break;

   

is not the same as this

   

register r32 var = CY_GET_REG32(CYREG_PRT1_PS);

   

if( var != 0 ) break;

   

The first one is about 8x faster.

0 Likes
Bob_Marlowe
Expert II

Does it matter when you leave out the "!=0" part?

   

There are PSoCs quite faster than a PSoC4 (I thought you would never ask )

   

The PSoC5 with an M3 core comes at 68MHz and there are 80MHz versions available. Moreover there is the CY8CKIT-050 as a versatile development kit and lastly the CY8CKIT-059 prototyping board with a snap-off kit programmer. The advantage of the PSoC5, besides its speed, is the 24 UDBs (compared to 4 with a PSoC4). You may easily build your 8-bit address/data interface using 3 UDBs, pushing the data into a FIFO, getting interrupted.... Even the adding of the displacement values could be done within a single UDB clock cycle.

   

 

   

Bob

0 Likes
ETRO_SSN583
Esteemed Contributor

if( CY_GET_REG32(CYREG_PRT1_PS) != 0 ) break;

   

is not the same as this

   

register r32 var = CY_GET_REG32(CYREG_PRT1_PS);

   

if( var != 0 ) break;

   

The first one is about 8x faster.

   

 

   

It would be instructive to look at the .lst file for the differences in the compiler-generated code. I had issues on another processor and found that any non-native size type was best handled with pointers, for speed / code size reasons. I was able to get a lot of code reduction throughout the code by doing that.

   

 

   

Regards, Dana.

0 Likes
ETRO_SSN583
Esteemed Contributor

The 80 Mhz parts -

   

 

   

0 Likes
Anonymous
Not applicable

But do the 80mhz chips have faster GPIO speeds or does the CPU just stall for longer? The docs show it has a period of 500+ns on just a simple toggle.

   

The DMA controller might be able to help, but it might take to long to set up as it can't see the registers, so the register state will need to be written to RAM( does it have delay states? ) then DMAd to pins. What clock speed does it run at?

   

 

   

The issue was not the != 0 but the fact that there are 2 lines. I will pull the disassemblies for you tomorrow.

0 Likes
Bob_Marlowe
Expert II

For speeds see AC specs in datasheet for CY8C58xxx .

   

No need to copy the ASMs for your given code, I saw that.

   

 

   

Bob

0 Likes
ETRO_SSN583
Esteemed Contributor

But do the 80mhz chips have faster GPIO speeds or does the CPU just stall for longer?

   

 

   

The specs show no internal settings to stall for faster speeds, like some CPUs in the past, largely FLASH limitations. So toggle rate under SW control would be faster.

   

 

   

You can consult the Architecture TRM for DMA timing, cycles and latency. It is not a simple number, due to arbitration, interrupts, etc. That being said, bus speeds do rise with bus clock rate. Also, AN84810 has a discussion on timing.

   

    

   

         

   

 

   

http://www.cypress.com/?rID=37793     AN52705     Getting Started with DMA

   

http://www.cypress.com/?rID=82680     AN84810     PSoC® 3 and PSoC 5LP Advanced DMA Topics

   

http://www.cypress.com/?rID=44335     AN61102 PSoC® 3 and PSoC 5LP - ADC Data Buffering Using DMA

   

http://video.cypress.com/video-library/search/dma/     Videos on DMA

   

https://www.youtube.com/results?search_query=dma+psoc Videos on DMA (some overlap)

   

 

   

Of course you can quickly set up some very simple tests to measure throughput; PSoC is its own test bed in a sense.

   

 

   

 

   

Regards, Dana.

0 Likes
ETRO_SSN583
Esteemed Contributor

Here is an example of putting HFCLK directly out to a pin, 48 MHz.

   

Not the prettiest looking waveform, but its there.

   

 

   

   

 

   

Regards, Dana.

   

 

   

0 Likes
Anonymous
Not applicable

The PSoC 5 documentation doesn't directly reference wait states, but given the first 32K of SRAM can be used to run code, so it can run at full speed, it does imply that if the code is not in SRAM it runs slower. Seeing as its clock is 80Mhz, maybe Flash just can't handle it.

   

Could the missing 60ns above be due to FLASH? As Dana has proven the pins can run at 48MHz, and we assume the CPU core is running at 48MHz, two stores to the same location with immediate values shouldn't cause any pipeline stalls, read-back stalls or pipeline flushes, and there is no cache to flush. Is the instruction fetch costing us the 60ns? I tried to make a PWM timer module fire an interrupt to time some code, but I got this helpful error message

   

Error: mpr.M0139: Invalid connection for clock input "\Timer_1:cy_m0s8_tcpwm_1\:clock" driven from "ClockBlock:dsi_in_0". The component requires a clock from the clock block. (App=cydsfit)

   

"ClockBlock:dsi_in_0" is not from the Clock Block?

   

 

   

Seeing as the PSoC5s are $15 I guess there is one way to find out 😉 My usual supplier doesn't stock the PSoC 5 kits so I  have started looking for another, that doesn't kill me on postage. Its stringent power cleanliness might make it a no go though, also it seems like it can't/will be difficult to use on a 2 layer PCB.

0 Likes
Anonymous
Not applicable

Ok

   

CY8CKIT-059 PSOC 5LP kit

   

Time to detect and respond to a pulse 240ns - same as the PSOC4

   

Pulse length 60ns - 40ns faster than PSOC4

   

 

   

This is with the clocks set to 74.7MHz. I have not looked at how to run code in the SRAM yet, maybe that will give me speed over the PSOC4?

   

Also I must have the buffer checked on the input, otherwise the code won't detect the input pin state at all. Could this buffer explain why it is no faster?

0 Likes
ETRO_SSN583
Esteemed Contributor

Ram vs Flash execution (Cortex M3) -

   

 

   

www.cypress.com/

   

 

   

 

   

 

   

Regards, Dana.

0 Likes
ETRO_SSN583
Esteemed Contributor

Post your test project so forum can take a look at it.

   

 

   

    

   

          

   

Creator->File->Create Workspace Bundle

   

 

   

 

   

Regards, Dana.

0 Likes
Anonymous
Not applicable

30% boost at low speed, that is a fair boost 🙂

   

 

   

Here is the project archive.

0 Likes
ETRO_SSN583
Esteemed Contributor

There is a section in here for fast toggling -

   

 

   

    

   

          

   

http://www.cypress.com/?rID=57571     AN72382 - Using PSoC® 3 and PSoC 5LP GPIO Pins

   

 

   

 

   

Here is the code your program produces, from .lst file

   

 

   

  23:.\main.c      ****         while(1)
  24:.\main.c      ****         {
  25:.\main.c      ****           if( (CY_GET_REG32(CYREG_PRT3_PS) & 1u )!= 0u ) break;
  42                      .loc 1 25 0
  43 000a 074B             ldr    r3, .L6
  44 000c 1B68             ldr    r3, [r3]
  45 000e 03F00103         and    r3, r3, #1
  46 0012 002B             cmp    r3, #0
  47 0014 07D0             beq    .L2
  48                      .loc 1 25 0 is_stmt 0 discriminator 1
  49 0016 00BF             nop
  26:.\main.c      ****           //  if( Request_ARM_Read() != 0 ) break;
  27:.\main.c      ****         }
  28:.\main.c      ****         CY_SET_REG32(CYREG_PRT0_DR,1);
  50                      .loc 1 28 0 is_stmt 1 discriminator 1
  51 0018 044B             ldr    r3, .L6+4
  52 001a 0122             movs    r2, #1
  53 001c 1A60             str    r2, [r3]
  29:.\main.c      ****         CY_SET_REG32(CYREG_PRT0_DR,0); //pulse the done pin
  54                      .loc 1 29 0 discriminator 1
  55 001e 034B             ldr    r3, .L6+4
  56 0020 0022             movs    r2, #0
  57 0022 1A60             str    r2, [r3]
  30:.\main.c      ****     }
  58                      .loc 1 30 0 discriminator 1
  59 0024 EEE7             b    .L5
 

   

Regards, Dana.

0 Likes
Anonymous
Not applicable

Ok so I put the code into SRAM using __attribute__ ((section(".data"))); and I changed the optimisation setting to Area, which gives this asm

   
 143                  .LVL0:
 144                  .L3:
  32:.\main.c      ****           if( (CY_GET_REG32(CYREG_PRT3_PS) & 1u )!= 0u ) break;
 145                      .loc 1 32 0
 146 000c 2368             ldr    r3, [r4]
 147 000e DB07             lsls    r3, r3, #31
 148 0010 FCD5             bpl    .L3
 149                      .loc 1 35 0
 150 0012 0122             movs    r2, #1
  36:.\main.c      ****         CY_SET_REG32(CYREG_PRT0_DR,0); //pulse the done pin
 151                      .loc 1 36 0
 152 0014 0023             movs    r3, #0
  35:.\main.c      ****         CY_SET_REG32(CYREG_PRT0_DR,1);
 153                      .loc 1 35 0
 154 0016 2A60             str    r2, [r5]
 155                      .loc 1 36 0
 156 0018 2B60             str    r3, [r5]
  37:.\main.c      ****     }
 157                      .loc 1 37 0
 158 001a F4E7             b    .L4
   

I now get a response time of 280ns~325ns but a response pulse of 50ns

0 Likes
Anonymous
Not applicable

Adding this hand asm, not in SRAM 

   
//CYREG_PRT0_DR 0x40005100u
//CYREG_PRT3_PS 0x40005131u
asm(   "ldr    r4, =0x40005131\n\t"
       "ldr    r5, =0x40005100\n\t"
       "movs   r6, #0\n\t"
       "movs   r7, #1\n\t"
       "L3:\n\t"
       "ldr    r3,[r4]\n\t"
       "lsls   r3,r3,#31\n\t"
       "bpl    L3\n\t"
       "str    r7,[r5]\n\t"
       "str    r6,[r5]\n\t"
     :
     :
     : "r3","r4","r5","r6","r7" );
   

 

   

I get the response down to 220ns, pulse 50ns

   

In SRAM

   

response is up to 340ns, pulse 40ns

0 Likes