Strange behavior when trying to implement shared memory through DMA

Anonymous
Not applicable

Hi folks,

I'm working on a project to use a PSoC 5 MCU to add extension capabilities to a vintage pocket computer. The pocket computer uses an LH5801 CPU running at 1.3MHz.

I've set up GPIOs to read the 16 bit address bus, read and write the 8 bit data bus, and read the R/W pin. CS for the MCU is by matching 1000 from the upper four bits of address. I have also set up indexed DMA to handle data writes to the MCU memory. Basically, a status register uses some bits from the upper address to get a page offset within a 4K buffer array in MCU memory (4K starting at 0x20002000), and the lower address 8 bits provide the rest of the indexing into the buffer. I grab the buffer address when CS goes high, then read in data when R/W goes low while CS is high.
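The address decoding described above can be sketched in plain C as follows. This is only an illustration under assumptions: the post says "some bits from the upper address" select the page, so the choice of address bits 11..8 here is hypothetical, as are the names `cs_active` and `buffer_offset`. On the real part this decoding is done in hardware (status register plus DMA), not in software.

```c
#include <stdint.h>

#define PAGE_SIZE 256u  /* bytes addressed by the low 8 address bits */
#define NUM_PAGES 16u   /* 16 * 256 = 4K buffer, matching the post */

/* Chip select: upper four address bits equal to 1000 (binary). */
int cs_active(uint16_t addr)
{
    return (addr >> 12) == 0x8u;
}

/* Offset into the 4K buffer.
   Assumption: the page is taken from address bits 11..8. */
uint16_t buffer_offset(uint16_t addr)
{
    uint16_t page = (uint16_t)((addr >> 8) & (NUM_PAGES - 1u));
    uint16_t low  = (uint16_t)(addr & 0xFFu);
    return (uint16_t)(page * PAGE_SIZE + low);
}
```

Under that assumed mapping, bus address 0x8A37 selects the MCU and lands at offset 0xA37 in the buffer, while 0x7A37 does not assert CS at all.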

This mostly works. The problem I'm having is that I can write into MCU memory, with the correct offset into the buffer, and read the data back out, as long as the upper four bits of the data value being written aren't 1000. If the data value does start with 1000 (that is, 128 to 143), it seems like the low pulse on the R/W line is never caught.

This is a bit bizarre for a few reasons:

1. 1000 just happens to be the CS trigger. Coincidence?

2. I have tried implementations without DMA and had exactly the same problem.

3. When I have tried implementations that wrote to a single address in memory, instead of a range of addresses, I had no problem with 1000 values (e.g. 129 or 143).

I'm really pretty stumped now. I would have thought switching to a completely different way of doing things would at least have resulted in different problems.

Any ideas what might be happening or how to further troubleshoot this?


Thanks,
Paul

0 Likes
1 Solution
Anonymous
Not applicable

OK. So, to be clear, the correct answer is that DMA takes long enough to set up that, for a single-byte transfer, it is nowhere near as fast as a CPU-driven for loop. A for loop on a PSoC 4 wasn't fast enough for what I wanted to do, but the same loop on a PSoC 5, with a status register for page decoding, is fast enough to reliably emulate SRAM for a 1.3 MHz 8-bit processor.

0 Likes
12 Replies
Anonymous
Not applicable

One other thing I noticed from the Pocket PC documentation is that there is an OD (output disable) line, that goes high before write goes low. The documentation isn't very clear, but it sounds like this is to disable other devices from writing to the bus so the CPU can prepare its data to write. I think I'll try hooking that up (inverted) to output enable on my GPIO data bus instead of using R/W. I don't have a great theory on how this would cause the issue I'm seeing, but, if I'm interpreting the use of this signal correctly, this should be the right way to do this piece, regardless... and maybe it will help with the 1000 issue, too.

0 Likes
Anonymous
Not applicable

Well... switching to OD to control the data bus works just as well as before... no improvement for reading the 1000 values.

0 Likes
Anonymous
Not applicable

Another experiment to try to understand this:

I added a second data TD, pointing it to a fixed address. So, the first TD writes the data bus value to the buffer offset address, and the second one writes to the fixed address. Here, if I write 129, it's picked up by the fixed address, but not the buffer. If I write 12, it's picked up by the buffer but not the fixed address.

So, I guess it must be a timing issue, and certain values take more or less time for the CPU to write... Still seems weird. Any other ideas? Some sort of data latch that could be edge triggered to more quickly grab the data bus on write and hold it for the DMA?

0 Likes
Anonymous
Not applicable

Hmmm... while Googling trying to find more answers, I found this:

"

2. Re: DMA to Control Register Question

user_1377889 (Level 8)

Welcome in the forum, G.

Your question why most of the examples are working with arrays is quite easily answered: The setup for "just" a single byte transfer is a large overhead, so a simple assignment would beat the DMA, even an interrupt driven solution would be faster.

It is always advisable to post a workspace bundle so that we all can have a look at all of your settings. To do so, use
Creator->File->Create Workspace Bundle (minimal)
and attach the resulting file.

Bob

"

So, I went back and re-wrote my non-DMA for(;;) loop based interface, and, sure enough, that seems fast enough. Knock on wood...

0 Likes
Anonymous
Not applicable

Hi,

If I may add my two cents: this is more or less (different retro thing, different story) one of the things I wanted to do. Mine was slightly different, but I likewise came to the conclusion that the PSoC 5 is not fast enough for this kind of work. It takes too many cycles, especially for older CPUs that were able to do a memory access in 1 clock cycle.

In the end I resorted to "ExtMemInterface" and a dual-port RAM, but this has the drawback that it requires a lot of pins, plus you need external (expensive) chips. The thing is, for this kind of application you'd almost need a chip specifically designed for it. I was too optimistic, thinking that surely something running at so many MHz can take an interrupt and present a byte to a port. It turns out that no, you can't: it's still too many clock cycles to even match a 1 MHz CPU, and beyond that I am not sure.

That said, I know some people did it (but I do NOT know the details of how) using some other CPUs that, for reasons I don't know, seem to be able to do that more easily. But yes, my look at the PSoC 5 gave me the impression that, as it is, it does not look like you can actually do that, unless there is some extremely clever trick for doing it that I don't know.

0 Likes
Anonymous
Not applicable

For mine, it is working. The last test I did was encoding a machine language (of the host Pocket PC) routine in my PSoC SRAM which copies blocks of memory from source to destination addresses. I used this to copy 512 bytes from Pocket PC ROM to the R/W area of my PSoC interface. This was a bit glitchy at 48MHz, but has proven to be solid at 72MHz. I used it to make 32 passes and write out the whole 16K of ROM to a file on the SD card via EMFile.

Reading the machine language opcodes out of, and, at the same time, writing bytes back in to the PSoC emulation "RAM", with each byte transfer being done with a single opcode call, tells me that this should be fast enough for anything I can do in this device. But, having to go from 48MHz to 72MHz to make this work reliably tells me that I'm pretty close to the limit of what can be done with RAM emulation on a PSoC 5. Dual-port memory was the next option I was looking at.
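To put the 48MHz versus 72MHz difference in rough numbers, here is a small sketch of the cycle budget. It only divides the two clock frequencies mentioned in this thread; the helper name is made up for illustration, and how many host clocks one LH5801 bus access actually spans is not stated here, so treat the result as a per-clock budget rather than a hard deadline.

```c
#include <stdint.h>

/* MCU clock cycles that fit into one host CPU clock period.
   Both arguments are in Hz; the result is truncated to whole cycles. */
uint32_t mcu_cycles_per_host_clock(uint32_t mcu_hz, uint32_t host_hz)
{
    return mcu_hz / host_hz;
}
```

With a 1.3MHz host clock this gives 55 MCU cycles per host clock at 72MHz and 36 at 48MHz. One pass of a polling loop that reads several registers and indexes a buffer can plausibly eat a large share of that, which fits with 48MHz being glitchy and 72MHz being solid.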

I don't think this would work with interrupts, and it certainly didn't work with DMA. This is all done in a for loop that first writes data (with the data output controlled by output-disable and a decoded address chip select from the Pocket PC) and then will block a bit waiting in case R/W goes low to write.

0 Likes
cadi_1014291
Level 6

This might not help at all, but did you try assigning a higher priority to the DMA channels?

0 Likes
Anonymous
Not applicable

Yes. Didn't help.

0 Likes
Anonymous
Not applicable

Hi,

Just to let you know, in my first attempt I assumed I could do something of this kind: use two ports for the address and one for the data, and have some bits for /CE and /RW.

The idea was to set an interrupt on the /CE pin transition (high to low) and have an interrupt routine of this kind:

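INTERRUPT:
    a = read(port A);                      // high address byte
    b = read(port B);                      // low address byte
    read = Read(RW);                       // nonzero when the CPU is reading
    address = base_mem + ((a << 8) | b);   // parentheses needed: + binds tighter than <<
    if (read)
    {
        byte = *address;
        write_byte(port C, byte);
    }
    else
    {
        byte = read_byte(port C);
        *address = byte;
    }
END INTERRUPT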

It turns out this is too slow, and you are still consuming a lot of pins, plus the memory available is not that much.
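One detail worth spelling out for a routine like this is how the two port reads are combined into an address. In C, `+` binds tighter than `<<`, so an unparenthesized expression of the form `base + a << 8 + b` parses as `(base + a) << (8 + b)`. A minimal sketch of the intended combination, with the hypothetical helper name `bus_address`:

```c
#include <stdint.h>

/* Combine the high and low address bytes read from two 8-bit ports
   into one 16-bit bus address. The explicit parentheses and casts
   keep the shift and the OR in the intended order. */
uint16_t bus_address(uint8_t high, uint8_t low)
{
    return (uint16_t)(((uint16_t)high << 8) | low);
}
```

For example, a high byte of 0x80 and a low byte of 0x12 give 0x8012, which can then be added to the base of the emulated memory.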

I don't think there's an easy way to do this with the PSoC, nor really with other MCUs. The only one that I've seen doing something more or less like what I sketched here, in ASM, was on a totally different 100 MHz CPU (which I won't mention here because it's the competition).

Now, a dual-port RAM chip costs a lot more than the PSoC 5. Still, if that's what you really need, there's probably no simpler solution. At the end of the day it all comes down to clock cycles and how fast you can be. I was trying to do something similar but a bit more complex for a 6809/6502 CPU, and it turns out the math does not add up: it can't be fast enough doing it that way, even if the old CPU runs at 1 MHz.

If you want my overall impression of the PSoC 4/5 so far: for the price it's pretty good for MANY things, BUT NOT for ALL the things one could think of. I see it as more suited to analog processing than to this sort of thing, which is a bit of a specialized application.

One thing I still find quite limited on the PSoC 5 is the amount of program memory and RAM inside, especially when going with a C compiler and such. Also, the number of available I/O pins is small once you start doing this kind of thing.

0 Likes
Anonymous
Not applicable

I tried an interrupt-based approach, too. And, yes, that was also too slow. Like DMA, there are several clock cycles to push current registers and start servicing an interrupt. Only the continuous for loop approach has worked for me. I don't need the MCU to be doing anything else when the CPU needs it to be acting like RAM, so that's OK.

In pseudocode, I'm doing something like this:

for (;;)
{
     laddress = pin_address_low_PS; //8 low address bits
     page = status_page_status;  //decoded page from 8 high address bits and a status register
     data_DR = buffer[page][laddress]; //output enable for data pins is controlled by CS and CPU's output disable
     while (... ) //a bunch of stuff to wait to see whether R/W goes to write while the address is the same and CE is high
     {
          if(...) //write data conditions met
          {
               buffer[page][laddress]=data_PS;
               if(...) //data write was command for the MCU
               {
                    data_DR = BUSY_STATUS_CODE;
                    //do selected command, e.g. open an SD file or send some BLE data
                    data_DR = ...; //result status code
               }
          }
     }
}
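For anyone who wants to experiment with the logic off-target, the loop body can be modeled as an ordinary C function. This is a sketch under assumptions, not the actual firmware: `bus_t` and `serve_access` are hypothetical names, the pin and status registers are replaced by a struct, the 16 x 256 layout is just one way to arrange a 4K buffer, and the command handling and the wait conditions elided above are left out.

```c
#include <stdint.h>

#define PAGES     16u
#define PAGE_SIZE 256u

/* Snapshot of the bus for one access; stands in for the real
   pin and status registers (hypothetical model). */
typedef struct {
    uint8_t page;      /* decoded page number */
    uint8_t laddress;  /* low 8 address bits */
    uint8_t write;     /* nonzero when the CPU pulls R/W low to write */
    uint8_t data_in;   /* value the CPU drives during a write */
} bus_t;

static uint8_t buffer[PAGES][PAGE_SIZE];

/* One pass of the loop: present the stored byte first, then accept
   a write if one arrives for the same address. Returns the byte the
   MCU would be holding in its data output register afterwards. */
uint8_t serve_access(const bus_t *bus)
{
    uint8_t page = (uint8_t)(bus->page % PAGES);
    uint8_t data_out = buffer[page][bus->laddress];

    if (bus->write) {
        buffer[page][bus->laddress] = bus->data_in;
        data_out = bus->data_in;
    }
    return data_out;
}
```

Writing 0x81 (129, one of the "1000" values) to page 2, offset 0x10, and then reading the same location back returns 0x81 in this model. That is the behavior the real loop has to reproduce within the timing window.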

For my project, the 64K of RAM has so far been plenty. I was planning to add 24K of expansion RAM to the pocket PC using a 62256 and having the PSoC 5 do the CS decoding (that expansion is a device I already have working with the 62256 and dedicated 74HC decode logic), but I think I'm going to try using some of the remaining PSoC RAM instead. I'm currently only using 4K for the memory mapped buffer, and less than 16K total. If the loop is still fast enough once I add the expansion RAM emulation, then the question will become one of power consumption and data persistence logistics with the PSoC RAM vs the 62256.

0 Likes
Anonymous
Not applicable

Actually, it just occurred to me that, so far, I have not implemented anything in the "ROM" that's served up by the MCU that would require the CPU to wait for the MCU to finish a long-running task. I want to add some commands like that, though, to do more complex things entirely with extensions from the MCU. That will be a problem if the MCU can't serve up ROM while it's busy.

I can think of a couple of options to accomplish this. One would be to copy the wait-for-done-status loop over to a small area of pocket PC RAM and have the CPU loop in that until the MCU updates to a non-busy status. Another would be to stick with the 62256 for expansion RAM and copy the "ROM" data to an unused section in it (there's 8K that doesn't get mapped into the CPU's space), then switch address decoding in the MCU so this space shows up where the ROM needs to be. This could be done at startup or at first use of a command. Startup seems like it would be easier, but I'm not sure that the CPU HALT and INT lines I would need for the MCU to take over the bus are available on the expansion slot I'm planning to use.

0 Likes