
PSoC™ 5, 3 & 1 Forum Discussions

HuEl_264296

The Arm Cortex-M0 supports a subset of the instructions provided by the Cortex-M3. Presumably the M3's extra instructions provide better performance for some applications.

But does this have any implications for a developer writing C code for either of these devices? Would you write code differently knowing that you were developing for one instruction set rather than the other?

For example, if I knew that my target hardware didn't have a single-cycle multiply instruction, I would use bit-shifting rather than multiplying where possible.

Len_CONSULTRON

HuEl,

The compiler is told which core it is targeting and therefore knows which instructions it can use.  GCC uses its optimization passes to choose the best instructions for the code and data criteria you specify.

Suggestion:  Set your compiler optimization to SPEED and let it compile.  You can then look at the code it generates.  If it doesn't meet your needs, you can write that section yourself, for example as your own assembler functions.

Len
"Engineering is an Art. The Art of Compromise."

Hi Len,

Thanks for the reply. I'm aware that the compiler can optimise code.

I'm really interested in tailoring performance-critical C code to match the architecture well.

For example, you would (or should) write C differently depending on whether your target architecture was an 8-bit PIC or a 32-bit ARM.

Similarly, would you write performance-critical code differently knowing that your target architecture was an M0 or an M3?

 


HuEl,

Each core has a specific #define value.  You can use conditional compilation in a header or .c file to build different code depending on the core.

I hope this helps.

Len
"Engineering is an Art. The Art of Compromise."

Hello Dennis

I have been trying to talk with you by phone, but your number gives a 'busy tone' at any hour.   Do you think that we could finish the project?

Have a nice day

Luis

 

odissey1

PSoC4 and PSoC5 are slow micros by today's standards. They are not for high-speed or DSP apps. You can find 200+ MHz chips with hardware dividers, CORDIC, or floating-point math for a lower price.


Hi Odissey1,

Indeed, they are not that fast. But still, they are often called upon to give as much as they can. And there are sometimes parts of an application which are especially time sensitive.

And it certainly doesn't mean that code performance is not interesting, especially in cost sensitive applications that can't afford a high speed DSP, or in robotics applications like ours, where the PSoCs are extremely useful compared to normal microcontrollers or DSPs, due to their reconfigurable hardware.

So my question still stands unanswered, and I would love to hear an answer from someone who actually understands the performance implications of the various instruction sets.

Many thanks

 


HuEl,

I can understand your "need for speed".  (Preferred term: Maximized performance.)

I don't know your specific application needs but here are the design decisions I use.

If I'm creating a very tight time-sensitive application I tend to use the following design techniques in order of preference.

HW State Machine.

I'm a big proponent of using HW state machines.   Properly used, bit/byte banging takes virtually 0% of CPU time.   This is why the PSoC5, with its UDBs, is my preferred PSoC.

I started my engineering career in 1983.  Back then, you built every design around non-embedded CPUs (no internal peripherals) with 74xxx-series logic gates.  Everything connected to the CPU required custom HW state logic.

About two years later, programmable logic (PLDs, PALs, gate arrays ...) became available.   It still provided HW state logic, but was now programmable by the designer.

DMA data movement

The PSoC provides decent DMA HW to minimize CPU clock utilization.  This is a very effective way to move large amounts of data.  Having the DMA signal trigger an interrupt after the bulk of the data has been moved to/from RAM minimizes CPU use.

CPU Interrupts

Interrupts were the first attempt in computing history to come close to multi-tasking.  They allow the CPU to process specific functions "on demand".

Note:  Please keep all interrupt service routines "short and sweet".  If you put blocking functions or long do/while loops in an ISR, you can run into main() lockup issues: the ISR spends too much time before returning to main() or allowing other ISRs to be serviced.

I solved an issue where the user was calling a very long floating point calculation INSIDE the ISR and it took too long to return.

CPU clock Frequency

If your application can tolerate it, raising your CPU clock frequency helps across the board.  This method would be at the top of my list if active current consumption and/or EMI were not a concern.

CPU table-driven algorithms

If I have a complex math function (e.g. floating point), I may precompute one or more tables that the input variable(s) index into.  This is a common way to reach an answer quickly if you can tolerate some quantization error.

Placing Frame signals (Debugging method)

If, after implementing one or more of the above methods, I still have a timing issue, I place framing signals (GPIO bit toggling) to profile my SW.  This is only for debugging, but it is very helpful for deciding where to focus further effort.

 

Len
"Engineering is an Art. The Art of Compromise."

Hi Len,

Thank you, this is useful general advice for someone who is a beginner to the field of embedded development.

However, I have more than 20 years experience in this field, and have written firmware for several shipped products including consumer devices and advanced robotic control systems and networks, all of which needed to make very efficient use of limited computational power. I am quite familiar with everything you said.

But I am not asking about a general 'need for speed'.

My question was quite specific, and relates to the Cortex M0 and M3 instruction sets, and whether or not there's anything a C programmer can do to help the compiler make good use of them.

If you know anything about this, I would be very interested to know.

Kind regards

BiBi_1928986

Hello.

From a s/w perspective, you'll get more performance and code density by purchasing a decent compiler, regardless of M0 vs M3.  GCC is okay, but not the greatest.  Any s/w designer is going to write code however they like; it's the compiler that produces the final image from that effort.

That said, the M3 has the extra feature of Bit-Banding (BB).  Here, a s/w designer has to know how to use it efficiently to get the most out of it.  BB results in more compact code and faster register-based bit manipulation, and it performs the equivalent of a non-interruptible read-modify-write in a single instruction (there could be exceptions to this).  Great feature for setting/clearing GPIO bits, interrupt register bits, and core register bits.

The M0, on the other hand, has compact code size (ARMv6-M Thumb).  But given that most M0 microcontroller implementations have small flash/RAM sizes, s/w designers may still run into trouble there.

