User19008
Level 1
Some performance tests on a TriBoard TC275 with the TASKING C compiler (optimization level 3, optimized for speed)
showed the following results:

1. Addition of 2 int variables
a) defined as local inside a function: 1 cycle (as expected)
b) defined as global outside a function: 18 cycles ???

2. Addition of 2 float variables
a) defined as local inside a function: 2 cycles (as expected)
b) defined as global outside a function: 18 cycles ???

Global variables are defined with __near memory qualifier to simplify assembler code.
Behavior on Cpu0 and Cpu1 is identical.
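
For reference, the two cases could be reproduced with something like the sketch below. It is not the original test code: the `volatile` qualifier, the helper names, and the `clock()`-based `read_ticks()` are assumptions made so that the sketch also builds on a host. On the target, `read_ticks()` would be replaced by a read of the CPU cycle counter.

```c
#include <time.h>

/* Globals for case b). volatile is an assumption of this sketch: it forces a
 * real load and store on every access so the compiler cannot fold them away. */
volatile int gA = 2, gB = 3, gC;

/* Hypothetical tick source. On the target this would be replaced by a read
 * of the CPU cycle counter; clock() only keeps the sketch runnable on a host. */
unsigned long read_ticks(void)
{
    return (unsigned long)clock();
}

/* Case a): the operands are locals, which the compiler can keep in registers. */
int add_locals(int lA, int lB)
{
    int lC = lA + lB;
    return lC;
}

/* Case b): the operands live in memory, so two loads and a store are needed. */
void add_globals(void)
{
    gC = gA + gB;
}

/* Ticks spent in one global addition, including the overhead of the reads. */
unsigned long time_add_globals(void)
{
    unsigned long start = read_ticks();
    add_globals();
    return read_ticks() - start;
}
```

The measured value includes the cost of reading the counter itself, so in practice an empty measurement would be subtracted from it.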

Can anyone explain this strong performance degradation using globals?
For my application I am forced to use some globals to keep static data, and I cannot afford to waste cycles.
NeMa_4793301
Level 6
Without seeing the assembly code, it's hard to say. My guess is that for the local case, the compiler is assigning your local variables to registers. Fewer instructions, more speed.

For the global case, it also matters where the variables are stored. Storing a variable in local DSPR takes 0 wait states. Storing a variable in LMU RAM has to go across the SRI bus, but it could be that the CPU Store Buffers absorb the impact without causing a stall.

Share a bit of the disassembly and I can give a better response 🙂
User19008
Level 1
Thank you very much for the fast response. My first reply seems to have been lost, so here it is again.
Here the version with local integers:
64 lC=lA+lB
000000008000211a add d2, d4, d5
You are right. Everything is done with registers.
Now the version with global integers:
80 gC=gA+gB;
00000000800020f6 ld.w d15, gA (0x60000000)
00000000800020fa ld.w d1, gB (0x60000004)
00000000800020fe add d15, d1
0000000080002000 st.w gC (0x60000008), d15
The assembly code also looks as expected. There are a few more instructions, but not enough to explain that many cycles.
I am confused about the addresses of the global variables, which are in the DSPR of Cpu1.
I used Cpu0. Do you have any hint?
NeMa_4793301
Level 6
Not so surprising then - those accesses have to go across the SRI bus, which will add a few more cycles when the store buffers are all busy.
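
To check quickly whether an address is local to the core running the code, a small helper along these lines may help. It is only a sketch: the function names are made up for illustration, and the segment numbers (0x7 for Cpu0, 0x6 for Cpu1, 0x5 for Cpu2, 0xD for the local DSPR view) are an assumption based on the TC27x memory map, so verify them against the user manual for your device.

```c
#include <stdint.h>

/* Returns the index of the CPU whose scratchpad segment contains addr,
 * or -1 if addr lies in none of them. Segment numbers are an assumption
 * taken from the TC27x memory map: 0x7 = CPU0, 0x6 = CPU1, 0x5 = CPU2. */
int scratchpad_owner(uint32_t addr)
{
    switch (addr >> 28) {
    case 0x7u: return 0;
    case 0x6u: return 1;
    case 0x5u: return 2;
    default:   return -1;
    }
}

/* Nonzero if code running on `cpu` can reach addr without leaving the core. */
int is_local_access(uint32_t addr, int cpu)
{
    /* Segment 0xD is assumed to mirror the executing core's own DSPR. */
    if ((addr >> 28) == 0xDu) {
        return 1;
    }
    return cpu >= 0 && scratchpad_owner(addr) == cpu;
}
```

With the addresses from this thread, `scratchpad_owner(0x60000000)` gives 1, so from Cpu0 the access to `gA` is remote, while an address such as 0x70001000 is local to Cpu0.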

Where variables go is dictated by the linker. Tasking gives you a very quick cheat you can use to avoid the relatively difficult linker syntax. Use __at( address ) to specify where a variable is located in the C file:
unsigned long myvar __at(0x70001000);


Give that a try, but eventually you'll need to figure out how to declare LSL sections and addresses to get the best performance. You can steal many such examples from the iLLD files and demos.
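
Applied to the three globals from this thread, the quick fix could look roughly like the sketch below. The `PLACE_AT` macro is a made-up convenience, the addresses after the first one are illustrative, and the guard assumes the compiler predefines `__TASKING__`. On any other compiler the macro expands to nothing, so the same file still builds for host tests.

```c
/* __at() is a TASKING extension. PLACE_AT is a hypothetical wrapper that
 * expands to nothing on other compilers (assumes __TASKING__ is predefined). */
#if defined(__TASKING__)
#define PLACE_AT(addr) __at(addr)
#else
#define PLACE_AT(addr)
#endif

/* Illustrative addresses in the Cpu0 DSPR; adjust them to your memory layout. */
int gA PLACE_AT(0x70001000);
int gB PLACE_AT(0x70001004);
int gC PLACE_AT(0x70001008);

/* The addition from the test case, now operating on locally placed globals. */
void add_globals(void)
{
    gC = gA + gB;
}
```

Fixed addresses are easy to get wrong once the number of variables grows, which is one reason to move to LSL sections later.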
User19008
Level 1
Thank you very much. That works very well! Now all cycle counts are as expected.
I have only a small number of global variables, so this workaround will do for now.
To be prepared for the future, can you please tell me where to find the iLLD files and demos?