
AURIX™ Forum Discussions

Level 2
Hi all,

How can I decrease the CPU load of a TriCore™ CPU?

Thanks a lot in advance!

4 Replies
Level 5
Hi Tita.

Several methods can be applied to offload the CPU:

1. Identify the functions and tasks that consume most of the load:

Tools can be used to measure the duration of each task and the distribution of tasks over time. On the Infineon website you can find a list of our partners who offer this kind of trace and real-time monitoring tool.

In addition, each core provides performance counters that can be used to measure CPU performance: CCNT, ICNT, MxCNT.
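As a rough illustration of how two cycle-counter samples can be turned into a load figure, here is a minimal host-buildable sketch. The helper names and the 31-bit counter mask are my assumptions, not taken from the application note; on target the samples would come from reading CCNT through your compiler's core-register access mechanism, and the counter width should be checked against the core architecture manual.

```c
#include <stdint.h>

/* Assumption: the count field is 31 bits wide (top bit used as an
 * overflow flag). Adjust the mask to match the core manual. */
#define COUNTER_MASK 0x7FFFFFFFu

/* Cycles elapsed between two counter samples. The masked unsigned
 * subtraction stays correct across a single wrap of the counter. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return (end - start) & COUNTER_MASK;
}

/* CPU load in percent: cycles spent in the measured code divided by
 * the cycles of the whole observation window. */
static uint32_t load_percent(uint32_t busy_cycles, uint32_t window_cycles)
{
    if (window_cycles == 0u) {
        return 0u; /* avoid division by zero on an empty window */
    }
    return (uint32_t)(((uint64_t)busy_cycles * 100u) / window_cycles);
}
```

Sampling the counter at task entry and exit, summing the deltas per task, and dividing by the window length gives a per-task load that shows where optimization effort pays off.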

2. Activate the caches and use cacheable addresses in software:

Each CPU has two types of cache: a data cache and an instruction cache. They can be activated individually to reduce the average access time to Flash resources.

3. Map critical resources to the local RAM of CPUx:

CPUx accesses its local RAM with zero wait states.

Map variables to the DSPR of the CPU that accesses them most of the time.

Map the code of critical functions to the local PSPR of the CPUx that calls them.
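To make the mapping idea concrete, here is a small sketch using the GCC-style `section` attribute. Everything target-specific in it is an assumption: the section names and the `USE_SCRATCHPAD_SECTIONS` build switch are hypothetical, the names must match what your linker script actually maps to DSPR/PSPR, and other compilers use their own pragmas for the same purpose.

```c
#include <stdint.h>

/* Hypothetical build switch: define it only when the linker script
 * provides the sections named below. Without it the code builds
 * unchanged on any host, with default placement. */
#ifdef USE_SCRATCHPAD_SECTIONS
#define IN_LOCAL_DSPR __attribute__((section(".cpu1.dspr.data"))) /* hypothetical name */
#define IN_LOCAL_PSPR __attribute__((section(".cpu1.pspr.text"))) /* hypothetical name */
#else
#define IN_LOCAL_DSPR
#define IN_LOCAL_PSPR
#endif

/* State touched on every call: a candidate for the local data scratchpad. */
IN_LOCAL_DSPR static int32_t filter_state = 0;

/* Hot function: a candidate for the local program scratchpad. */
IN_LOCAL_PSPR int32_t control_step(int32_t sample)
{
    filter_state += (sample - filter_state) / 4;
    return filter_state;
}
```

After linking, the map file is the place to confirm that both symbols really ended up in the scratchpad of the core that runs the code.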

4. Use compiler options:

Each compiler provides options to optimize for execution speed, code size…

5. Efficient addressing:

Faster execution can be achieved with specific addressing types, because fewer instructions are needed to access resources (register, memory…):

Short addressing: Base + Long Offset addressing using the global base registers (A0, A1, A8, A9) provides efficient data access within an address range of 64 KB.

Near addressing: near segments can be used to locate variables and constants in the first 16 KB of each 256 MB TriCore memory segment.
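The near-addressing window described above can be expressed as a simple address check. This is only a sketch for reasoning about a memory map (for example in a host-side script that inspects a map file); the function and macro names are mine.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_OFFSET_MASK 0x0FFFFFFFu /* offset inside a 256 MB segment */
#define NEAR_WINDOW_SIZE    0x4000u     /* first 16 KB of the segment */

/* True if addr lies in the first 16 KB of its 256 MB segment, i.e. in
 * the window that near addressing can reach. */
static bool is_near_addressable(uint32_t addr)
{
    return (addr & SEGMENT_OFFSET_MASK) < NEAR_WINDOW_SIZE;
}
```

For example, an object at offset 0x3FFF of a segment qualifies, while one at offset 0x4000 does not and needs a different addressing mode.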

6. Check the wait-state configuration for Flash accesses (calculation formulas are available in the User Manual).

7. Check that the clocks are correctly configured (CPU, SRI, SPB…).

8. Additional optimization potential:

Instead of the emulation library, the single-precision Floating Point Unit can be used (compiler option).
With the --no-double option set, the compiler treats variables of type double as float.
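Independently of any compiler option, a common way to accidentally leave the single-precision path is an unsuffixed floating-point literal, which has type double in C and promotes the whole expression. A small sketch (function names are mine) that makes the difference visible:

```c
#include <stddef.h>

/* One step of a first-order low-pass filter, written so that every
 * operand is a float: note the 'f' suffix on the literal. */
static float lowpass_step(float y, float x)
{
    return y + 0.25f * (x - y);
}

/* sizeof does not evaluate its operand; it only reports the type of the
 * expression. With a float literal the product is a float ... */
static size_t size_with_float_literal(float x)
{
    return sizeof(x * 0.25f);
}

/* ... while an unsuffixed literal is a double and promotes the product
 * to double under standard C rules. */
static size_t size_with_double_literal(float x)
{
    return sizeof(x * 0.25);
}
```

Under standard C rules the second variant computes in double, which on a core with only a single-precision FPU means calls into the emulation library unless double is remapped to float.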

9. Intrinsic functions:

Compilers provide intrinsic functions that give access to specific assembly instructions with no equivalent in C.
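Saturating arithmetic is a typical case. In portable C a saturating 32-bit add needs a wider intermediate and two comparisons, as in the sketch below (the function name is mine). A saturating-add intrinsic, where the compiler offers one, can replace all of this with a single instruction; the intrinsic's name is compiler-specific, so check your compiler manual.

```c
#include <stdint.h>

/* Portable saturating add: the result is clamped to the int32_t range
 * instead of wrapping around on overflow. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b; /* cannot overflow in 64 bits */

    if (sum > INT32_MAX) {
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)sum;
}
```

Keeping the portable version behind the same function name makes it easy to swap in the intrinsic on target and still run unit tests on a host.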

10. Critical functions/tasks can be optimized directly in assembler:

In this case, optimize the use of the TriCore superscalar pipeline (minimize delays by sequencing IP, LS, and LP instructions).

Inline assembler can be used directly in C code (you can pass C variables as operands).

More details can be found in application note AP32168.

I hope this helps!

Kind regards
Level 2

This discussion is very interesting to me.

However, AP32168 is for TC1.6.

Does this discussion also apply to the latest architecture? And regarding multicore, is there any additional advice?

Best regards,
Level 6
The same list applies to the latest TC1.6.2P on the TC3xx.

For multicore systems, these also apply:
- Keep data close to the CPU that needs it (e.g., CPU1 should mostly rely on DSPR1)
- Use test-and-test-and-set loops on atomic objects like semaphores instead of pure test-and-set
- Place semaphores in dLMU to reduce the impact of a remote CPU on local CPU performance
- Be careful with data cache; because the AURIX does not have automatic cache coherency, you must either manage the cache yourself, or modify PMA0 from the default 0x300 to 0x100 so that only PCACHE (for constants) is cached
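To illustrate the test-and-test-and-set point from the list above, here is a minimal spinlock sketch in portable C11 atomics. The type and function names are mine, and how the exchange maps to a TriCore instruction depends on the compiler, so treat it as an illustration of the pattern rather than a drop-in implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int locked; /* 0 = free, 1 = taken */
} ttas_lock_t;

#define TTAS_LOCK_INIT { 0 }

/* Single attempt: read first, and only try the atomic exchange if the
 * lock looked free. Returns true if the lock was acquired. */
static bool ttas_trylock(ttas_lock_t *l)
{
    if (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0) {
        return false;
    }
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static void ttas_lock(ttas_lock_t *l)
{
    for (;;) {
        /* "Test": spin on plain reads while the lock is taken, so the
         * waiting core does not keep issuing atomic read-modify-write
         * accesses to the shared memory. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0) {
        }
        /* "Test-and-set": the lock looked free, now try to claim it. */
        if (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0) {
            return;
        }
    }
}

static void ttas_unlock(ttas_lock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

A pure test-and-set loop would call the exchange on every iteration; the inner read-only loop is the whole difference, and it is what reduces the traffic a waiting core generates toward the memory holding the semaphore.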
Level 2
Thank you for your quick and kind reply.

AP32168 lists some benchmark results.

Can we get the benchmark applications?