I am currently working on application performance optimisation, following the Infineon Appnote AP3216810. One topic is to move critical functions from PFLASH0/1 into the respective core's own PSPR0/1/2.
But what about functions that are frequently used by all cores? In my case this is memcpy. I am currently investigating moving them from PFLASH0 into PSPR1 as well. It is clear that Core0/Core2 then need some extra cycles for instruction fetches from the foreign PSPR1. Even worse, the TriCore TC1.6.2 core architecture manual states that segments 7H - 0H are non-cacheable memory. For a core's own PSPR this is desired, but what about executing code from a foreign PSPR on Core0/Core2? Does this mean that, because the memory is uncached, extra CPU cycles accumulate with every call of the memcpy function? I fear this makes the situation on Core0/Core2 even worse than just running it from cached PFLASH0 on all cores.
The other choice would be to use DLMU1, which is slower than PSPR1 but probably faster than PFLASH0 and cached for all cores.
Another idea is to duplicate the memcpy function as memcpy_CORE0, _CORE1 and _CORE2, put each copy into the PSPR of the respective core, and leave the original memcpy in PFLASH0 (since it is also used by crt0 for the initialisation of variables). But is it worth the effort?
Is there any best practice for which memory to use for common functions: PFLASH0, PSPRn or DLMUn?
Thanks for your question. It sounds really interesting. Not many people will be able to answer it, I suppose. I think it always depends on your application. SRAM is certainly faster than flash; flash retains data over time but cannot be rewritten indefinitely. Hence, the question is what you really want to do in your application. Maybe then someone can make suggestions.
For deterministic and performance-critical code which is used by all cores and which should be put into PSPR, the general approach is to link/locate it into segment 12 (addresses 0xCxxx_xxxx) and duplicate the same code into the PSPR of each core.
When an access to PSPR is performed via address 0xCxxx_xxxx,
- this will always access the local PSPR of the core and thus have minimal access time.
- there is no need for core-specific functions (e.g. memcpy_core0, _core1), as the same addresses can be used on all cores.
Drawback: code would be duplicated into the PSPR of each core.
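To illustrate why one link address in segment 0xC works on every core, here is a minimal sketch of how a core-local PSPR address corresponds to a different global address per core. The helper name pspr_local_to_global and the constants are assumptions for illustration only: the global bases follow a TC3xx-style memory map for cores 0 to 2, and the window size is just an upper bound for the offset, not the real PSPR size. Check all values against the memory map of the actual device.

```c
#include <stdint.h>

/* Assumed values for a TC3xx-style memory map, cores 0..2 only.
 * Verify against the device's user manual before relying on them. */
#define PSPR_LOCAL_BASE   0xC0000000u  /* core-local view, segment 0xC */
#define PSPR_GLOBAL_CORE0 0x70100000u  /* global view of core 0 PSPR */
#define PSPR_CORE_STRIDE  0x10000000u  /* core n sits one segment below core n-1 */
#define PSPR_WINDOW_SIZE  0x00100000u  /* offset bound, not the real PSPR size */

/* Hypothetical helper: returns the global address that a core-local PSPR
 * address refers to on the given core, or 0 if the input is out of range. */
uint32_t pspr_local_to_global(uint32_t core, uint32_t local_addr)
{
    if (core > 2u) {
        return 0u;  /* only cores 0..2 are modelled here */
    }
    if (local_addr < PSPR_LOCAL_BASE) {
        return 0u;  /* not a segment 0xC address */
    }
    uint32_t offset = local_addr - PSPR_LOCAL_BASE;
    if (offset >= PSPR_WINDOW_SIZE) {
        return 0u;  /* outside the assumed PSPR window */
    }
    return PSPR_GLOBAL_CORE0 - core * PSPR_CORE_STRIDE + offset;
}
```

With these assumptions, a function linked at 0xC0000100 is fetched from 0x70100100 on core 0, 0x60100100 on core 1 and 0x50100100 on core 2, so the same code has to be copied to each of those locations at startup.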
If code is located only in one PSPR (your given example: PSPR1), this is beneficial for the core using the local access (here core 1).
For the other cores, non-cached execution of code from a "remote PSPR" might be slower than cached execution from flash.
(This depends on the number/frequency of calls to functions in the "remote PSPR" and on the amount of "looping" within a function call, i.e. the potential benefit of consecutive cache hits vs. the longer access to the "remote PSPR".)
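The trade-off can be sketched with a rough cost model. Everything here is an assumption for illustration: the type fetch_cost_t, both function names and any latency values plugged in are hypothetical placeholders, not datasheet figures, and the model ignores prefetching, bus contention and cache eviction. Real numbers have to come from measurements on the target.

```c
#include <stdint.h>

/* Hypothetical per-fetch latencies in CPU cycles (placeholders only). */
typedef struct {
    uint32_t remote_fetch;  /* uncached fetch from another core's PSPR */
    uint32_t flash_miss;    /* flash fetch that misses the program cache */
    uint32_t cache_hit;     /* fetch served from the program cache */
} fetch_cost_t;

/* Fetch cycles for one call when every fetch goes to a remote PSPR:
 * nothing is cached, so each loop iteration pays the full latency. */
uint32_t cycles_remote_pspr(const fetch_cost_t *c,
                            uint32_t fetches_per_iter,
                            uint32_t iterations)
{
    return c->remote_fetch * fetches_per_iter * iterations;
}

/* Fetch cycles for one call from cached flash: the first iteration
 * misses, later iterations hit. Assumes the loop body fits into the
 * cache and is not evicted in between. */
uint32_t cycles_cached_flash(const fetch_cost_t *c,
                             uint32_t fetches_per_iter,
                             uint32_t iterations)
{
    if (iterations == 0u) {
        return 0u;
    }
    return c->flash_miss * fetches_per_iter
         + c->cache_hit * fetches_per_iter * (iterations - 1u);
}
```

With placeholder latencies of 3, 10 and 1 cycles and 4 fetches per iteration, a single pass costs 12 cycles from the remote PSPR against 40 from flash, while 16 iterations cost 192 against 100. In this model short, rarely looping calls favour the remote PSPR and long copy loops favour cached flash.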
Many thanks for pointing me in the right direction! Segment 0xC should do the job fine. And duplicated objects are no problem, since the end justifies the means.
Unfortunately I did not manage to get the object to 0xC0000000, neither by surrounding the source with "#pragma code_core_association clone" nor by modify_input in the linker file. In the map file I see that memcpy is linked at 0xC0000000, but this location as well as 0x70100000 and 0x60100000 are empty. So the actual "clone" is somehow missing.