Not applicable
Hi folks,

I'm currently working with an XMC4500 Relax Lite Kit using DAVE v4.1.4.
To investigate the timing behavior, I have written a small measurement framework that triggers an interrupt after 1, 2, 3, ..., n cycles until the program under measurement has completed.

Main measurement framework structure:
1. initialize hardware (enable FPU)
2. initialize measurement storage (DSRAM2)
3. invalidate PMU instruction buffer
4. disable SysTick
5. set up the new SysTick reload value
6. reset the current SysTick value
7. enable SysTick, enable the SysTick exception, select fCPU as the clock source
8. execute the measurement code
9. disable SysTick
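Steps 4 to 7 map onto the standard Cortex-M SysTick registers CTRL, LOAD and VAL. The following is a minimal C sketch, not the original startup code: it uses a hypothetical `systick_regs_t` struct so the sequence can be exercised on a host, whereas on the target the pointer would refer to the SysTick block at 0xE000E010 (CMSIS exposes it as `SysTick`).

```c
#include <stdint.h>

/* Hypothetical mirror of the first three Cortex-M SysTick registers.
   On the target this would overlay the SysTick block at 0xE000E010. */
typedef struct {
    volatile uint32_t CTRL;  /* control and status */
    volatile uint32_t LOAD;  /* reload value (24 bits) */
    volatile uint32_t VAL;   /* current value */
} systick_regs_t;

#define SYSTICK_CTRL_ENABLE    (1u << 0)  /* counter enable */
#define SYSTICK_CTRL_TICKINT   (1u << 1)  /* request the SysTick exception at zero */
#define SYSTICK_CTRL_CLKSOURCE (1u << 2)  /* 1 = processor clock (fCPU) */
#define SYSTICK_LOAD_MASK      0x00FFFFFFu

/* Steps 4-7 of the measurement framework. */
static void systick_arm(systick_regs_t *st, uint32_t reload)
{
    st->CTRL = 0u;                          /* 4. stop the counter */
    st->LOAD = reload & SYSTICK_LOAD_MASK;  /* 5. new reload value (24 bits) */
    st->VAL  = 0u;                          /* 6. any write clears VAL */
    st->CTRL = SYSTICK_CTRL_ENABLE          /* 7. start counting, enable the */
             | SYSTICK_CTRL_TICKINT         /*    exception, clock from fCPU */
             | SYSTICK_CTRL_CLKSOURCE;
}
```

Masking the reload value keeps it within the 24-bit LOAD field; bits written above bit 23 would otherwise be silently ignored by the hardware.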

SysTick_Handler structure:
1. disable the SysTick timer
2. update the measurement sample (store the PC of the interrupted instruction to DSRAM2)
3. update the SysTick reload value: new value = last value + 1
4. return to measurement framework step 3.

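Steps 2 and 3 of the handler can be sketched in C as below. This assumes the handler has already obtained a pointer to the stacked exception frame (for example via a small assembly shim reading MSP); on Cortex-M the stacked PC is the seventh word of that frame (R0-R3, R12, LR, PC, xPSR), also with FPU lazy stacking. `sample_store_t`, `record_sample` and the buffer size are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLES    4096u  /* hypothetical buffer size (16 KB of DSRAM2) */
#define FRAME_PC_INDEX 6u     /* stacked frame: R0-R3, R12, LR, PC, xPSR */

/* Hypothetical sample store; on the target it would be placed in DSRAM2. */
typedef struct {
    uint32_t pc[MAX_SAMPLES];  /* resume address of each interrupted run */
    size_t   count;            /* number of valid samples */
    uint32_t reload;           /* SysTick reload value for the next run */
} sample_store_t;

/* Handler steps 2-3: record the stacked PC, then lengthen the next run by one tick. */
static void record_sample(sample_store_t *s, const uint32_t *stacked_frame)
{
    if (s->count < MAX_SAMPLES) {
        s->pc[s->count++] = stacked_frame[FRAME_PC_INDEX];
    }
    s->reload += 1u;
}
```

Steps 1 and 4 (stopping SysTick and returning to step 3 of the framework) depend on the startup code and are not shown.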
In principle, this lets me observe the progression of the control flow until step 9 of the measurement framework is reached.
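One way to read such a trace, assuming the samples are stored in order of increasing reload value and the measured function occupies a known address range [lo, hi), is to find the first sample taken after the PC has left that range. The function name and parameters below are hypothetical, and converting the returned index into cycles would still require accounting for the framework's own entry and exit overhead, which this sketch does not model.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first sample whose PC has left [lo, hi) after having
   been inside it, or -1 if the measured code was never entered or never left. */
static long first_exit_sample(const uint32_t *pc, size_t n, uint32_t lo, uint32_t hi)
{
    int entered = 0;
    for (size_t i = 0; i < n; ++i) {
        int inside = (pc[i] >= lo) && (pc[i] < hi);
        if (inside) {
            entered = 1;
        } else if (entered) {
            return (long)i;
        }
    }
    return -1;
}
```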


Currently, I'm facing unexpected behavior when it comes to executing code from cached and uncached PMU FLASH memory.
Code executed from cached PMU FLASH appears to run much slower than the same code executed from uncached PMU FLASH.

The code under measurement is the following (10 iterations of "loop"):


movs r7, #10    // loop counter: 10 iterations
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop

loop:
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
sub r7, #1      // decrement the loop counter
cmp r7, #0
bne loop        // repeat until r7 == 0

nop
nop
nop
nop
bx lr



Using the measurement framework, I observe the following execution times (in cycles):

a) PREF_PCON = 0x0
FLASH0_FCON = 0x3 (=> 3 read wait states)
Executing from PMU FLASH cached => 1809 cycles
Executing from PMU FLASH uncached => 681 cycles

b) PREF_PCON = 0x1 (=> PMU instruction buffer bypassed)
FLASH0_FCON = 0x3 (=> 3 read wait states)
Executing from PMU FLASH cached => 681 cycles
Executing from PMU FLASH uncached => 681 cycles
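As a rough sanity check, the listing executes a fixed number of instructions (1 + 15 before the loop, 10 × 35 in the loop, 4 + 1 after it), so the measured cycle counts can be turned into an average cycles-per-instruction figure. The sketch below uses hypothetical helper names and ignores any framework overhead contained in the measured numbers, so the ratios are only approximate.

```c
#include <stdint.h>

/* Instruction counts read off the listing above. */
#define PROLOGUE_INSNS  (1u + 15u)  /* movs + 15 nops */
#define LOOP_BODY_INSNS (32u + 3u)  /* 32 nops + sub + cmp + bne */
#define LOOP_ITERATIONS 10u
#define EPILOGUE_INSNS  (4u + 1u)   /* 4 nops + bx lr */

static uint32_t executed_instructions(void)
{
    return PROLOGUE_INSNS + LOOP_ITERATIONS * LOOP_BODY_INSNS + EPILOGUE_INSNS;
}

/* Average cycles per executed instruction for a measured cycle count. */
static double cycles_per_instruction(uint32_t cycles)
{
    return (double)cycles / (double)executed_instructions();
}
```

With the numbers from case a), this gives about 1.8 cycles per instruction uncached (681 cycles) versus about 4.9 cached (1809 cycles), out of 371 executed instructions.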


How can the code be roughly 2.7 times slower (1809 vs. 681 cycles) when executed from cached PMU FLASH than when executed from uncached PMU FLASH?
I have also tried measuring only the difference between the SysTick value before and after execution (i.e., with no interrupts while the loop runs), with the same results.
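For the before/after variant, the elapsed time follows from two reads of the SysTick current value. Because SysTick counts down, the difference is start minus end; masking to 24 bits also handles a single wrap, assuming LOAD is 0xFFFFFF and the measured code runs for fewer than 2^24 cycles. With fCPU selected as the clock source, one tick corresponds to one CPU cycle. The function name is hypothetical.

```c
#include <stdint.h>

#define SYSTICK_COUNTER_MASK 0x00FFFFFFu  /* SysTick VAL is 24 bits wide */

/* Ticks elapsed between two reads of a down-counting SysTick.
   Assumes LOAD = 0xFFFFFF and at most one wrap between the reads. */
static uint32_t systick_elapsed(uint32_t start, uint32_t end)
{
    return (start - end) & SYSTICK_COUNTER_MASK;
}
```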

Does anyone have a clue? What am I missing?


P.S.: I would have uploaded the full startup code, but apparently I cannot upload .S files to the forum.
Not applicable
Can anybody confirm that executing code from cached PMU FLASH (memory starting at 0x08000000) is slower than executing code from uncached PMU FLASH (memory starting at 0x0C000000) on an XMC4500 (Relax Lite Kit)?
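Since the same PMU program flash is visible through both windows, the same measured function can be reached through either alias by offsetting its address. The helper below is a hypothetical sketch of that mapping; the 1 MB flash size is an assumption for the XMC4500-F100x1024 fitted to the Relax Lite Kit.

```c
#include <stdint.h>

#define PFLASH_CACHED_BASE   0x08000000u
#define PFLASH_UNCACHED_BASE 0x0C000000u
#define PFLASH_SIZE          0x00100000u  /* assumption: 1 MB PFLASH */

/* Maps an address in the cached PFLASH alias to the same byte in the uncached
   alias. Returns 1 on success, 0 if addr is outside the cached alias. */
static int pflash_to_uncached(uint32_t addr, uint32_t *out)
{
    if (addr < PFLASH_CACHED_BASE || addr - PFLASH_CACHED_BASE >= PFLASH_SIZE) {
        return 0;
    }
    *out = PFLASH_UNCACHED_BASE + (addr - PFLASH_CACHED_BASE);
    return 1;
}
```

Because only the base changes, bit 0 of a Thumb function address is preserved by the mapping.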
gwang
Employee
Hello,

I have tested your code and got the following results:
a) PREF_PCON = 0x0
FLASH0_FCON = 0x3 (=> 3 read wait states)
Executing from PMU FLASH cached => 475 cycles
Executing from PMU FLASH uncached => 803 cycles

b) PREF_PCON = 0x1 (=> PMU instruction buffer bypassed)
FLASH0_FCON = 0x3 (=> 3 read wait states)
Executing from PMU FLASH cached => 794 cycles
Executing from PMU FLASH uncached => 804 cycles

Two conclusions:
1) cached is faster than uncached
2) enabling the PMU instruction buffer is faster than bypassing it.

Enclosed is my DAVE 4.2.4 test project.
Not applicable
That is interesting. I will have a look. Maybe something is missing in my startup code.
Not applicable
I can confirm your numbers.
I just repeated the measurements with your project.

What I don't understand is why I cannot reproduce these results with my measurement framework.
Could you try to reproduce my measurements with my project on your board?
Not applicable
I have had a deeper look into my startup code.
Things go wrong after the PMU instruction buffer is invalidated.
After executing:


// invalidate the PMU FLASH instruction buffer (PREF_PCON.IINV = 1)
mov r12, #0x4000
movt r12, #0x5800
ldr r3, [r12]
orr r3, r3, 0x2
str r3, [r12]
dsb
isb


the measurement results are the ones I reported in my first post (cached execution slower than uncached).


If I add:


// bypass the instruction buffer (PREF_PCON.IBYP = 1)
mov r3, 0x1
str r3, [r12]
dsb
isb

// re-enable the instruction buffer (PREF_PCON.IBYP = 0)
mov r3, 0x0
str r3, [r12]
dsb
isb


the PMU FLASH instruction buffer appears to work and the measurement results are as expected.
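For DAVE projects written in C, the same workaround could be expressed roughly as below. This is a sketch of the empirical sequence from this post, not an Infineon-documented procedure: it takes the PREF_PCON address as a parameter (0x58004000 on the target, as loaded into r12 above) so it can be checked on a host, and it uses the IBYP/IINV bit positions implied by the values written in the assembly (0x1 and 0x2). The function names are hypothetical.

```c
#include <stdint.h>

#define PREF_PCON_IBYP (1u << 0)  /* instruction buffer bypass */
#define PREF_PCON_IINV (1u << 1)  /* instruction buffer invalidate */

/* DSB + ISB on a 32-bit ARM target; only a compiler barrier on a host build. */
static void barrier_dsb_isb(void)
{
#if defined(__arm__)
    __asm__ __volatile__("dsb\n\tisb" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Invalidate the PMU instruction buffer, then toggle IBYP, mirroring the
   assembly sequence above that restored the expected timing. */
static void pref_invalidate_and_toggle_bypass(volatile uint32_t *pcon)
{
    *pcon |= PREF_PCON_IINV;   /* PREF_PCON.IINV = 1 */
    barrier_dsb_isb();
    *pcon = PREF_PCON_IBYP;    /* PREF_PCON.IBYP = 1 (buffer bypassed) */
    barrier_dsb_isb();
    *pcon = 0u;                /* PREF_PCON.IBYP = 0 (buffer enabled again) */
    barrier_dsb_isb();
}
```

On the target it would be called as `pref_invalidate_and_toggle_bypass((volatile uint32_t *)0x58004000u);`, ideally from code that is not itself being fetched through the buffer while it is reconfigured.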


I conclude that invalidating the PMU FLASH instruction buffer requires more than just setting the IINV bit of the PREF_PCON register.
The documentation could be a bit more descriptive than just:


8.3.2.1 Instruction Buffer

The instruction buffer may be invalidated by writing a 1B to PREF_PCON.IINV. After
system reset, the instruction buffer is automatically invalidated.

Note: The complete invalidation operation is performed in a single cycle.



Anyway, thank you very much for looking into this and helping me troubleshoot the problem.