Mar 29, 2016
07:20 AM
Hi folks,
I have been experimenting a lot with the XMC4500 recently and stumbled upon something very confusing.
Apparently, the FLASH0_FCON.IDLE bit has a weird and unexpected performance impact.
In the attached DAVE v4.1.4 project the following code is executed from uncached PMU FLASH at 0x0c002000:
movs r7, #10
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
loop:
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
sub r7, #1
cmp r7, #0
bne loop
nop
nop
nop
nop
bx lr
At the end of each measurement I compute the elapsed time by means of the SysTick register.
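For readers who want to reproduce the numbers, here is a minimal sketch of how the elapsed cycle count can be derived from two SysTick reads. It is an illustration, not the code from the attached project: the helper name `systick_elapsed` is hypothetical, and it assumes the reload value is 0xFFFFFF and that the counter wraps at most once during a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* SysTick is a 24-bit down counter on Cortex-M cores. */
#define SYSTICK_MASK 0x00FFFFFFu

/* Elapsed ticks between two reads of the SysTick current-value register.
 * Assumption: reload value is 0xFFFFFF and at most one wrap occurred.
 * Because the counter counts down, elapsed = start - end (mod 2^24). */
static uint32_t systick_elapsed(uint32_t start, uint32_t end)
{
    assert(start <= SYSTICK_MASK && end <= SYSTICK_MASK);
    return (start - end) & SYSTICK_MASK;
}
```

On the target, `start` and `end` would be the values read from the SysTick current-value register immediately before and after the code under test. The fixed overhead of the two reads would still have to be subtracted separately.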
For the above code snippet I observe the following execution times:
FLASH0_FCON.IDLE = 0:
- WS=3 => 722 cycles
- WS=4 => 808 cycles
- WS=5 => 861 cycles
- WS=6 => 914 cycles
- WS=7 => 968 cycles
- WS=8 => 1022 cycles
So with increasing PFLASH wait states the execution times are higher, which is of course expected.
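To put a number on that trend, the cost of each additional wait state can be computed from the list above. The following is only a small arithmetic sketch over the measured values. The names `successive_deltas` and `idle0_cycles` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Measured cycle counts from above for WS=3..8 with FLASH0_FCON.IDLE = 0. */
static const int idle0_cycles[6] = {722, 808, 861, 914, 968, 1022};

/* Store cycles[i+1] - cycles[i] in delta[0..n-2], i.e. the extra
 * cycles caused by each additional wait state. */
static void successive_deltas(const int *cycles, size_t n, int *delta)
{
    assert(n >= 2);
    for (size_t i = 0; i + 1 < n; i++)
        delta[i] = cycles[i + 1] - cycles[i];
}
```

Calling `successive_deltas(idle0_cycles, 6, delta)` gives 86, 53, 53, 54, 54. So from WS=4 upwards each extra wait state costs roughly 53 to 54 cycles for this snippet, while the step from WS=3 to WS=4 costs 86.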
If I disable the static prefetching, I observe the following execution times:
FLASH0_FCON.IDLE = 1:
- WS=3 => 755 cycles
- WS=4 => 704 cycles
- WS=5 => 716 cycles
- WS=6 => 728 cycles
- WS=7 => 757 cycles
- WS=8 => 789 cycles
Here, execution times are lower than with static prefetching enabled, even though mostly linear code is executed (except for the single branch every three FLASH pages).
The only exception is the configuration with 3 wait states.
How can this be explained? Why is 4 wait states with static prefetch disabled about 100 cycles faster than 3 wait states with static prefetch enabled?
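For a side-by-side view, the difference between the two settings at each wait state can be computed directly from the two lists. Again this is only arithmetic on the measured values. The array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_WS 6 /* wait states 3..8 */

/* Measured cycle counts from the two lists above, index 0 = WS=3. */
static const int prefetch_on_cycles[NUM_WS]  = {722, 808, 861, 914, 968, 1022}; /* IDLE = 0 */
static const int prefetch_off_cycles[NUM_WS] = {755, 704, 716, 728, 757, 789};  /* IDLE = 1 */

/* Cycles saved by setting FLASH0_FCON.IDLE = 1 at the same wait state.
 * A negative result means IDLE = 1 was slower. */
static int idle_saving(size_t ws_index)
{
    assert(ws_index < NUM_WS);
    return prefetch_on_cycles[ws_index] - prefetch_off_cycles[ws_index];
}
```

For WS=3..8 this yields -33, 104, 145, 186, 211 and 233 cycles. The saving grows with every additional wait state, and WS=3 is the only setting where IDLE = 1 is slower.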
Attached is my project.
To reproduce you can do "Run to Line" framework.S:88 (and adjust the hardware settings accordingly).
1 Reply
Apr 01, 2016
12:02 AM
No ideas? Is there any documentation about the PFLASH prefetch/global read buffer behavior?
Are there any timing diagrams available showing a PFLASH single read / burst read?