Not applicable
Hi all,

I want to use the floating-point radix-2 FFT function from the ARM math library on the XMC4500, but the calculation time is too long for me.
After checking the .map file, I found that the FFT function is loaded at the uncached flash address and runs from the cached flash address.
So I tried modifying the linker script and adding the __attribute__ keyword in the header file of the ARM math library, but it does not work.
(It works for other functions that I wrote myself.)
How can I run the FFT function and other functions of the ARM math library from PSRAM to improve performance?
Thanks in advance.

4 Replies
Not applicable
Hi Morris,

Basically, you need to move all the required functions into your project.
This means you need to create new *.c files in your project containing the radix-2 FFT source code, plus a header that declares the functions with the __attribute__.
As you said, the attribute works for functions you wrote yourself.
So you copy the provided function into your project and treat it as if it were a function you wrote yourself.
I hope this is clear to you. 🙂
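
To illustrate the idea, here is a minimal sketch of tagging a function that lives in your own project so that the compiler emits it into a separate input section. The FASTCODE macro and the sum_of_squares routine are hypothetical stand-ins and are not part of the ARM math library; the section name ".text.fastcode" is the one used elsewhere in this thread, and the attribute only has an effect if the linker script actually maps that section to PSRAM.

```c
#include <stdint.h>

/* Hypothetical helper macro: with GCC on an ELF target it places the function
 * in the ".text.fastcode" input section; elsewhere it expands to nothing. */
#if defined(__GNUC__) && defined(__ELF__)
#define FASTCODE __attribute__((section(".text.fastcode"), noinline))
#else
#define FASTCODE
#endif

/* Prototype as it would appear in the project's own header. */
FASTCODE int32_t sum_of_squares(const int16_t *x, uint32_t n);

/* Hypothetical stand-in for a routine whose source was copied into the project. */
FASTCODE int32_t sum_of_squares(const int16_t *x, uint32_t n)
{
    int32_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * x[i];
    }
    return acc;
}
```

After building, the .map file should list the function under the fast-code section rather than under the normal .text region.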

You can also do it by modifying the linker script.
Please find attached the linker script and the startup file.
Below is an extract of the modified linker script. As you can see, a new section is created where the ARM functions are placed (in my case, the sine and cosine functions).
This section has its load address in flash and its run (virtual) address in PSRAM.
You can also place your own functions in this section by defining them with __attribute__ ((section (".text.fastcode")))

.fastcode : AT(LOADADDR(.startup) + SIZEOF(.startup))
{
    __fastcode_start = .;
    /* functions with __attribute__ ((section (".text.fastcode"))) */
    *(.text.fastcode)
    __fastcode_end = .;
} >PSRAM_1
__fastcode_load = LOADADDR (.fastcode);
__fastcode_size = __fastcode_end - __fastcode_start;
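
Because the section is loaded in flash but runs from PSRAM, something has to copy it before the first call. The attached startup file is not visible here, so the following is only a sketch of what such a copy step could look like, using the three symbols from the extract above. The names copy_words, fastcode_init and the FASTCODE_TARGET_BUILD macro are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies size_bytes (rounded down to whole 32-bit words) from src to dst. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t size_bytes)
{
    for (size_t i = 0; i < size_bytes / sizeof(uint32_t); i++) {
        dst[i] = src[i];
    }
}

#ifdef FASTCODE_TARGET_BUILD /* hypothetical macro, defined only for the target build */
/* Symbols defined in the linker script; only their addresses are meaningful. */
extern uint32_t __fastcode_start; /* run address in PSRAM */
extern uint32_t __fastcode_load;  /* load address in flash */
extern uint32_t __fastcode_size;  /* absolute symbol: its address is the size */

/* Call once during startup, before any function in .fastcode is used. */
void fastcode_init(void)
{
    copy_words(&__fastcode_start, &__fastcode_load,
               (size_t)(uintptr_t)&__fastcode_size);
}
#endif
```

If the copy is skipped, the code at the PSRAM addresses is undefined and the first call into the section will most likely fault.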

PS: Idea taken from http://www.state-machine.com/arm/Building_bare-metal_ARM_with_GNU.pdf

Best regards,
Not applicable
Thanks for your replies. I tried jferreira's method, and the radix-2 FFT sections were successfully moved to PSRAM.
Please see the map file extract below.
However, the calculation speed did not improve.
I used a system timer to generate an interrupt every 100 ms and counted how many FFT operations complete within that period.
The counter value is 130 in both cases, whether the FFT section runs from cached flash or from PSRAM.
Do you have any suggestions?
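
For reference, the measurement described above can be sketched roughly as follows. All names here (window_elapsed, on_timer_tick, count_runs_in_window, avg_us_per_run) are hypothetical; on the target, on_timer_tick would be called from the 100 ms system timer handler and work would wrap the FFT call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set by the periodic timer interrupt when the measurement window ends. */
static volatile bool window_elapsed = false;

/* Hypothetical ISR hook: call this from the 100 ms system timer handler. */
void on_timer_tick(void)
{
    window_elapsed = true;
}

/* Runs work() repeatedly until the timer flag is set; returns the run count. */
uint32_t count_runs_in_window(void (*work)(void))
{
    uint32_t runs = 0;
    window_elapsed = false;
    while (!window_elapsed) {
        work();
        runs++;
    }
    return runs;
}

/* Average time per run in microseconds for a window given in milliseconds. */
uint32_t avg_us_per_run(uint32_t window_ms, uint32_t runs)
{
    return runs ? (window_ms * 1000u) / runs : 0u;
}
```

With the numbers above, 130 runs in 100 ms works out to roughly 769 microseconds per FFT. Note that a whole-run counter like this has a resolution of one FFT, so differences smaller than about 1/130 of the run time will not show up.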


.fastcode       0x10000000      0x6b8 load address 0x0c0002b8
0x10000000 __fastcode_start = .
0x10000000 0x2a8 C:\DAVE-3.1\CMSIS\Lib\GCC\libarm_cortexM4_mathL_2.a(arm_cfft_radix2_f32.o)
0x10000000 arm_radix2_butterfly_f32
0x100002a8 0x2bc C:\DAVE-3.1\CMSIS\Lib\GCC\libarm_cortexM4_mathL_2.a(arm_cfft_radix2_f32.o)
0x100002a8 arm_radix2_butterfly_inverse_f32
0x10000564 0x48 C:\DAVE-3.1\CMSIS\Lib\GCC\libarm_cortexM4_mathL_2.a(arm_cfft_radix2_f32.o)
0x10000564 arm_cfft_radix2_f32
0x100005ac 0x10c C:\DAVE-3.1\CMSIS\Lib\GCC\libarm_cortexM4_mathL_2.a(arm_bitreversal.o)
0x100005ac arm_bitreversal_f32
0x100006b8 __fastcode_end = .
0x0c0002b8 __fastcode_load = LOADADDR (.fastcode)
0x000006b8 __fastcode_size = (__fastcode_end - __fastcode_start)
Level 4
I can confirm this. The method allows the function to be moved into PSRAM, and the performance gain is about 15% (in my case).