Determinism of floating point operations

JM_97

Hello,

I am using a PSoC 4500 S to estimate the execution time of algorithms. For this purpose, I started measuring individual arithmetic operations with the SysTick timer (microbenchmarks).

However, the floating-point operations in particular show execution times that vary strongly and non-linearly with the input data. For example, cosf varies between 79000 us and 1051020 us (each for 1000 iterations). This lack of determinism makes it difficult to derive a tight bound from the microbenchmarks for execution time estimation.

I attached a short overview of the minimum and maximum measurements as an Excel sheet. It is also noticeable that floating-point division depends strongly on the input data and on which operand is the numerator and which is the denominator. For validation, I measured the times with code optimization (O3) and without code optimization (Debug). I also analyzed the assembly code to gain further information, without success.

Here is a code snippet showing how I measure the cosf function. The entire measurement routine is validated and can be excluded as a source of error:

////////////////////////////////////////////////////////////////////////////////////////////

int num_points;
float sortedArray[2000];

// Fill the input array with multiples of (approximately) pi
for (i = 0; i < 2000; i++)
{
    sortedArray[i] = i*3.1415;
}

// Valid indices are 0..1999, so the bound must be < 2000 (<= 2000 reads past the end)
for (num_points = 0; num_points < 2000; num_points++)
{
    START_TIME = SysTick->VAL; // Start timer (SysTick counts down)

    // Microbenchmark: cos
    for (i = 0; i < 1000; i++)
    {
        result1 = cosf(sortedArray[num_points]);
    }

    STOP_TIME = SysTick->VAL; // Stop timer
}

////////////////////////////////////////////////////////////////////////////////////////////
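Editor's sketch: because SysTick is a 24-bit down counter, the elapsed tick count is START_TIME minus STOP_TIME, with a correction if the counter reloaded in between. The helper below illustrates that arithmetic only. The function name and the reload parameter are assumptions for illustration and are not taken from the original measurement routine. It can detect at most one reload between the two reads; a run longer than one counter period (2^24 ticks at the maximum reload value) needs a separate wrap count, for example from the SysTick interrupt.

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a down-counting timer such as SysTick.
   'reload' is the value programmed into the reload register, so one full
   counter period is reload + 1 ticks.
   Hypothetical helper: assumes at most one reload happened between reads. */
static uint32_t systick_elapsed(uint32_t start, uint32_t stop, uint32_t reload)
{
    if (stop <= start) {
        /* No reload between the two reads: the counter simply ran down. */
        return start - stop;
    }
    /* The counter passed zero once and restarted from 'reload'. */
    return start + (reload + 1u) - stop;
}
```

With the snippet above this would be called as systick_elapsed(START_TIME, STOP_TIME, SysTick->LOAD), assuming the two variables hold the raw counter values.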

On the second sheet of the Excel file you will find the measurements for the first 100 values of num_points for the cosf microbenchmark. The mean is the metric of interest there.

 

How can this non-determinism in arithmetic operations be explained?

I expected approximately equal execution times for each class of arithmetic operations (fDiv, fMul, fcos,...).

 

Best Regards

Jannik 

 

odissey1

JM_97,

If you are concerned about faster code execution on a PSoC4 processor, the right way is to avoid floating-point operations altogether. You may have already noticed that multiplying floats costs about 100 clock ticks, division ~600 ticks and sine ~1000 ticks. The PSoC4 was not made for speed.

Using integers, you may speed up calculations 3-10 times with acceptable accuracy. For faster results it is better to stay within the int32 domain. If extra accuracy is needed for division, just multiply the numerator, e.g. by 1000, and use int64 arithmetic; it is still ~3 times faster than floats. Use sine tables to calculate trig functions; depending on the required accuracy, 50-100 ticks are possible for 12-16 bits of accuracy.
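Editor's sketch of the two integer techniques mentioned above, under stated assumptions: the function names, the Q15 output format (32767 represents 1.0), the 16-bit angle convention (65536 units per full turn) and the 17-entry quarter-wave table are all choices made for this illustration, not something given in the thread. With only 17 entries and linear interpolation the error is roughly 0.1 % of full scale, so reaching the 12-16 bit accuracy quoted above would need a larger table. Tick counts were not measured for this code.

```c
#include <stdint.h>

/* Scaled integer division: returns (num / den) with three decimal places,
   i.e. the quotient multiplied by 1000. The 64-bit intermediate keeps
   num * 1000 from overflowing. The caller must ensure den != 0 and that
   the result fits in an int32_t. */
static int32_t div_scaled_1000(int32_t num, int32_t den)
{
    return (int32_t)(((int64_t)num * 1000) / den);
}

/* sin(k * 90 degrees / 16) in Q15 for k = 0..16 (first quadrant only). */
static const int16_t SIN_TABLE[17] = {
        0,  3212,  6393,  9512, 12539, 15446, 18204, 20787, 23170,
    25329, 27245, 28898, 30273, 31356, 32137, 32609, 32767
};

/* Table-based sine. 'angle' is a binary angle: 65536 units = 360 degrees.
   Returns sin(angle) in Q15 using linear interpolation between entries. */
static int16_t sin_q15(uint16_t angle)
{
    uint32_t quadrant = (uint32_t)angle >> 14;   /* 0..3 */
    uint32_t pos = angle & 0x3FFFu;              /* position inside the quadrant */

    if (quadrant & 1u) {
        pos = 0x4000u - pos;                     /* quadrants 1 and 3 are mirrored */
    }

    uint32_t idx = pos >> 10;                    /* table index 0..16 */
    uint32_t frac = pos & 0x3FFu;                /* 10-bit interpolation fraction */
    int32_t value = SIN_TABLE[idx];

    if (idx < 16u) {
        int32_t diff = SIN_TABLE[idx + 1u] - SIN_TABLE[idx];
        value += (diff * (int32_t)frac) >> 10;   /* diff is never negative here */
    }

    /* Quadrants 2 and 3 (180..360 degrees) are the negative half-wave. */
    return (int16_t)((quadrant & 2u) ? -value : value);
}
```

For example, div_scaled_1000(22, 7) yields 3142, which stands for 3.142, and sin_q15(0x2000) (45 degrees) yields 23170, which stands for about 0.7071.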

Arpit_S
Moderator

Hi @JM_97 ,

 

Please refer to the thread below:

https://community.infineon.com/t5/PSoC-6/How-to-enable-hard-floating-point-support-in-PSoC-62S2-WiFi... 

 

Thanks!
Kind Regards

Arpit Srivastav

BiBi_1928986

Hello.

Code executing from internal flash goes through a read accelerator that tries to make the flash appear as zero-wait-state memory. Its effect on timing is very hard to predict.

You can move your code into SRAM, which has zero-wait-state read timing. The benchmark timings should then be more consistent. Be sure to also move every function that the benchmarked code calls into SRAM.

Let us know how it turns out.

JM_97

Hello @BiBi_1928986,

thank you for your reply.

Your explanation sounds reasonable and I would like to try it.

Can you tell me how I can force certain functions or variables to be stored in SRAM? I found only one post regarding this topic, and it did not help:

https://community.infineon.com/t5/Knowledge-Base-Articles/Controlling-SRAM-Usage-in-PSoC-Application...

Thanks

 

Kind regards

Jannik


Hello.

AN89610, "PSoC Arm Cortex Code Optimization" (infineon.com), chapter 9, describes how to put code into SRAM. If you follow it, you'll be successful.

Be aware that PSoC 4500 S has limited memory size.  If you move too much code into SRAM, you'll run out of SRAM memory.
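Editor's sketch of the general GCC mechanism for placing a function in SRAM, to be checked against AN89610 before use. The section name ".ramfunc", the macro names RAMFUNC and PLACE_IN_SRAM, and the example function are all hypothetical: the section must match one that the project's linker script locates in SRAM and that the startup code copies from flash, and the application note may prescribe a different name or macro. Without PLACE_IN_SRAM defined, the code builds as an ordinary function.

```c
#include <stdint.h>

/* Hypothetical macro: when PLACE_IN_SRAM is defined by the build, ask GCC to
   emit the function into a named section. ".ramfunc" is an assumed name and
   must exist in the linker script as a section that is copied into SRAM. */
#if defined(PLACE_IN_SRAM)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAMFUNC
#endif

/* Example benchmark kernel (hypothetical): integer-only, calls no library
   code, so nothing it depends on remains in flash. */
RAMFUNC uint32_t sum_squares(const uint16_t *data, uint32_t n)
{
    uint32_t acc = 0u;
    for (uint32_t i = 0u; i < n; i++) {
        acc += (uint32_t)data[i] * data[i];
    }
    return acc;
}
```

Note that on a core without a hardware FPU, float arithmetic and cosf are implemented by library routines (for example the __aeabi_* helpers with GCC), which normally stay in flash. Relocating those as well is a linker script matter and is not shown here.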
