Level 1

# Determinism of floating point operations

Hello,

I am using a PSoC 4500 S to estimate the execution time of algorithms. For this purpose, I started measuring different arithmetic operations with the SysTick timer (microbenchmarks).

However, floating-point operations in particular show a non-linear progression of execution time depending on the input data. For example, cosf varies between 79000 us and 1051020 us (each for 1000 iterations). This missing determinism makes it difficult to extract a tight bound from the microbenchmarks for execution time estimation.

I attached a short overview of the min and max measurements as an Excel sheet. It is also noticeable that floating-point division depends strongly on the input data and on the order of the operands (denominator and numerator). For validation I measured the times with code optimization (O3) and without code optimization (Debug). Furthermore, I analyzed the assembler code to gain further information, without success.

Here is a code snippet showing how I measure the cosf function. The entire measurement routine is validated and can be excluded as a source of error:

////////////////////////////////////////////////////////////////////////////////////////////

int num_points;
float sortedArray[2000];

// Fill the input array with multiples of (approximately) pi
for (i = 0; i < 2000; i++)
{
    sortedArray[i] = i * 3.1415;
}

// One measurement per input value; "<" keeps the index inside sortedArray
for (num_points = 0; num_points < 2000; num_points++)
{
    START_TIME = SysTick->VAL;   // Start timer (SysTick counts down)

    // Microbenchmark: cos
    for (i = 0; i < 1000; i++)
    {
        result1 = cosf(sortedArray[num_points]);
    }

    STOP_TIME = SysTick->VAL;    // Stop timer
}

////////////////////////////////////////////////////////////////////////////////////////////
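For readers reproducing the measurement: SysTick is a 24-bit down-counter, so the elapsed tick count is the start reading minus the stop reading, with a correction if the counter reloaded in between. A minimal sketch of that calculation follows. The helper name `systick_elapsed` and its `reload` parameter (the value programmed into SysTick->LOAD) are assumptions for illustration and are not part of the original measurement routine.

```c
#include <stdint.h>

/* Hypothetical helper: ticks elapsed between two SysTick->VAL readings.
 * 'reload' is assumed to be the value programmed into SysTick->LOAD.
 * The result is only meaningful if fewer than (reload + 1) ticks passed
 * between the two readings, i.e. the counter reloaded at most once. */
static uint32_t systick_elapsed(uint32_t start, uint32_t stop, uint32_t reload)
{
    if (start >= stop) {
        /* No reload in between: the counter simply counted down. */
        return start - stop;
    }
    /* One reload: 'start' ticks down to 0, one tick for the reload,
     * then (reload - stop) ticks down to 'stop'. */
    return start + (reload + 1u) - stop;
}
```

With a full 24-bit reload value (0x00FFFFFF), a measurement longer than about 16.7 million ticks cannot be recovered from two readings alone, so long benchmark loops would need fewer iterations or an overflow counter.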

On the second sheet of the Excel file you will find the measurements for the first 100 values of num_points for the cosf microbenchmark. Here the mean is the metric of interest.

How can this non-determinism in arithmetic operations be explained?

I expected approximately equal execution times for each class of arithmetic operations (fDiv, fMul, fcos,...).

Best Regards

Jannik

5 Replies
Moderator

Hi @JM_97 ,

Thanks!
Kind Regards

Arpit Srivastav

Level 7

# Re: Determinism of floating point operations

Hello.

Code executing from internal flash goes through a read accelerator that attempts to make flash appear as zero-wait-state memory. Its timing is highly indeterminate.

You can move your code into SRAM. SRAM has zero-wait-state read timing, so the benchmark timings should be more consistent. Be sure to move every routine that the benchmarked code calls into SRAM too.

Let us know how it turns out.

Level 1

# Re: Determinism of floating point operations

Hello @BiBi_1928986,

Your explanation sounds reasonable and I would like to try it.

Can you tell me how I can force some functions or variables to be stored in SRAM? I found only one post regarding this topic, and it did not help:

https://community.infineon.com/t5/Knowledge-Base-Articles/Controlling-SRAM-Usage-in-PSoC-Application...

Thanks

Kind regards

Jannik

Level 7

# Re: Determinism of floating point operations

Hello.

AN89610 chapter 9 describes how to put code into SRAM.  If you follow it, you'll be successful.
PSoC® Arm® Cortex® Code Optimization (infineon.com)

Be aware that PSoC 4500 S has limited memory size.  If you move too much code into SRAM, you'll run out of SRAM memory.
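As a rough illustration of the general technique (not taken from AN89610): with a GCC-based toolchain, a function can be tagged with the `section` attribute so the linker places it in a chosen section. The macro `RAM_FUNC`, the function `sum_squares`, and the section name `.ramfunc` are assumptions for this sketch. The real section name has to match one that the project's linker script maps to SRAM and that the startup code copies from flash, so check the application note for the name your project expects.

```c
#include <stdint.h>

/* RAM_FUNC is a hypothetical macro and ".ramfunc" an assumed section name.
 * The name must match a section that the linker script places in SRAM
 * and that is initialized from flash at startup. */
#if defined(__GNUC__) && defined(__arm__)
#define RAM_FUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAM_FUNC /* non-ARM build: no special placement */
#endif

/* Example routine to be executed from SRAM on the target. */
RAM_FUNC uint32_t sum_squares(const uint16_t *data, uint32_t count)
{
    uint32_t acc = 0u;
    for (uint32_t i = 0u; i < count; i++) {
        acc += (uint32_t)data[i] * data[i];
    }
    return acc;
}
```

Tagging a function this way does not relocate the library routines it calls. A benchmark of cosf would still execute the math library from flash unless that code is placed in SRAM as well.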

Level 9

# Re: Determinism of floating point operations

JM_97,

If you are concerned about faster code execution on a PSoC 4 processor, the right way is to avoid floating-point operations altogether. You may have already noticed that multiplying floats costs about 100 clock ticks, division ~600 ticks and sine ~1000 ticks. And PSoC 4 was not made for speed.

Using integers you may speed up calculations 3-10 times with acceptable accuracy. For faster results it is better to stay within the int32 domain. But if extra accuracy is needed for division, just multiply the numerator, e.g. by 1000, and use int64 arithmetic; it is still ~3 times faster than floats. Use sine tables to calculate trig functions; depending on the table, 50-100 ticks are possible for 12-16 bits of accuracy.
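The two integer techniques above could look roughly like this. It is a sketch under assumptions, not measured code: the names `div_scaled_1000`, `sin_q15` and `quarter_sine`, the Q15 output scaling, and the 16-bit phase format (0..65535 for one full turn) are all chosen for illustration. The table holds only 17 quarter-wave points with linear interpolation, which is coarse; a larger table would be needed for the 12-16 bits mentioned above.

```c
#include <stdint.h>

/* Scaled integer division: returns (numerator / denominator) * 1000,
 * truncated toward zero. The caller must ensure denominator != 0 and
 * that the result fits into int32_t. */
static int32_t div_scaled_1000(int32_t numerator, int32_t denominator)
{
    return (int32_t)(((int64_t)numerator * 1000) / denominator);
}

/* sin(k * 90 / 16 degrees) * 32767, rounded, for k = 0..16. */
static const int16_t quarter_sine[17] = {
        0,  3212,  6393,  9512, 12539, 15446, 18204, 20787, 23170,
    25329, 27245, 28898, 30273, 31356, 32137, 32609, 32767
};

/* Sine via quarter-wave lookup table with linear interpolation.
 * phase: 0..65535 maps to 0..360 degrees.
 * Returns sin(phase) in Q15, range -32767..32767. */
static int16_t sin_q15(uint16_t phase)
{
    uint32_t quadrant = phase >> 14;       /* top 2 bits: 0..3 */
    uint32_t pos = phase & 0x3FFFu;        /* position inside the quadrant */

    if (quadrant & 1u) {
        pos = 0x4000u - pos;               /* mirror in quadrants 1 and 3 */
    }

    uint32_t index = pos >> 10;            /* 0..16 */
    int32_t frac = (int32_t)(pos & 0x3FFu);/* 0..1023 between two entries */
    int32_t a = quarter_sine[index];
    int32_t b = (index < 16u) ? quarter_sine[index + 1u] : a;
    int32_t value = a + ((b - a) * frac) / 1024;

    /* Lower half of the circle (quadrants 2 and 3) is negative. */
    return (int16_t)((quadrant & 2u) ? -value : value);
}
```

For example, `div_scaled_1000(22, 7)` yields 3142, i.e. 3.142 with three implied decimal places, and `sin_q15(0x2000)` (45 degrees) returns the table entry 23170.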