First Execution Time Much Slower Than Subsequent Executions


Berni27

Hello, 

I ran a benchmark with a standard matrix multiplication algorithm on a CY8CKIT-062-BLE development board, using only the Cortex-M4 core. I ran the benchmark 100 times for each matrix dimension from 2x2 up to 100x100. For each dimension I recorded the execution time of the very first run as well as the mean of the subsequent 99 runs.
I observed the following behaviour: up to a dimension of about 30x30, the difference between the first execution time and the mean of the subsequent runs rose to about 16 microseconds. From 30x30 to 100x100, the difference dropped rapidly to around 2 microseconds (with the first run still being the slower one).
Is there any explanation for this rapid drop?
I do not really understand why the difference does not keep growing with the dimension.
Thank you very much for your help. 

1 Solution
RodolfoGL
Employee

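This might be related to the internal cache, assuming you are storing the data in flash. The CM4 has an 8 KB flash cache. If the data no longer fits in that cache, the subsequent runs would miss as well, which would shrink their advantage over the first run.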
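As a rough plausibility check of the cache idea, the sketch below estimates how many bytes the matrices occupy and compares that with the 8 KB cache. It assumes 4-byte elements (for example `float` or `int32_t`) and that only the two input matrices are read from flash. Neither is stated in the question, and the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define FLASH_CACHE_BYTES (8u * 1024u) /* 8 KB flash cache of the CM4 */

/* Bytes occupied by `count` square matrices of dimension n x n. */
size_t matrices_bytes(size_t n, size_t elem_size, size_t count)
{
    return n * n * elem_size * count;
}

/* True if those matrices could fit in the flash cache all at once.
 * This ignores the code itself, which competes for the same cache. */
bool fits_in_flash_cache(size_t n, size_t elem_size, size_t count)
{
    return matrices_bytes(n, elem_size, count) <= FLASH_CACHE_BYTES;
}
```

Under these assumptions, two 30x30 inputs take 7200 bytes and still fit, while two 33x33 inputs take 8712 bytes and do not. That is close to the dimension where the reported difference peaks, but it is only an estimate, since the real element type and data placement are unknown.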

You can try moving the matrices to SRAM. You might then see a more linear trend.
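A minimal sketch of that experiment could look like the following. It assumes a typical linker setup in which initialised `const` arrays are placed in flash and non-const statics in SRAM. The dimension `N`, the array names and the helper functions are hypothetical, and the timing source is left out, since the question does not say which timer the benchmark uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N 3 /* hypothetical small dimension, just for illustration */

/* Initialised const data usually ends up in flash (.rodata). */
static const int32_t a_flash[N * N] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
static const int32_t b_flash[N * N] = {9, 8, 7, 6, 5, 4, 3, 2, 1};

/* Non-const statics usually end up in SRAM (.bss). */
static int32_t a_sram[N * N];
static int32_t b_sram[N * N];
static int32_t c_sram[N * N];

/* Standard triple-loop product of two row-major n x n matrices. */
void matmul(size_t n, const int32_t *a, const int32_t *b, int32_t *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int32_t sum = 0;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Copy the operands out of flash, then multiply entirely in SRAM.
 * On the target, the copy belongs outside the timed region. */
const int32_t *multiply_in_sram(void)
{
    memcpy(a_sram, a_flash, sizeof a_sram);
    memcpy(b_sram, b_flash, sizeof b_sram);
    matmul(N, a_sram, b_sram, c_sram);
    return c_sram;
}

/* Split recorded timings into the first run and the mean of the rest. */
void first_vs_mean(const uint32_t *t, size_t runs,
                   uint32_t *first, double *mean_rest)
{
    assert(runs > 1);
    uint64_t sum = 0;
    for (size_t i = 1; i < runs; i++) {
        sum += t[i];
    }
    *first = t[0];
    *mean_rest = (double)sum / (double)(runs - 1);
}
```

Note that the code itself would still execute from flash in this setup, so a small first-run penalty may remain even with all data in SRAM.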
