STM measurement and IfxCpu_getPerformanceCounter differences

kjsmith

Hello, 

I'm trying to do some performance measurements on functions and I'm getting results that I cannot explain. Most likely I am overlooking something or have something configured incorrectly, so I'm looking for any suggestions. 

Using the TC375 lite kit, I have Core 1 dedicated for this testing. The start of the function looks like this: 

 

void core1_main(void)
{
    IfxCpu_enableInterrupts();
    /* !!WATCHDOG2 IS DISABLED HERE!!
     * Enable the watchdog and service it periodically if it is required
     */
    IfxScuWdt_disableCpuWatchdog(IfxScuWdt_getCpuWatchdogPassword());
    IfxScuWdt_disableSafetyWatchdog(IfxScuWdt_getSafetyWatchdogPassword());
    IfxStm_enableOcdsSuspend(&MODULE_STM1);
    /* Wait for CPU sync event */
    IfxCpu_emitEvent(&g_cpuSyncEvent);
    IfxCpu_waitEvent(&g_cpuSyncEvent, 1);
    fstm_1 = IfxScuCcu_getStmFrequency();

    //setupCSV();
    collectMetrics();
    //test1();
    //closeCSV();
    while(1)
    {
    }
}

 

 

collectMetrics performs two checks; PER0 is a macro for the function being tested. 

 

void collectMetrics()
{
   getCTimeMetrics(PER0, &maxCTimePER0, &minCTimePER0, &timeCPER0);
   getIfxMetrics(PER0, &instructionCountPER0, &clockCountPER0);
}

 

 

getCTimeMetrics uses STM1 to take a timestamp before and after the call and calculates how long the function takes to run. 

 

void getCTimeMetrics(void (*func)(), double *maxTime, double *minTime, double *timeArray)
{
    unsigned long lower;
    unsigned long upper;
    unsigned long long start;
    unsigned long long end;
    unsigned long long delta;
    for(int i = 0; i < 10; i++)
    {
        lower = ((uint32)MODULE_STM1.TIM0.U);
        upper = ((uint32)MODULE_STM1.CAP.U);
        /* Widen upper to 64 bits before shifting; shifting a 32-bit value by 32 is undefined */
        start = ((unsigned long long)upper << 32) | (unsigned long long)lower;

        func();

        lower = ((uint32)MODULE_STM1.TIM0.U);
        upper = ((uint32)MODULE_STM1.CAP.U);
        /* Widen upper to 64 bits before shifting; shifting a 32-bit value by 32 is undefined */
        end = ((unsigned long long)upper << 32) | (unsigned long long)lower;
        delta = end - start;
        timeArray[i] = (double)(((double)delta)/(double)CLOCKS_PER_SEC);
    }
    *maxTime = timeArray[0]; /* place holder for now */
    *minTime = timeArray[0];

}
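
When two 32-bit register reads are assembled into one 64-bit timestamp, the upper half has to be widened before it is shifted, because shifting a 32-bit value by 32 bits is undefined behavior in C. A minimal sketch of that step follows; the helper name `combine_timestamp` is hypothetical and not part of the iLLD, and the two arguments stand in for the values read from TIM0 and CAP.

```c
#include <stdint.h>

/* Hypothetical helper: combine two 32-bit halves into one 64-bit timestamp. */
static uint64_t combine_timestamp(uint32_t upper, uint32_t lower)
{
    /* Cast first, then shift, so the shift is done in 64-bit arithmetic. */
    return ((uint64_t)upper << 32) | (uint64_t)lower;
}
```

For short measurements the upper word rarely changes, so a mistake here tends to stay hidden until the lower word wraps around.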

 

 

The other function uses the cpu performance functions to get the instruction count and clock count. 

 

void getIfxMetrics(void (*func)(), uint32 *instructionCount, uint32 *clockCount)
{
    IfxCpu_resetAndStartCounters(IfxCpu_CounterMode_normal);
    func();
    IfxCpu_stopCounters();
    *instructionCount = IfxCpu_getInstructionCounter();
    *clockCount = IfxCpu_getPerformanceCounter(CPU_CCNT);
}

 

 

We tested the function by measuring it with Gliwa, and on a 200 MHz part it ran in a range of 230 ns to 270 ns. Since the TC375 is a 300 MHz part, I would expect it to be quicker there. However, the results I am getting from the two functions are, to my mind, way too different. 

I'm getting 52 clock counts, which would scale to about 173 ns with a 300 MHz clock. However, the STM1 calculations are reporting around 290 ns. I figured there would be some differences, but I cannot justify a difference of about 120 ns. All code flash and static RAM for these functions is local to core 1 (pflash1 and dsram1). If the function were being interrupted, I would assume I would see the same effect in both measurements. 
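
The arithmetic behind these two numbers can be written out as a small sketch. It assumes the 300 MHz CPU clock stated above and the 100 MHz STM frequency reported further down in the thread; the function names are hypothetical. At 100 MHz one STM tick is 10 ns, so a reading of about 290 ns corresponds to 29 ticks, and the STM cannot resolve anything finer than 10 ns.

```c
#include <stdint.h>

/* Convert a tick or cycle count to nanoseconds for a given clock frequency. */
static double ticks_to_ns(uint64_t ticks, double freq_hz)
{
    return (double)ticks * 1.0e9 / freq_hz;
}

/* Duration of one tick in nanoseconds: the smallest step the counter resolves. */
static double tick_resolution_ns(double freq_hz)
{
    return 1.0e9 / freq_hz;
}
```

With these helpers, `ticks_to_ns(52, 300.0e6)` gives roughly 173.3 ns for the CPU clock count, and `ticks_to_ns(29, 100.0e6)` gives 290 ns for the STM reading.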

Does anyone have any thoughts as to why I am seeing such a large difference? Does this make sense? I feel I am overlooking or missing something, but I cannot see it. 

Thanks in advance for any suggestions. 

Aiswarya_A

Hi 

We wanted to reproduce this at our end. Could you please let us know the value you are using for "CLOCKS_PER_SEC" in the getCTimeMetrics() function?

Regards,
Aiswarya.


Hi Aiswarya, 

The value comes from the time.h file that is included with the compiler in the AURIX Development Studio. Here is a snapshot of where the macro is defined. From debugging, the value is 100000000, or 100 MHz, which matches the Fstm value I'm getting from the IfxScuCcu_getStmFrequency() function: 

#ifndef CLOCKS_PER_SEC
# define CLOCKS_PER_SEC  ((clock_t)__clocks_per_sec)    /* resolution of clock() */
#endif
extern  clock_t __far           __clocks_per_sec;

 

Thanks,
-Kevin 

kjsmith

Hi Aiswarya, 

I was just curious whether you were able to reproduce the issue on your end. 

Thanks,

-Kevin

Aiswarya_A

Hi Kevin,
Reading the STM registers takes comparatively more time than stopping the CPU performance counters, so the STM measurement includes a larger fixed overhead. If you increase the execution time of func(), you will observe the same absolute difference in time.

Regards,
Aiswarya.
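
If the difference really is a fixed read overhead, one way to check is to time an empty function with the same code path and subtract that baseline from the real measurement. The following is a minimal host-side sketch of that idea, not iLLD code: every name in it is hypothetical, and `fake_read` only simulates a timer whose read costs a fixed 12 ticks. On the target, the read callback would be replaced by the actual STM1 read.

```c
#include <stdint.h>

typedef uint64_t (*read_ticks_fn)(void);  /* hypothetical timer-read callback */
typedef void (*void_fn)(void);

/* Ticks elapsed between two timer reads around func (includes read overhead). */
static uint64_t measure_raw(read_ticks_fn read_ticks, void_fn func)
{
    uint64_t start = read_ticks();
    func();
    uint64_t end = read_ticks();
    return end - start;
}

static void empty_function(void) { }

/* Measure func and subtract the baseline obtained from an empty function. */
static uint64_t measure_net(read_ticks_fn read_ticks, void_fn func)
{
    uint64_t overhead = measure_raw(read_ticks, empty_function);
    uint64_t raw = measure_raw(read_ticks, func);
    return (raw > overhead) ? (raw - overhead) : 0u;
}

/* Simulated timer for trying this on a host: each read costs 12 ticks. */
static uint64_t fake_now;
static uint64_t fake_read(void) { fake_now += 12u; return fake_now; }

/* Simulated workload that takes 17 ticks. */
static void fake_work(void) { fake_now += 17u; }
```

With the simulated timer, `measure_raw(fake_read, fake_work)` reports 29 ticks while `measure_net(fake_read, fake_work)` reports the 17 ticks of the workload itself. On real hardware the baseline will vary slightly from run to run, so averaging several baseline measurements is probably worthwhile.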
