SDMMC Systimer Usage reentrancy bug?

Tip / Sign in to post questions, reply, level up, and achieve exciting badges. Know more

cross mob
Not applicable
Hello,

I'm debugging sporadic Lockups in my XMC4500 application. After turning on fault debugging it appears memory faults are occurring in the SYSTIMER timer list handling code that is called from SDMMC module. I added some redundant NULL checks and they trigger.

Here is the full backtrace:


Program received signal SIGTRAP, Trace/breakpoint trap.
SYSTIMER_lRemoveTimerList (tbl_index=2) at ../Dave/Generated/SYSTIMER/systimer.c:245
245 __BKPT(0);
bt
#0 SYSTIMER_lRemoveTimerList (tbl_index=2) at ../Dave/Generated/SYSTIMER/systimer.c:245
#1 0x08027f18 in SYSTIMER_StopTimer (id=3) at ../Dave/Generated/SYSTIMER/systimer.c:503
#2 0x0802b5c6 in SDMMC_BLOCK_SD_lStopTimer (obj=0x200001a8 ) at ../Dave/Generated/SDMMC_BLOCK/sdmmc_block_private_sd.c:2077
#3 0x0802aa60 in SDMMC_BLOCK_SD_lSingleBlockTransfer (obj=0x200001a8 , cmd=0x8037426 , arg=56771, block_size=512, buf=0x2000414c , transfer_mode=SDMMC_BLOCK_SD_DATA_TRANSFER_WRITE_BUFFER) at ../Dave/Generated/SDMMC_BLOCK/sdmmc_block_private_sd.c:1291
#4 0x0802ba4a in SDMMC_BLOCK_SD_lCardWriteSingleBlock (obj=0x200001a8 , write_buf=0x2000414c , write_addr=56771) at ../Dave/Generated/SDMMC_BLOCK/sdmmc_block_private_sd.c:2391
#5 0x0802c374 in SDMMC_BLOCK_SD_WriteBlock (obj=0x200001a8 , write_buf=0x2000414c "5655800;6991;999;0.000031;351;999;-0.000836;4093130;999;-0.001110;-0.000000;118.630000\r\n7157175;715717500;6994;999;0.000075;320;999;-0.000867;4093019;999;-0.001801;-0.000000;118.640000\r\n7157774;715777"..., sector_num=56771, sector_count=1 '\001') at ../Dave/Generated/SDMMC_BLOCK/sdmmc_block_private_sd.c:2934
#6 0x080290a2 in SDMMC_BLOCK_WriteBlock (obj=0x200001a8 , write_buf=0x2000414c "5655800;6991;999;0.000031;351;999;-0.000836;4093130;999;-0.001110;-0.000000;118.630000\r\n7157175;715717500;6994;999;0.000075;320;999;-0.000867;4093019;999;-0.001801;-0.000000;118.640000\r\n7157774;715777"..., sector_number=56771, sector_count=1 '\001') at ../Dave/Generated/SDMMC_BLOCK/sdmmc_block.c:250
#7 0x0802fd3c in disk_write (drive=0 '\000', buffer=0x2000414c "5655800;6991;999;0.000031;351;999;-0.000836;4093130;999;-0.001110;-0.000000;118.630000\r\n7157175;715717500;6994;999;0.000075;320;999;-0.000867;4093019;999;-0.001801;-0.000000;118.640000\r\n7157774;715777"..., sectornumber=56771, sectorcount=1) at ../Dave/Generated/FATFS/fatfs.c:294
#8 0x0802ed46 in f_write (fp=0x20004124 , buff=0x100006bc, btw=64, bw=0x1000073c) at ../Dave/Generated/FATFS/ff_Src/ff.c:2694
#9 0x080325ae in jm_integrate (state=0x20003e50 , time=7159564) at ../joulemeter.c:601
#10 0x08032ddc in main () at ../main.c:577
bt full
#0 SYSTIMER_lRemoveTimerList (tbl_index=2) at ../Dave/Generated/SYSTIMER/systimer.c:245
object_ptr = 0x20003d14
#1 0x08027f18 in SYSTIMER_StopTimer (id=3) at ../Dave/Generated/SYSTIMER/systimer.c:503
status = SYSTIMER_STATUS_SUCCESS
#2 0x0802b5c6 in SDMMC_BLOCK_SD_lStopTimer (obj=0x200001a8 ) at ../Dave/Generated/SDMMC_BLOCK/sdmmc_block_private_sd.c:2077
tmr_id = 3
status = 168099
#3 0x0802aa60 in SDMMC_BLOCK_SD_lSingleBlockTransfer (obj=0x200001a8 , cmd=0x8037426 , arg=56771, block_size=512, buf=0x2000414c , transfer_mode=SDMMC_BLOCK_SD_DATA_TRANSFER_WRITE_BUFFER) at ../Dave/Generated/SDMMC_BLOCK/sdmmc_block_private_sd.c:1291
status = SDMMC_BLOCK_MODE_STATUS_SUCCESS
qbytes = 128
#4 0x0802ba4a in SDMMC_BLOCK_SD_lCardWriteSingleBlock (obj=0x200001a8 , write_buf=0x2000414c , write_addr=56771) at ../Dave/Generated/SDMMC_BLOCK/sdmmc_block_private_sd.c:2391
status = SDMMC_BLOCK_MODE_STATUS_SUCCESS
sector_count = 1
#5 0x0802c374 in SDMMC_BLOCK_SD_WriteBlock (obj=0x200001a8 , write_buf=0x2000414c "5655800;6991;999;0.000031;351;999;-0.000836;4093130;999;-0.001110;-0.000000;118.630000\r\n7157175;715717500;6994;999;0.000075;320;999;-0.000867;4093019;999;-0.001801;-0.000000;118.640000\r\n7157774;715777"..., sector_num=56771, sector_count=1 '\001') at ../Dave/Generated/SDMMC_BLOCK/sdmmc_block_private_sd.c:2934
status = SDMMC_BLOCK_MODE_STATUS_CARD_IS_LOCKED
disk_status = 64 '@'
res = SDMMC_BLOCK_STATUS_SUCCESS
#6 0x080290a2 in SDMMC_BLOCK_WriteBlock (obj=0x200001a8 , write_buf=0x2000414c "5655800;6991;999;0.000031;351;999;-0.000836;4093130;999;-0.001110;-0.000000;118.630000\r\n7157175;715717500;6994;999;0.000075;320;999;-0.000867;4093019;999;-0.001801;-0.000000;118.640000\r\n7157774;715777"..., sector_number=56771, sector_count=1 '\001') at ../Dave/Generated/SDMMC_BLOCK/sdmmc_block.c:250
status = SDMMC_BLOCK_STATUS_SUCCESS
#7 0x0802fd3c in disk_write (drive=0 '\000', buffer=0x2000414c "5655800;6991;999;0.000031;351;999;-0.000836;4093130;999;-0.001110;-0.000000;118.630000\r\n7157175;715717500;6994;999;0.000075;320;999;-0.000867;4093019;999;-0.001801;-0.000000;118.640000\r\n7157774;715777"..., sectornumber=56771, sectorcount=1) at ../Dave/Generated/FATFS/fatfs.c:294
diskresult = 32
result = 268437180
#8 0x0802ed46 in f_write (fp=0x20004124 , buff=0x100006bc, btw=64, bw=0x1000073c) at ../Dave/Generated/FATFS/ff_Src/ff.c:2694
res = FR_OK
clst = 134423947
sect = 1064925179
wcnt = 34
cc = 213424158
wbuff = 0x100006de "1;383;999;-0.000805;4093197;999;-0.000693;-0.000000;118.680000\r\n"
csect = 4 '\004'
#9 0x080325ae in jm_integrate (state=0x20003e50 , time=7159564) at ../joulemeter.c:601
line = "7159564;715956400;6991;999;0.000031;383;999;-0.000805;4093197;999;-0.000693;-0.000000;118.680000\r\n\000 \000\000\000`(\a\000\020\060\a\000\020�\211\002\b\002\000\000\000�\000\000 \000 \a"
sz = 98
sz2 = 34
totalbw = 1148605
avg_volt = 3.1000802802813587e-005
avg_curr = -0.00080453280810810815
timespan = 597
timespan_secs = 0.059699999999999996
retj = {d = 118.68000000585268, u = {515807921, 1079880581}}
joules = -1.4889874270734333e-009
lmp_volt_avg = 6.9979979979979978
lmp_curr_avg = 0.3833833833833834
tli_curr_avg = 4097.2942942942946
tli_curr = -0.00069343868257831787
lmp_curr = -0.00080453280810810815
lmp_volt = 3.1000802802813587e-005
#10 0x08032ddc in main () at ../main.c:577
last_button_span = 7159563
timespan = 597
status = DAVE_STATUS_SUCCESS
vcom_st = USBD_VCOM_STATUS_DEVICE_UNATTACHED
usbwait = 0


Here is what it looks like in DAVE, you can see the additional NULL checks I've put in:

2185.attach

I am at a loss here - the chain of if/else statements in the SYSTIMER_lRemoveTimerList should already make sure that this branch is only entered when object_ptr->prev is valid. So that is what's making me think it's a reentrancy bug / data race where maybe an interrupt manipulates the timer table independently.

Colleagues are reporting to me that the fault has also happened in the SYSTIMER_StopTimer function.

I am using another SYSTIMER for something different:


static uint32_t tmr;
static uint8_t consume_pending;

void lcd_consume() {
if (consume_pending) {
uint32_t interval = 10000;
if (queue_first != queue_free) {
lcd_set();
queue_first = (queue_first + 1) % LCD_QLEN;

interval = queue[queue_first].ts + SYSTIMER_TICK_PERIOD_US;
}

consume_pending = 0;
SYSTIMER_RestartTimer(tmr, interval);
}
}

void lcd_callback(void* args) {
consume_pending = 1;
}

void lcd_start() {
tmr = SYSTIMER_CreateTimer(SYSTIMER_TICK_PERIOD_US, SYSTIMER_MODE_ONE_SHOT,
&lcd_callback,
NULL);
if (tmr) {
SYSTIMER_StartTimer(tmr);
} else {
while (1U) {
; // trap
}
}

lcd_callback(NULL);
}


As you can see, I am not manipulating the timer from its own callback, I only set a signalling variable and the timer manipulation (lcd_start / lcd_consume) is only ever called directly from my main() function so this shouldn't interfere.

Any ideas?
0 Likes
7 Replies
Not applicable
So I added a reentrancy check and it fired. Here is all of my instrumented SYSTIMER_lRemoveTimerList function:


static SYSTIMER_OBJECT_t *rtl_obj, *rtl_prev, *rtl_prevnext;

static uint8_t rtl_id = 0;

/*
* This function is called to remove a timer from the timer list.
*/
static void SYSTIMER_lRemoveTimerList(uint32_t tbl_index)
{
if (rtl_id != 0) {
__BKPT(0);
}
rtl_id = tbl_index + 1U;

SYSTIMER_OBJECT_t *object_ptr;
object_ptr = &g_timer_tbl[tbl_index];
/* Check whether only one timer available */
if ((NULL == object_ptr->prev) && (NULL == object_ptr->next ))
{
/* set timer list as NULL */
g_timer_list = NULL;
}
/* Check if the first item in timer list */
else if (NULL == object_ptr->prev)
{
/* Remove timer from list, and reset timer list */
g_timer_list = object_ptr->next;
g_timer_list->prev = NULL;
g_timer_list->count += object_ptr->count;
object_ptr->next = NULL;
}
/* Check if the last item in timer list */
else if (NULL == object_ptr->next)
{
rtl_obj = object_ptr;
if (object_ptr) {
rtl_prev = object_ptr->prev;
if (rtl_prev) {
rtl_prevnext = object_ptr->prev->next;
} else {
__BKPT(0);
}
} else {
__BKPT(0);
}

/* Remove timer from list */
object_ptr->prev->next = NULL;
object_ptr->prev = NULL;
}
else
{
/* Remove timer from list */
object_ptr->prev->next = object_ptr->next;
object_ptr->next->prev = object_ptr->prev;
object_ptr->next->count += object_ptr->count;
object_ptr->next = NULL;
object_ptr->prev = NULL;
}

rtl_id = 0;
}


The rtl_id variable is always set to 0 at the end of the function, it can only be non-zero inside the function, so the breakpoint at the start of the function hitting means that there is reentrancy...
0 Likes
Not applicable
Of course there is reentrancy, look what I found!


Program received signal SIGTRAP, Trace/breakpoint trap.
SYSTIMER_lRemoveTimerList (tbl_index=1) at ../Dave/Generated/SYSTIMER/systimer.c:222
222 __BKPT(0);
bt
#0 SYSTIMER_lRemoveTimerList (tbl_index=1) at ../Dave/Generated/SYSTIMER/systimer.c:222
#1 0x08027c0e in SYSTIMER_lTimerHandler () at ../Dave/Generated/SYSTIMER/systimer.c:298
#2 0x08027cb2 in SysTick_Handler () at ../Dave/Generated/SYSTIMER/systimer.c:343
#3 0xffffffe8 in ?? ()


So timers are removed by the SysTick_Handler / SYSTIMER_lTimerHandler under various conditions. I have a feeling there may be a race condition hiding here.

EDIT: After more inspection and thinking about it, I think the only fix for this is to leave timer list maintenance to the SysTick_Handler ISR, like timer deletion already is. The SYSTIMER_StopTimer function should only mark timers as inactive and not call SYSTIMER_lRemoveTimerList by itself.
An explicit mark may not needed, the SYSTIMER_STATE_STOPPED may be sufficient to let the ISR handle processing of the timer list correctly.
0 Likes
Not applicable
I have now implemented a quick-and-dirty fix in the SYSTIMER_StopTimer function that just disables the SysTick interrupt, see the __set_BASEPRI calls around the SYSTIMER_lRemoveTimerList call.


/*
* API to stop the software timer.
*/
SYSTIMER_STATUS_t SYSTIMER_StopTimer(uint32_t id)
{
SYSTIMER_STATUS_t status;
status = SYSTIMER_STATUS_SUCCESS;

XMC_ASSERT("SYSTIMER_StopTimer: Failure in timer restart operation due to invalid timer ID",
((id <= SYSTIMER_CFG_MAX_TMR) && (id > 0U)));
XMC_ASSERT("SYSTIMER_StopTimer: Error during stop of software timer", (0U != (g_timer_tracker & (1U << (id - 1U)))));

if (SYSTIMER_STATE_NOT_INITIALIZED == g_timer_tbl[id - 1U].state)
{
status = SYSTIMER_STATUS_FAILURE;
}
else
{
/* Check whether Timer is in Stop state */
if (SYSTIMER_STATE_RUNNING == g_timer_tbl[id - 1U].state)
{
/* set BASEPRI to disable SYSTIMER */
__set_BASEPRI(SYSTIMER_PRIORITY << (8 - __NVIC_PRIO_BITS));
/* remove Timer from node list */
SYSTIMER_lRemoveTimerList(id - 1U);
/* reenable */
__set_BASEPRI(0U);
/* Set timer status as SYSTIMER_STATE_STOPPED */
g_timer_tbl[id - 1U].state = SYSTIMER_STATE_STOPPED;
}
}

return (status);
}


I have been running with this for ten minutes without encountering a fault, but of course since the faults are so sporadic that doesn't say much.

One drawback of this approach is that it also disables all interrupts with identical or lower priority. It also clobbers any previous setting of BASEPRI (BASEPRI should maybe be saved and restored).

This solution was inspired by http://embeddedgurus.com/state-space/2014/02/cutting-through-the-confusion-with-arm-cortex-m-interru... .
0 Likes
Not applicable
Bumping this thread because I believe there's enough information in it to get an official comment or get it passed on to the team responsible for XMCLib...
0 Likes
jferreira
Employee
Employee
10 sign-ins 5 sign-ins First like received
Hi lerlacher,

Thanks for reporting the issue.
We will further analyze it and a new version of the APP will be available end of July.

Regards,
Jesus
0 Likes
Not applicable
Thank you!
0 Likes
User19827
Level 1
Level 1
jferreira wrote:
Hi lerlacher,

Thanks for reporting the issue.
We will further analyze it and a new version of the APP will be available end of July.

Regards,
Jesus


Just want to confirm that if this issue has been fixed in current release package.


I checked with the latest v4.1.16 version SYSTIMER implementation. I have no idea why ENTER_CRITICAL protection is used in xxxFromISR() functions but not normal xxx() functions.
From my understanding, in order to protect sharing data modified by interrupt routines, we should add critical_section_enter() in background programs.
But SYSTIMER app is implemented in an opposite way. It adds protections in all xxxFromISR() functions which are for calling in ISR programs.

Could anyone explain this?

// this function is copied here for discussion

SYSTIMER_STATUS_t SYSTIMER_StopTimerFromISR(uint32_t id)
{
SYSTIMER_STATUS_t status;

uint32_t ics;
ics = critical_section_enter();

status = SYSTIMER_StopTimer(id);

critical_section_exit(ics);

return (status);
}

///////////////////////////////////
0 Likes