OTA1 fails to successfully load new image in WICED 4.0

Tip / Sign in to post questions, reply, level up, and achieve exciting badges. Know more

cross mob
Anonymous
Not applicable

I have been trying to get an application that has been upgraded from WICED 5.5.1 to WICED 4.0 to successfully execute an Over The Air(OTA) update. The OTA update worked in WICED 3.5.1, but not in WICED 4.0. In trying to resolve the issue I noticed invalid values for the variable CURRENT_SECTOR were being generated in /tools/makefiles/wiced_apps.mk. I was able to fix this bug in WICED 4.0, but OTA still fails to work.

We are using a custom board with an Inventek ISM43341_M4G_L44 module and 4MB of serial flash.

The partial fix: The CURRENT_SECTOR variable has a value that increases for no apparent reason. Changing the name of the variable from CURRENT_SECTOR to NEXT_SECTOR fixed the problem.

In /tools/makefiles/wiced_apps.mk I added info commands to print out several values line by line as the makefile was executed. Here is output showing that CURRENT_SECTOR increased by 7 between $(info 2..) and $(info 3..). The one line between these two info statements calls a perl script that does not make any changes to CURRENT_SECTOR.


<<< OUTPUT >>>

Downloading Bootloader ...
0: NAME: FR_APP, MAX_SIZE: 224 , CURRENT_SECTOR: 16, NEXT: 240
1: NAME: FR_APP, MAX_SIZE: 224 , CURRENT_SECTOR: 16, NEXT: 240
2: NAME: FR_APP, MAX_SIZE: 224 , CURRENT_SECTOR: 16, NEXT: 240
3: NAME: FR_APP, MAX_SIZE: 224 , CURRENT_SECTOR: 23, NEXT: 247
4: NAME: FR_APP, MAX_SIZE: 224 , CURRENT_SECTOR: 23, NEXT: 247
FR_APP START: 16 ADR: 65536 COUNT: 224 NEXT: 247 MAX SIZE: 224 

<<< CODE SNIPPET >>>

###############################################################################
# MACRO: BUILD_APPS_RULES
# Calculate sizes and locations for resources that will be stored in serial flash.
# $(1) is the name of a resource.
# $(2) OPTIONAL is the Max Size in 4K Sectors that will be allowed for this resource in sflash.
define BUILD_APPS_RULES
$(info 0: NAME: $(1), MAX_SIZE: $(2), CURRENT_SECTOR: $(CURRENT_SECTOR), NEXT: $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $($(1)) $(CURRENT_SECTOR) $(APP_SECTOR_SIZE) $(2)) )
$(if $($(1)),$(eval $(1)_ENTRY_COUNT := 1),$(eval $(1)_ENTRY_COUNT := 0))
$(info 1: NAME: $(1), MAX_SIZE: $(2), CURRENT_SECTOR: $(CURRENT_SECTOR), NEXT: $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $($(1)) $(CURRENT_SECTOR) $(APP_SECTOR_SIZE) $(2)) )
$(if $($(1)),$(eval $(1)_SECTOR_START := $(CURRENT_SECTOR)),$(eval $(1)_SECTOR_START := 0))
$(info 2: NAME: $(1), MAX_SIZE: $(2), CURRENT_SECTOR: $(CURRENT_SECTOR), NEXT: $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $($(1)) $(CURRENT_SECTOR) $(APP_SECTOR_SIZE) $(2)) )
$(if $($(1)),$(eval $(1)_SECTOR_COUNT := $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $($(1)) 0 $(APP_SECTOR_SIZE) $(2))),$(eval $(1)_SECTOR_COUNT := 0))
$(info 3: NAME: $(1), MAX_SIZE: $(2), CURRENT_SECTOR: $(CURRENT_SECTOR), NEXT: $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $($(1)) $(CURRENT_SECTOR) $(APP_SECTOR_SIZE) $(2)) )
$(if $($(1)),$(eval $(1)_SECTOR_ADDRESS := $(shell $(PERL) $(SECTOR_ADDRESS_SCRIPT) $($(1)_SECTOR_START) $(APP_SECTOR_SIZE))),)
$(info 4: NAME: $(1), MAX_SIZE: $(2), CURRENT_SECTOR: $(CURRENT_SECTOR), NEXT: $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $($(1)) $(CURRENT_SECTOR) $(APP_SECTOR_SIZE) $(2)) )
$(if $($(1)),$(eval CURRENT_SECTOR := $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $($(1)) $(CURRENT_SECTOR) $(APP_SECTOR_SIZE) $(2))),)
$(if $($(1)),$(eval $(1)_IS_SECURE := $(if $($(1)_SECURE), 1, 0)),    $(eval $(1)_IS_SECURE := 0))
$(info $(1) START: $($(1)_SECTOR_START) ADR: $($(1)_SECTOR_ADDRESS) COUNT: $($(1)_SECTOR_COUNT) NEXT: $(CURRENT_SECTOR) MAX SIZE: $(2))
$(eval APPS_HEADER_DEFINES += -D$(1)_ENTRY_COUNT=$($(1)_ENTRY_COUNT))
$(eval APPS_HEADER_DEFINES += -D$(1)_SECTOR_START=$($(1)_SECTOR_START))
$(eval APPS_HEADER_DEFINES += -D$(1)_SECTOR_COUNT=$($(1)_SECTOR_COUNT))
$(eval APPS_HEADER_DEFINES += -D$(1)_IS_SECURE=$($(1)_IS_SECURE))
endef
### end of BUILD_APPS_RULES

If anyone has had any luck getting OTA to work on WICED 4.0 I would appreciate some help. In our application we download and save the firmware image successfully, but when the device reboots instead of booting up with the new image, the device is bricked. That still happens after fixing the above issue.

0 Likes
1 Solution
Anonymous
Not applicable

In our code, it is a little more complicated than normal, because WICED assumes a 4KB sector size in the serial flash. We started using serial flash chips that have a 64KB sector size which is much faster for erasing and writing than the chips with the old 4KB sectors. It also makes our code more complicated since our code supports devices with both parts. I hard coded The initial CURRENT_SECTOR := 16#4KB sectors == 1 64KB sector.

All of our maximum file sizes are specified in multiples of 16(4KB sectors) to ensure that the files are stored on 64KB boundaries. The line that darius1 quotes is in the OTA2 section. We aren't using OTA2.

$(eval APP_LUT_SECTOR_COUNT = $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $(subst \,/,$(FINAL_APPS_FILE)) 0 4096) )

The crazy thing is that $(APP_LUT_SECTOR_COUNT) isn't even defined for OTA1. It doesn't actually matter, because the sflash write command only takes the name of the file to copy and the address in sflash from which to start writing. $(APP_LUT_SECTOR_COUNT) is only used to display the file sizes as you build the system.

Our system works in WICED 3.5.1 which precedes the addition of the CRC. When we upgraded to WICED 4.0.10, OTA1 stopped working. Current analysis suggests that it is writing to the DCT that is causing the failure. Unfortunately, Cypress made the mistake of using buffered reads from the internal flash when the DCT is in internal flash. This is going to be unacceptable to most of their customers who have devices with the DCT stored in internal flash. We are one of those customers. We are stuck in a situation where we can either go back to using WICED 3.5.1 until Cypress gets their act together or we can use WICED 4.0 and eliminate the buffered reads ourselves

View solution in original post

0 Likes
13 Replies
Anonymous
Not applicable

I found another bug. The assignment that specifies the size of the LUT is commented out on line 250 of /tools/makefiles/wiced_apps.mk. You [Cypress] commented out the 3rd line of the following selection.

$(FINAL_APPS_FILE): $(STRIPPED_LINK_APPS_FILE)

    $(QUIET)$(OBJCOPY) $(call FINAL_OUTPUT_OPTIONS,$<,$@)

#    $(EVAL APP_LUT_SECTOR_COUNT = $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $(subst \,/,$(FINAL_APPS_FILE)) 0 4096))

There is nowhere else that this size is specified for an OTA1 update, so when you try to write the LUT to the sflash you always try to write 0 sectors.

I fixed this bug by moving the assignment out of the rule and assigning EVAL APP_LUT_SECTOR_COUNT a value of 1 sector. There is no way the LUT will ever be bigger than 1 sector, so we don't need to do anything complicated to figure out how many sectors it will be.

While this fixed the bug, OTA is still just as badly broken.

This is not the correct answer.

0 Likes
Anonymous
Not applicable

Possible problem in second parameter of macro BUILD_APPS_RULES max size ...

Need change it to your sflash size  4MB / 4096 - 16 = 4*1024K/4K - 16= 1024 sectors -16 .

Or simple remove it..

Anonymous
Not applicable

Note:

$(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $(subst \,/,$(FINAL_APPS_FILE)) 0 4096))   compute incorrectly value, because  perl script allways increment sector count result .

So  $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $(subst \,/,$(FINAL_APPS_FILE)) 0 4096))  =2, but apps.bin file size 1sector.  

0 Likes
Anonymous
Not applicable

Hi,

CURRENT_SECTOR is  using to compute  sectors offsets and sizes of files, which are stored in serial flash(dct, FACTORY RESTORE, ...).  It generate defines. This defines are included at  apps.bin compiled time. This apps.bin with size 4096 is saved to serial flash.  So main application know where are files in  sflash.

Append VERBOSE=1 and you will see what it generate when compile apps.elf.

What your spi chip?  Is this chip supported by wiced sdk?

Anonymous
Not applicable

Regarding the max size, this is a customization I added a year ago to eliminate the inadequate file fragmentation logic that Broadcom wrote. All the max size does is change the expected file size to be the maximum possible size as defined in apps/custom_app/custom_app.mk and tools/makefiles/wiced_config.mk. What I do is allocate portions of the sflash to each of the 8 apps and specify that upfront. When tools/makefiles/wiced_apps.mk, calculates the space required for each of the 8 apps, instead of using actual file sizes it uses the max size. Thus the initial allocated size is the maximum size. The max size is what is used by the sflash write logic and stored in place of actual file sizes in the Look Up Table (LUT).

What makes this simplified approach work is that Broadcom's file fragmentation logic does not allow for files to shrink. If you save an app that requires fewer sectors than were originally required. Broadcom does not shrink the allocated space for that file and rewrite the LUT. I use Broadcom's logic, but I trick it into allocating the maximum space for each app at the start, so that the allocation for each file never needs to be expanded. Thus the LUT never needs to be rewritten. If the power were to go out in the middle of an LUT rewrite, The LUT could become corrupted. In my simplified approach the LUT is never rewritten, so we never have to worry about that potential instability.

One final note: I have the same problem with OTA1 failing regardless of whether I have my simplified max file size logic added or not.

Anonymous
Not applicable

Hi.

Your idea like me. But you must guarantied that sectors are computed correctly.

Your factory restore working ok?

I was added crc32, which help me to find bugs.

0 Likes
Anonymous
Not applicable

In our code, it is a little more complicated than normal, because WICED assumes a 4KB sector size in the serial flash. We started using serial flash chips that have a 64KB sector size which is much faster for erasing and writing than the chips with the old 4KB sectors. It also makes our code more complicated since our code supports devices with both parts. I hard coded The initial CURRENT_SECTOR := 16#4KB sectors == 1 64KB sector.

All of our maximum file sizes are specified in multiples of 16(4KB sectors) to ensure that the files are stored on 64KB boundaries. The line that darius1 quotes is in the OTA2 section. We aren't using OTA2.

$(eval APP_LUT_SECTOR_COUNT = $(shell $(PERL) $(SECTOR_COUNT_SCRIPT) $(subst \,/,$(FINAL_APPS_FILE)) 0 4096) )

The crazy thing is that $(APP_LUT_SECTOR_COUNT) isn't even defined for OTA1. It doesn't actually matter, because the sflash write command only takes the name of the file to copy and the address in sflash from which to start writing. $(APP_LUT_SECTOR_COUNT) is only used to display the file sizes as you build the system.

Our system works in WICED 3.5.1 which precedes the addition of the CRC. When we upgraded to WICED 4.0.10, OTA1 stopped working. Current analysis suggests that it is writing to the DCT that is causing the failure. Unfortunately, Cypress made the mistake of using buffered reads from the internal flash when the DCT is in internal flash. This is going to be unacceptable to most of their customers who have devices with the DCT stored in internal flash. We are one of those customers. We are stuck in a situation where we can either go back to using WICED 3.5.1 until Cypress gets their act together or we can use WICED 4.0 and eliminate the buffered reads ourselves

0 Likes
Anonymous
Not applicable

Hi,

What the bug you found ?

0 Likes
Anonymous
Not applicable

Unfortunately, Cypress made the mistake of using buffered reads from the internal flash when the DCT is in internal flash. This is a catastrophically bad design decision. When using internal flash values can be loaded directly from flash into the MCU without storing them in RAM. The security section of the DCT is a little over 6KB in size. Cypress allocates blocks of RAM that are as big as the section of the DCT that you are trying to read. So in some cases they will allocate as much as 6KB, just to read the DCT. In some cases Cypress allocates these unnecessary buffers on the stack and in some cases on the heap. In a memory constrained applications allocating 6KB on either the stack or heap is likely to result in running out of memory which might crash the app. Most users that have systems with the DCT stored in internal flash would consider this unacceptable. Thus WICED 4.0 is inappropriate for such users. My recommendation to my boss is that we choose not to upgrade our production systems to WICED 4.0 until Cypress has fixed this catastrophically bad design decision. An alternative would be to fix the design flaw ourselves, but at the moment we have too much other work to take that route.

webmstreric wrote: (extracted)

Cypress made the mistake of using buffered reads from the internal flash when the DCT is in internal flash.

When using internal flash values can be loaded directly from flash into the MCU without storing them in RAM.

Cypress allocates blocks of RAM that are as big as the section of the DCT that you are trying to read.

In some cases Cypress allocates these unnecessary buffers on the stack and in some cases on the heap.

Hi, webmstreric

I don't quite understand the above 4 (extracted) statements. Considering 4.0.1 DCT read API: (As I remember, the above code is the same since SDK 3.3.1.)

wiced_result_t wiced_dct_read_lock( void** info_ptr, wiced_bool_t ptr_is_writable, dct_section_t section, uint32_t offset, uint32_t size )

{

#ifdef EXTERNAL_DCT

    UNUSED_PARAMETER( ptr_is_writable );

    *info_ptr = malloc_named( "DCT", size );

    return wiced_dct_read_with_copy( *info_ptr, section, offset, size);

#else /* ifdef EXTERNAL_DCT */

    if ( ptr_is_writable == WICED_TRUE )

    {

        *info_ptr = (void*)malloc_named( "DCT", size );

        if ( *info_ptr == NULL )

        {

            return WICED_ERROR;

        }

        wiced_dct_read_with_copy( *info_ptr, section, offset, size );

    }

    else

    {

        *info_ptr = (char*)wiced_dct_get_current_address( section ) + offset;

    }

    return WICED_SUCCESS;

#endif /* ifdef EXTERNAL_DCT */

}

1. When accessing internal DCT only for "read", this API simply setup the output pointer "info_ptr" without allocate any extra memory.

2. Even for internal DCT "access for write" cases, the the output pointer "info_ptr" is setup with "malloc" and should always use heap.

My questions are : Am I thinking too naively that there are more details behind the scene here? Or do I misunderstands your statements?

Anonymous
Not applicable

wiced_dct_read_lock() is a bad function because it uses malloc except when dealing with reading from internal flash. I don't use that function anymore because of the hidden malloc. I consider that function both dangerous because of the malloc and unnecessary because the one case where it does not use a malloc it boils down to a single line of code:

*info_ptr = (char*)wiced_dct_get_current_address( section ) + offset;

I just use wiced_dct_get_current_address() and if I need a small write buffer I allocate one on the stack as a local variable. The disadvantage of my approach is that as soon as any change is made to the DCT, the pointer returned by a prior call to wiced_dct_get_current_address() will no longer be valid, because the entire DCT will be in a different location. I feel this is less dangerous than running the risk of a memory leak from using malloc in the read_(lock/unlock) functions.

The DCT is designed to occupy one of 2 same sized blocks of flash memory. Any time a change is made to any value in the DCT the change is copied into the currently unused block and all the unchanged portions of the DCT are also copied over. Then that block becomes the DCT and the old block is invalidated. In WICED 4.0 the code for copying the unchanged portions of the DCT reads entire sections of the DCT into a large block of RAM and then copies that RAM into the new DCT. In older versions of WICED it didn't do that If you were using a DCT in internal flash. It would use a pointer to the current DCT and read as directly as possible into the commands that wrote the new DCT. It didn't need to create a 6KB block of RAM to get the job done.

Most Cypress customers are going to consider unnecessarily allocating large blocks of memory unacceptable. Some will not mind. I think Cypress will fix this issue quickly, because they want customers to use their hottest new product.

webmstreric wrote:

In WICED 4.0 the code for copying the unchanged portions of the DCT reads entire sections of the DCT into a large block of RAM and then copies that RAM into the new DCT. In older versions of WICED it didn't do that If you were using a DCT in internal flash. It would use a pointer to the current DCT and read as directly as possible into the commands that wrote the new DCT. It didn't need to create a 6KB block of RAM to get the job done.

Hi webmstreric

I agree with your points "directly using current DCT pointer w/ offset and avoid malloc" and "allocate the smallest necessary local variable (and use <wiced_dct_write>, I guess)" are better in many cases.

But I have questions again about your statements above. I assume we're talking about <wiced_dct_write>.

If I understand the code correctly the most significant place where *large* block of RAM is to be allocated in heap is inside the (almost inevitable) API <wiced_dct_get_current_address>, where <wiced_dct_internal_dct_update> is called.

     allocated_src_buff = (uint8_t*)calloc(LARGEST_DCT_SUB_STRUCTURE_SIZE, 1);

     allocated_dst_buff = (uint8_t*)calloc(LARGEST_DCT_SUB_STRUCTURE_SIZE, 1);

With default definition of CERTIFICATE_SIZE, up to 8 kB RAM is acquired from heap.

I don't think this is going to generate big problems, if developers simply (trickily) calls <wiced_dct_get_current_address> once at the entry of <application_start>. In the worse case where "if (dctx_sdk_version != DCT_BOOTLOADER_SDK_CURRENT)" is true, <wiced_dct_get_current_address> should still run successfully at this moment since there should be enough RAM for those <calloc>. And <wiced_dct_update_version_to_current> should do its job on latest DCT, while the other DCT is erased. From now on the worst case should not come up any more in our apps.

Any comments?

Anonymous
Not applicable

There are 2 copies of the DCT and every time you write anything to the DCT the "current" copy of the DCT switches from one copy to the other. Calling <wiced_dct_get_current_address> once would be inadequate unless you never write anything to the DCT. Unfortunately the OTA process writes to the DCT to communicate with the bootloader, so you can't avoid writing to the DCT if you use OTA. Making the decision to simply not write anything to the DCT also causes software maintainability issues. You can't guarantee that some other developer coming after you will follow the same rule.

I think Cypress is having to deal with poor design decisions that were made by Broadcom when Broadcom tried to create a universal API for accessing the DCT that was the same for devices that store the DCT in internal flash and devices that store the DCT in external flash. To read from internal flash all you need is a pointer to the start of the block you want to read from. When reading from external flash you need to copy the values into a block of RAM. Consequently, Broadcom developed the wiced_dct_read_[lock/unlock]() functions which use malloc(). These functions are fairly easy to avoid using. I don't believe that Broadcom's lower level code allocated anything on the heap when the DCT was in internal flash. I am not all that familiar with the code dealing with a DCT in external flash. In WICED 4.0 Cypress has continued with the idea of using a Universal API and they are allocating large blocks of memory on either the stack or heap to read data regardless of whether the DCT is in internal or external flash.

Generally in embedded systems if you need a large block of data you pass a pointer to it into the function that uses it or access a globally defined block. If you only need a small amount of data, you can allocate that on the stack. You try to avoid using the heap. Cypress could make the decision to change the API, so that it is no longer universal and when you use a DCT in external flash you pass a pointer to a large block of RAM into the functions that access the DCT. When you use a DCT in internal flash you retrieve a pointer to the DCT block that you want. Cypress could also decide to continue using a universal API and only use a small block of memory that can easily be allocated on the stack for copying unchanged data from the current copy of the DCT to the new copy of the DCT. Cypress is likely to figure out something fairly quickly on this issue because they don't want their customers to be afraid that writing to the DCT or using OTA will compromise software stability.