sflash_write frequently fails on builds using download_apps


Anonymous

I found that apps/waf/sflash_write frequently fails when building with download_apps on WICED 5.0.0. Unfortunately, no error is displayed on the console when this happens, so you have to look at the build/openocd_log.txt file to see that the file was not completely written. Oddly, I noticed that if a system image file had previously been written successfully to the exact same location and hadn't changed, the image was still bootable after an error aborted the sflash write. I ended up erasing the sflash chip after each test, and then the system image was not bootable if the sflash write was aborted on the next test. This meant the sflash was not being corrupted; it just wasn't being completely written.

To make a failed sflash write visible on the console, I changed the apps/waf/sflash_write/sflash_write.tcl file and the tools/makefiles/wiced_apps.mk file. I would have liked the build process to abort with an error message when the TCL script failed. The script was already executing exit -1 commands when it aborted, which in my opinion should have caused the build to fail, but I couldn't figure out how to get the build to fail on my Windows 10/WICED 5.0 setup. Instead, I got the TCL script to display an error message on the console.

When wiced_apps.mk runs the TCL script, both stdout and stderr are redirected to the log file, so none of the output shows on the console. The TCL script calls the halt command a lot, and halt sends output to stderr every time. I couldn't figure out how to redirect that output to stdout, so I sent all output from the TCL script to stderr except the message for an error that aborts the program, which goes to stdout. This is obviously backwards. In wiced_apps.mk, I redirect only the stderr output from each call of the TCL script to the log file, so anything written to stdout shows up on the console.

In sflash_write.tcl:

I changed all puts "..." calls to puts stderr "..."

Then before each call to exit -1 I added a puts "Sflash write failed! Please try again!"

In wiced_apps.mk:

I changed each of the rules that calls the TCL script so that only stderr(2) was redirected into the log file.

With,

DOWNLOAD_LOG := >> $(OPENOCD_LOG_FILE)

I changed,

$(DOWNLOAD_LOG) 2>&1

into,

2$(DOWNLOAD_LOG)

Example:

FR_APP_DOWNLOAD:  $(FR_APP_DOWNLOAD_DEPENDENCY)...

     $(call CONV_SLASHES,$(OPENOCD_FULL_NAME)) ... -c "sflash_write_file $(FR_APP)..." -c shutdown $(DOWNLOAD_LOG) 2>&1

became,

FR_APP_DOWNLOAD:  $(FR_APP_DOWNLOAD_DEPENDENCY)...

     $(call CONV_SLASHES,$(OPENOCD_FULL_NAME)) ... -c "sflash_write_file $(FR_APP)..." -c shutdown 2$(DOWNLOAD_LOG)
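To illustrate the stream split described above, here is a minimal C sketch. It is an analogy, not the WICED or OpenOCD implementation: the function names and the log file name are hypothetical, and freopen() stands in for what the shell does when it sees 2>> on a command line. Progress chatter goes to stderr and ends up in the log, while the one failure message goes to stdout and stays on the console.

```c
#include <stdio.h>

/* Hypothetical helper: append everything written to stderr to log_path,
 * leaving stdout untouched. Roughly what "2>> log_path" asks the shell to do.
 * Returns 0 on success, -1 on failure. */
int demo_log_stderr_to(const char *log_path)
{
    return (freopen(log_path, "a", stderr) != NULL) ? 0 : -1;
}

/* Hypothetical reporter: routine progress goes to stderr (the log),
 * the abort message goes to stdout (the console). */
void demo_report(int failed)
{
    fprintf(stderr, "progress: block written\n");
    if (failed)
    {
        printf("Sflash write failed! Please try again!\n");
    }
    fflush(stderr);
    fflush(stdout);
}
```

After demo_log_stderr_to() succeeds, only the failure line remains visible to whoever is watching the console, which is the effect the makefile change is after.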

After carefully examining the communication between the TCL script and the sflash_write.c app that the script loads and runs in RAM, I found that a minor change could fix most of the problem. The app and the TCL script share access to the same block of RAM, which is used for passing information back and forth. In sflash_write.c this is called the data_transfer struct. The struct contains a 16KB data array plus metadata describing where in the sflash the data goes and how much of the array should be written. All I had to do to fix most of the problem was add the volatile keyword to the definition of each metadata member of the data_transfer struct.

In apps/waf/sflash_write/sflash_write.c:

I replaced,

typedef struct
{
    unsigned long size;
    unsigned long sflash_address;
    unsigned long command;
    mfg_spi_flash_result_t result;
    unsigned char          data[__JTAG_FLASH_WRITER_DATA_BUFFER_SIZE__];
} data_transfer_area_t;

with,

typedef struct
{
    volatile unsigned long size;
    volatile unsigned long sflash_address;
    volatile unsigned long command;
    volatile unsigned long result;
    unsigned char          data[__JTAG_FLASH_WRITER_DATA_BUFFER_SIZE__];
} data_transfer_area_t;
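To show why the qualifier matters, here is a minimal sketch of a polling loop over a struct shaped like the one above. Everything prefixed demo_ or DEMO_, the command codes, and the clear-to-acknowledge step are assumptions made for illustration; they are not taken from sflash_write.c. Because command is volatile, the compiler has to reload it from memory on every pass, so a value written from outside the program (for example by a debugger over JTAG) is actually seen.

```c
/* Hypothetical command codes, for illustration only. */
enum { DEMO_CMD_NONE = 0, DEMO_CMD_WRITE = 1 };

/* Small stand-in for __JTAG_FLASH_WRITER_DATA_BUFFER_SIZE__. */
#define DEMO_BUFFER_SIZE 16

typedef struct
{
    volatile unsigned long size;
    volatile unsigned long sflash_address;
    volatile unsigned long command;
    volatile unsigned long result;
    unsigned char          data[DEMO_BUFFER_SIZE];
} demo_transfer_area_t;

/* Shared block that an outside agent may modify at any time. */
demo_transfer_area_t demo_area;

/* Poll up to max_polls times for a pending command.
 * Returns the command that was seen, or DEMO_CMD_NONE if none arrived. */
unsigned long demo_wait_for_command(demo_transfer_area_t *area,
                                    unsigned long max_polls)
{
    unsigned long i;

    for (i = 0; i < max_polls; i++)
    {
        /* volatile forces a real memory load on every iteration. */
        unsigned long cmd = area->command;

        if (cmd != DEMO_CMD_NONE)
        {
            /* Assumed handshake: clear the field to acknowledge. */
            area->command = DEMO_CMD_NONE;
            return cmd;
        }
    }
    return DEMO_CMD_NONE;
}
```

Without volatile, an optimizing compiler is allowed to read command once, keep it in a register, and never notice a later change.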

The device I was originally working on had a custom Murata MCU and a Winbond sflash chip. Since Cypress or anyone else was unlikely to get one of those boards, I reproduced the problem on an Inventek ISMART Arduino Shield (ISM43340) with a Macronix sflash chip. The change above completely fixed the problem on the ISMART board, but when I went back to our custom board I still saw sflash errors on rare occasions, although a lot less frequently.

Does anyone have any ideas I might try on our original configuration that might get the problem to completely go away?

Anonymous

I fixed the problem with occasional 0xFF bytes showing up after the sflash was written. What I did was rewrite the apps/waf/sflash_write/sflash_write.tcl script. I discarded what I thought was unnecessary code and tried to optimize the process so that it would run faster. I was able to more than double the speed, and the random 0xFF bytes stopped appearing. I can't say exactly what caused the random 0xFF bytes because I made so many changes. My guess is that unnecessary halt commands in the TCL script interrupted the sflash write process while the script checked whether the C program had finished writing a 16KB block to sflash. I found that it is not necessary to halt the processor just to check a value in RAM, and the script runs a lot faster without the halt commands. When the TCL script starts the C program, a halt is required so that the script can set the program counter (the PC register), but I removed all the other halt commands.

Other changes that I made included the following:

1) In my opinion, the rudimentary file system that WICED uses to add extra sectors to a file when it is updated with a larger version is not suitable for production. It allows only 8 fragments per file, and once the last fragment is used the file can't grow any further and the update fails. What I do instead is decide the maximum size of each file at the start. Then if I ever try to exceed that size, it fails on my developer's desk the same way it would fail in production.

2) Our customer had code that already wrote to sector 0, where the file fragmentation Look Up Table (LUT) is supposed to be located on this device. Since I only use one fragment for each file, I don't actually need the LUT. I modified the code so that the file locations and sizes are stored in the DCT header and eliminated the LUT. Note: this changes the way the bootloader loads a file, so it can only be done on systems that have not been deployed to production, unless you are willing to replace the bootloader on all production devices.

3) I added the "volatile" keyword to apps/waf/sflash_write/sflash_write.c as discussed in the comment from 2 weeks ago above.

4) I rewrote the apps/waf/sflash_write/sflash_write.tcl to remove halts and optimize it. This included replacing the sflash_write_file command with a sflash_write_multiple_files command that allowed me to download multiple files to sflash using a single openocd command in the makefile.

5) I modified tools/makefiles/wiced_apps.mk to consolidate all the make targets for each app into a single target that executed a single openocd command to write all the files to sflash. I also removed writing of the LUT.

6) I modified tools/makefiles/wiced_elf.mk to make the DCT dependent on getting the file locations and sizes from wiced_apps.mk.
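The single-fragment scheme from items 1 and 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the demo_ names, the struct layout, and the rounding of each reservation up to a whole sector are hypothetical and are not the actual DCT header format. Each file gets one fixed slot, and an image that outgrows its slot is rejected up front instead of being fragmented.

```c
#include <stddef.h>

/* Hypothetical per-file record: one contiguous slot, no fragments. */
typedef struct
{
    unsigned long offset;    /* start of the slot in sflash             */
    unsigned long max_size;  /* bytes reserved for this file, for good  */
} demo_file_slot_t;

/* Place the slots back to back starting at 'start'. Each slot's max_size
 * is rounded up to a whole number of sectors (an assumption of this sketch).
 * Returns the first offset past the last slot. */
unsigned long demo_layout(demo_file_slot_t *slots, size_t count,
                          unsigned long start, unsigned long sector_size)
{
    unsigned long next = start;
    size_t i;

    for (i = 0; i < count; i++)
    {
        unsigned long sectors = (slots[i].max_size + sector_size - 1) / sector_size;

        slots[i].offset   = next;
        slots[i].max_size = sectors * sector_size;
        next += slots[i].max_size;
    }
    return next;
}

/* Returns 1 if an image of image_size bytes fits its slot, 0 otherwise.
 * A 0 here is the build-time failure that replaces a field failure. */
int demo_file_fits(const demo_file_slot_t *slot, unsigned long image_size)
{
    return image_size <= slot->max_size;
}
```

With a 4KB sector and the first sector left alone, a file that asks for 5000 bytes would be given two sectors starting at offset 4096, and the next file would start at 12288.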

After doing all of that, I discovered that I was no longer seeing the random 0xFF bytes show up after an sflash_write. I had added a workaround to sflash_write.c that would rewrite a sector if it failed to pass validation. I was able to remove that modification because I didn't need it anymore.


DaBa_2244756

Hi,

download_apps was unstable for me too, but I am using SDK 3.7.0 on a SAM4S.

It seems to me that the cause of the download_apps instability was the repeated SPI interface init/deinit in sflash_write.c.

Try doing the init only once and disabling the deinit.

Maybe it will help.

BR

Darius

Anonymous

In WICED 5.0.0, apps/waf/sflash_write/sflash_write.c has a main while ( 1 == 1 ) loop that processes each command sent by the TCL script to write a 16KB block to sflash. Near the beginning of the loop, after getting the command, init_sflash() is executed, and at the end of the loop deinit_sflash() is executed. For each command sent, the sflash gets initialized and deinitialized only once. That looks OK to me, but moving init_sflash() in front of the while ( 1 == 1 ) and getting rid of deinit_sflash() would work just as well.
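The two loop shapes being compared can be sketched like this. The demo_ functions are hypothetical stand-ins that only count calls; they are not the real init_sflash() and deinit_sflash(), and a finite command array replaces the endless while ( 1 == 1 ) so that the sketch terminates.

```c
#include <stddef.h>

/* Call counters so the difference between the two shapes is observable. */
int demo_init_calls = 0;
int demo_deinit_calls = 0;

void demo_init_sflash(void)   { demo_init_calls++; }
void demo_deinit_sflash(void) { demo_deinit_calls++; }

/* Placeholder for the work done per command (writing one block). */
void demo_handle_command(int cmd) { (void)cmd; }

/* Shape described for WICED 5.0.0: init and deinit once per command. */
void demo_loop_per_command(const int *cmds, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
    {
        demo_init_sflash();
        demo_handle_command(cmds[i]);
        demo_deinit_sflash();
    }
}

/* Suggested alternative: init once before the loop, never deinit. */
void demo_loop_init_once(const int *cmds, size_t count)
{
    size_t i;

    demo_init_sflash();
    for (i = 0; i < count; i++)
    {
        demo_handle_command(cmds[i]);
    }
}
```

For three commands the first shape performs three init/deinit pairs, while the second performs a single init and no deinit.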

Using the "volatile" keyword that I mention above fixed the problem on the Inventek board, so that is the most critical problem and the one that would affect most users. When you have two programs running on the system and communicating through shared memory like this, "volatile" tells the compiler that the actual memory location needs to be accessed every time you retrieve that value, instead of reusing a copy held in a register on the MCU. If the MCU just looks at that copy, it may not detect a change made by the other program. "Volatile" should have been used on this shared memory from the start. It was a big oversight on the part of Broadcom/Cypress that they didn't put that in.

On our custom board we are still having problems with the sflash write verification process reporting that the sflash was not written properly. Every several hundred times the sflash_write() function in the driver gets called to write a full 4KB sector, one of the bytes somewhere in the middle does not get written. It retains the 0xFF value it had after the sector was erased. The bytes before and after this byte are all written correctly. When you have a system image that is several hundred sectors long, every few times you try to write the image, one of the sectors will have this problem. So far the only solution I have come up with is to erase and rewrite the sector containing the 0xFF byte. This isn't a fix. This is a workaround, but as I said before this problem does not happen on every board and appears to be device specific.
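The erase-and-rewrite workaround can be sketched as below. This is a simulation under stated assumptions, not the WICED driver: the flash is a RAM array, the demo_ names are hypothetical, and demo_faults_left artificially leaves one byte at 0xFF to mimic the symptom described above.

```c
#include <string.h>

#define DEMO_SECTOR_SIZE  4096
#define DEMO_MAX_ATTEMPTS 3

/* Simulated one-sector flash and a counter of writes that will misbehave. */
unsigned char demo_flash[DEMO_SECTOR_SIZE];
int demo_faults_left = 0;

void demo_erase_sector(void)
{
    memset(demo_flash, 0xFF, DEMO_SECTOR_SIZE);
}

void demo_write_sector(const unsigned char *src)
{
    memcpy(demo_flash, src, DEMO_SECTOR_SIZE);
    if (demo_faults_left > 0)
    {
        /* Mimic the symptom: one byte in the middle keeps its erased value. */
        demo_flash[DEMO_SECTOR_SIZE / 2] = 0xFF;
        demo_faults_left--;
    }
}

/* Erase, write, read back and compare; repeat on mismatch.
 * Returns the number of attempts used, or -1 if every attempt failed. */
int demo_write_verified(const unsigned char *src)
{
    int attempt;

    for (attempt = 1; attempt <= DEMO_MAX_ATTEMPTS; attempt++)
    {
        demo_erase_sector();
        demo_write_sector(src);
        if (memcmp(demo_flash, src, DEMO_SECTOR_SIZE) == 0)
        {
            return attempt;
        }
    }
    return -1;
}
```

Capping the retries keeps a genuinely bad sector from looping forever, at the cost of still needing an error path when all attempts fail.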

On our development board we have a Winbond W25Q32FV chip which was recently obsoleted by the manufacturer. We are never going to put that particular revision of the board into production, so for us this problem will hopefully go away on the next revision of the board which clearly is not going to have that obsolete chip on it.


Yes, you are right about volatile. In release mode (optimization on) it is required, because the compiler does not know that JTAG changes the memory.

Try your make string (download_apps) in -debug mode. If it then works reliably, you need to check all the code and insert volatile where needed.

I used debug mode, so I did not have the "volatile" problem.

BR

Darius

sqlsql_2244756 wrote:

Hi,

download_apps was unstable for me too, but I am using SDK 3.7.0 on a SAM4S.

It seems to me that the cause of the download_apps instability was the repeated SPI interface init/deinit in sflash_write.c.

Try doing the init only once and disabling the deinit.

Maybe it will help.

BR

Darius

I also experienced the download_apps instability on both SDK 3.7.0 and SDK 5.1.0.

I don't see any Cypress FAE involved in this discussion.

I'm wondering whether Cypress wants to improve this or not.
