Is there any *hardware* way to make PSoC5 code hang or crash?


HuEl_264296

Imagine I had some very simple code, like:

 

int main(void)
{
    CyGlobalIntEnable;

    for(;;)
    {
        LED_Write(1);
        LED_Write(0);
    }    
}

 

Is there anything external I could do to the PSoC to make that code hang? Could I make the chip fail in some way where it stopped executing code, and just hung?

What if I heated the chip, or injected glitches into any of the power lines? What if I briefly disconnected one or more of the GND pins?

If I wanted to deliberately cause the CPU to jump to CyHalt or CY_ISR(IntDefaultHandler), what external thing could I do to the chip?

(I know that I can crash the code by calling a null function pointer for example. I am not looking for software ways to make the chip crash. I am looking for hardware actions that make the chip fail).

 

 

 

1 Solution
Len_CONSULTRON

HuEl,

By the nature of the ARM core design, the multitude of manufacturer tests, and "in-the-field" testing with well over 1 billion units sold, there is no way to cause the core to "jump track" via HW as long as the HW requirements are met.

However, if you violate the HW requirements, it may be possible.

I'll give you an example:

Recently I worked on a project where the end product was a handheld device.  The CPU was driven by a crystal.

As part of the pre-release testing, the customer required us to drop the device 1000 times from 3 meters onto a concrete slab. Most of the units under test survived (with a lot of nicks and scratches). However, one unit stopped working. We found that on this unit the EEPROM data used for configuration had become corrupted.

In short, the root cause was found after many weeks of focused testing. Two things had to occur at the same time: the drop impact, and the CPU being active and trying to store new EEPROM data at that moment.

At the moment of impact, the crystal, being a piezoelectric device, received an additional external mechanical stimulus that disturbed its output frequency. The frequency shifted: it became temporarily lower (not a problem) and temporarily higher (a big problem).

When the upward shift pushed the clock above the maximum allowable CPU frequency, a HW requirement was violated: the CPU was over-clocked. This left insufficient time to read CPU resources, so instruction code or SRAM data was read incorrectly.
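
To put rough numbers on the over-clocking condition, here is a minimal C sketch. The 80 MHz limit, the function names, and any ppm figures you feed it are assumptions for illustration only. None of them come from the project described above, and the real limit should be taken from the datasheet of the exact part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit for illustration; use the datasheet value for the real part. */
#define CPU_MAX_HZ 80000000u

/* Frequency after a fractional shift expressed in parts per million (ppm). */
uint32_t shifted_hz(uint32_t nominal_hz, int32_t shift_ppm)
{
    /* A shift of -1e6 ppm or less would mean a zero or negative frequency. */
    assert(shift_ppm > -1000000);
    int64_t delta = ((int64_t)nominal_hz * (int64_t)shift_ppm) / 1000000;
    return (uint32_t)((int64_t)nominal_hz + delta);
}

/* True when the shifted clock exceeds the assumed maximum CPU frequency. */
bool violates_max_clock(uint32_t nominal_hz, int32_t shift_ppm)
{
    return shifted_hz(nominal_hz, shift_ppm) > CPU_MAX_HZ;
}
```

With these assumed values, a CPU clocked right at the limit has no margin: any upward shift violates the requirement, while a downward shift of the same size does not. A design clocked well below the limit tolerates the same shift in either direction.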

During our extensive root-cause isolation testing, we were finally able to record multiple occurrences of CPU over-clocking. Over 90% of the occurrences resulted only in a bad instruction read that eventually led to a watchdog reset (we implemented a diagnostic counter). In essence, the "train went off the track". In these cases, no EEPROM data was corrupted, but the current operational cycle did not complete properly and the CPU reset.
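
A diagnostic counter of this kind can be quite small. The sketch below is only a guess at the general shape, not the code used in that project: the flag value, the struct, and the function name are all hypothetical, and on a real target the reset cause would come from the device's reset-status register, with the counters kept in memory that survives a reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag meaning "the last reset was caused by the watchdog". */
#define RESET_CAUSE_WATCHDOG 0x08u

/* Hypothetical diagnostic record; on a target it would be placed in
   storage that is not cleared by a reset. */
typedef struct {
    uint32_t watchdog_resets;
    uint32_t other_resets;
} reset_diag_t;

/* Call once early in startup with the reset cause read from hardware. */
void reset_diag_update(reset_diag_t *diag, uint8_t reset_cause)
{
    assert(diag != NULL);
    if ((reset_cause & RESET_CAUSE_WATCHDOG) != 0u) {
        diag->watchdog_resets++;
    } else {
        diag->other_resets++;
    }
}
```

Comparing the watchdog count against the number of test drops is what lets you say something like "over 90% of the events ended in a watchdog reset".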

Once we added special test code to make EEPROM writes more frequent, EEPROM corruption occurred often enough for us to detect that the read of the pointer to the scratchpad SRAM holding the new EEPROM contents was being corrupted by the over-clocking. Therefore the wrong data was written to the EEPROM, which confirmed the corruption we had seen.

Our solution: Simple. We changed the code so that the ARM CPU was clocked by the less precise internal, non-piezo clock source. The crystal was still used for the precision operation that was needed on a peripheral. However, in our application, the CPU itself did not need such precision.

Other ways to cause a HW disturbance

You can violate the VDD voltage requirements. VDD transients that are too high might cause a CPU 'jump' but will more likely damage the IC. VDD transients that are too low might cause a CPU 'jump' (i.e. not enough digital voltage to read a proper instruction from FLASH). However, you might instead trip the POR mechanism and reset the CPU.

One quick way to create low VDD transients is to perform the "metal file and metal nail" test. (Yes, I said a metal file and a metal nail. You know, the hardware you can buy at the hardware store.)

The setup for this test is to wire the VDD power from the power source to the metal file using solder or a good alligator clip.  The VDD to the IC is soldered or clipped to a metal nail.  Note: GND is still connected to GND of the power supply.

To perform the test, press the nail to the file. This completes the VDD power circuit with relatively low resistance, so the CPU should power up and start running. Once it is running, drag the tip of the nail across the file surface. This causes very short periods of higher series resistance in the VDD circuit, which drops the VDD voltage. After many trials you will get mostly resets, but some events might be low-voltage-induced improper FLASH or SRAM reads (or writes). These might cause a CPU lockup. (Hence one of the main reasons for watchdog protection.)
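
The voltage drop in the nail-and-file test is just Ohm's law across the contact. The sketch below is a simplified static model with a hypothetical function name and made-up example values. It ignores decoupling capacitors and the fact that load current changes with voltage, so treat it as a feel for the magnitudes and not a prediction.

```c
#include <assert.h>
#include <stdint.h>

/* V_ic = V_supply - I_load * R_contact.
   Units: millivolts, milliamps, ohms (mA * ohm = mV). Clamped at 0 mV. */
int32_t vdd_at_ic_mv(int32_t supply_mv, int32_t load_ma, int32_t contact_ohm)
{
    assert(load_ma >= 0 && contact_ohm >= 0);
    int32_t v = supply_mv - load_ma * contact_ohm;
    return (v < 0) ? 0 : v;
}
```

For example, with an assumed 3300 mV supply and a 20 mA load, a momentary 50 ohm contact resistance would pull the IC's VDD down to 2300 mV in this model.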

In past projects, I substituted SW- and HW-controlled VDD drops for this test. I used a PSoC5 to source VDD to the DUT and cut the voltage. While the voltage was dropping, I monitored it in HW with the PSoC5, using a comparator set to a 'restart' threshold. Once the threshold was triggered, I immediately re-engaged VDD to the DUT to prevent a POR.
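
The sequencing of such a controlled VDD drop can be modeled as a small state machine. This is only a software sketch of the idea: in the setup described above the threshold comparison was done in hardware by a comparator, and every name, type, and voltage here is a hypothetical chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { GLITCH_IDLE, GLITCH_DROPPING, GLITCH_DONE } glitch_state_t;

typedef struct {
    glitch_state_t state;
    uint16_t restart_mv;  /* 'restart' threshold, assumed above the DUT's POR trip point */
    bool supply_on;       /* desired state of the switch feeding VDD to the DUT */
} glitch_ctrl_t;

/* Start with the DUT powered and no glitch in progress. */
void glitch_init(glitch_ctrl_t *c, uint16_t restart_mv)
{
    assert(c != NULL);
    c->state = GLITCH_IDLE;
    c->restart_mv = restart_mv;
    c->supply_on = true;
}

/* Cut VDD to the DUT and begin watching the falling voltage. */
void glitch_trigger(glitch_ctrl_t *c)
{
    assert(c != NULL);
    if (c->state == GLITCH_IDLE) {
        c->supply_on = false;
        c->state = GLITCH_DROPPING;
    }
}

/* Feed one VDD sample (mV); returns whether the supply switch should be on. */
bool glitch_step(glitch_ctrl_t *c, uint16_t vdd_mv)
{
    assert(c != NULL);
    if (c->state == GLITCH_DROPPING && vdd_mv <= c->restart_mv) {
        c->supply_on = true;  /* re-engage VDD before the DUT reaches POR */
        c->state = GLITCH_DONE;
    }
    return c->supply_on;
}
```

Calling glitch_init(&c, 2000), then glitch_trigger(&c), then feeding falling samples to glitch_step() keeps the supply off until a sample at or below 2000 mV arrives, at which point the supply is turned back on and stays on.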

This VDD transient control allowed me to find various issues with the SW, such as places where EEPROM was written incorrectly (the EEPROM write voltage did not meet requirements). If I remember correctly, I might also have detected a few watchdog events, probably due to the FLASH read voltage being too low for a correct read.

Sorry for being long-winded.  I hope this might be helpful for anyone reading this.

Route VDD from the power supply through the metal file to the VDD of the IC.

Len
"Engineering is an Art. The Art of Compromise."

3 Replies
Vasanth
Moderator

Hi,

Kindly check the following appnote. It shows all the steps that occur before control reaches main.c. After reset, hardware startup must complete before software startup can begin. As you are interested in hardware startup, you can concentrate on the first part. Once reset is released, the hardware-controlled portion of startup begins. Hardware startup can be split into two phases: reset and boot. In both phases, the CPU is halted. In the reset phase, the device is inactive, waiting for on-chip resources to stabilize enough to enter the boot phase. In the boot phase, a dedicated hardware state machine controls basic configuration and trim of the device using direct memory access (DMA). Executing the boot phase takes a fixed number of clock cycles. If there is a problem with the clock or other system resources, you may see problems during startup.

First, check whether control reaches main() during startup. If it does not, use the following appnote to check whether all hardware guidelines are met. This could help you eliminate possible causes of this behavior.

Best Regards,
Vasanth


Hi Vasanth,

Thanks for the reply. I am not interested in the startup process. I am interested in causing running code to crash.

Imagine that I have written the above code, and it's running on a PSoC right now. I can see the LED flashing (very fast).

Now that it's working, is there anything external I can do to the chip to cause that code to crash?

 

 
