STM32 and memory integrity

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,083
I'm assuming there's nothing to prevent errant code from overwriting other code or data memory, even that assigned to stuff like HAL and startup code?

Is this true? I ask because I've had some very odd problems today. They've gone now, but I'm not sure whether they were due to some corruption caused while the test code was executing.

A recurring problem is what happens when I reset the Nucleo because I suspect corruption is present. The reset simply restarts the already installed code, which then corrupts memory again. So by the time I've started the app under the debugger to see "what's happening", the corruption has already done its damage.

The only way I've found to avoid this is to put, say, a 30 second delay at the start of the app. I can then repower the board and promptly attach the debugger, so the code restarts but never gets as far as the real work, and I am in control of the system before any corruption can take place. Of course I don't need to wait out the 30 seconds, I can just do a "Set Next Statement" in Visual Studio, but you get the idea.
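
For illustration, a minimal sketch of that startup hold in C. The names `g_hold_at_startup`, `hold_expired` and `hold_for_debugger` are made up for this example and are not part of HAL. The tick source is passed in as a function pointer so the logic does not depend on any particular board.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: clear it from the debugger's watch window to end the wait early. */
static volatile bool g_hold_at_startup = true;

/* True once timeout_ms has elapsed since start_ms. Unsigned subtraction keeps
   the result correct when the 32-bit millisecond tick wraps around. */
bool hold_expired(uint32_t start_ms, uint32_t now_ms, uint32_t timeout_ms)
{
    return (uint32_t)(now_ms - start_ms) >= timeout_ms;
}

/* Spin until the debugger clears the flag or the timeout expires.
   get_tick_ms is any function returning a free-running millisecond counter. */
void hold_for_debugger(uint32_t (*get_tick_ms)(void), uint32_t timeout_ms)
{
    const uint32_t start_ms = get_tick_ms();

    while (g_hold_at_startup && !hold_expired(start_ms, get_tick_ms(), timeout_ms)) {
        /* nothing to do: the application proper has not started yet */
    }
}
```

On a HAL project this would presumably be called once, right after HAL_Init() so that the tick counter is running, along the lines of hold_for_debugger(HAL_GetTick, 30000u). Clearing g_hold_at_startup from a watch window ends the wait early, which amounts to much the same thing as the "Set Next Statement" trick.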

It's rather hard to say whether the odd issues I saw were due to:

  1. A bug in the code I was testing.
  2. Corruption to the code or memory (yes, I know, that's a bug too!)
  3. The nRF24 device itself, getting into a bad state.

There's no way to reset the nRF24 other than cycling its power, and adding that ability to the control software seems like a lot to ask. Why they didn't expose a hard reset pin is a mystery to me.

How is this handled in professional projects?
 
ApacheKid

OK I've had more mysterious failures. I can see though that there is a HAL_BUSY status being reported (but, inexcusably, ignored). Even resetting the board did not clear this, but again, as I said earlier, repowering the board simply reloads the potentially buggy code which could be corrupting memory used by HAL...

So, this leads to my next question: does HAL include any kind of memory validator, i.e. utility functions that scan memory and structures to check their integrity? These kinds of tools often exist in OS development, where literally anything can go wrong in an immature codebase. C makes it rather easy to corrupt memory too; even carefully written code can contain very subtle pointer/address logic errors.
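
As an illustration of the kind of check meant here, a minimal sketch that brackets a buffer with guard words and tests them periodically. This is not a HAL facility: `GUARD_WORD`, `guarded_buf_t`, `guarded_init` and `guarded_ok` are hypothetical names, and the 32 byte payload size is arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu  /* arbitrary pattern, chosen for illustration */

/* Hypothetical wrapper: a payload buffer bracketed by two guard words. */
typedef struct {
    uint32_t head_guard;
    uint8_t  payload[32];
    uint32_t tail_guard;
} guarded_buf_t;

/* Arm both guards and clear the payload. */
void guarded_init(guarded_buf_t *b)
{
    b->head_guard = GUARD_WORD;
    memset(b->payload, 0, sizeof b->payload);
    b->tail_guard = GUARD_WORD;
}

/* Returns true while both guards still hold the expected pattern. */
bool guarded_ok(const guarded_buf_t *b)
{
    return b->head_guard == GUARD_WORD && b->tail_guard == GUARD_WORD;
}
```

Calling guarded_ok() from the main loop, and halting when it fails, at least narrows down when an overrun happened. It only catches writes that land on a guard word, so it is a tripwire rather than a real validator.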
 
ApacheKid

Well the problem is bothering me. It seems, somehow, that sometimes (not all the time), after a bunch of SPI operations have run successfully many thousands of times, the HAL begins to object and returns HAL_LOCKED, and at that point the app freezes so I can debug it. If I manually step through the HAL code, step past the check for HAL_LOCKED and then just continue, it resumes running fine!

I doubt I'm doing anything that pushes HAL; this will almost certainly be my code (or the nRF24?), but it's very puzzling.

Perhaps electrical noise? Might that occasionally impact an SPI IO?
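
A minimal sketch of checking every status instead of ignoring it, assuming the usual HAL convention that blocking calls such as HAL_SPI_Transmit return a HAL_StatusTypeDef. In the HAL sources HAL_LOCKED appears to be the value of the handle's Lock field rather than a return code, and a call that finds the handle already locked returns HAL_BUSY. The enum below mirrors HAL_StatusTypeDef so the sketch builds off-target; `spi_checked`, `spi_stats_t` and `fake_busy_then_ok` are hypothetical names, and the fake transfer exists only so the wrapper can be exercised on a PC.

```c
#include <stdint.h>

/* Mirrors the four values of HAL_StatusTypeDef so this builds without HAL. */
typedef enum { ST_OK = 0, ST_ERROR = 1, ST_BUSY = 2, ST_TIMEOUT = 3 } status_t;

/* One SPI operation, wrapped so the checker does not care what it does. */
typedef status_t (*spi_op_fn)(void *ctx);

typedef struct {
    uint32_t busy_count;  /* how many times the driver reported busy */
    uint32_t fail_count;  /* operations that never succeeded */
} spi_stats_t;

/* Run op up to max_tries times. Busy is retried; error and timeout are not.
   Every non-OK outcome is counted, so nothing is silently dropped. */
status_t spi_checked(spi_op_fn op, void *ctx, unsigned max_tries, spi_stats_t *stats)
{
    status_t st = ST_ERROR;

    for (unsigned i = 0; i < max_tries; ++i) {
        st = op(ctx);
        if (st == ST_OK) {
            return ST_OK;
        }
        if (st != ST_BUSY) {
            break;              /* hard failure: retrying will not help */
        }
        stats->busy_count++;
    }
    stats->fail_count++;
    return st;
}

/* Host-side stand-in for a real transfer: reports busy until *ctx reaches zero. */
status_t fake_busy_then_ok(void *ctx)
{
    unsigned *remaining = ctx;

    if (*remaining > 0u) {
        (*remaining)--;
        return ST_BUSY;
    }
    return ST_OK;
}
```

On the target, op would wrap the real transfer. A busy status that persists across all the retries suggests the handle is stuck in its locked state, and the counters record that instead of letting it pass unnoticed.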
 
ApacheKid

I'm increasingly suspicious of this:

Code:
    // SPI1 on GPIOA: PA5 = SCK, PA6 = MISO, PA7 = MOSI
    GPIO_InitStruct_spi.Pin       = GPIO_PIN_5 | GPIO_PIN_6 | GPIO_PIN_7;
    GPIO_InitStruct_spi.Mode      = GPIO_MODE_AF_PP;   // alternate function, push-pull
    GPIO_InitStruct_spi.Pull      = GPIO_PULLDOWN;
    GPIO_InitStruct_spi.Speed     = GPIO_SPEED_MEDIUM;
    GPIO_InitStruct_spi.Alternate = GPIO_AF5_SPI1;     // route the pins to SPI1
This is part of the SPI setup, BUT that pin, GPIO_PIN_5 on GPIOA, is also hard wired to the board's (only!) general purpose LED. I wonder whether having the LED on the same wire might be a source of problems. I need to use a different GPIO line...
 

ApacheKid

OK I just discovered something that might well explain this! I was just beginning to prepare a second Nucleo and nRF24 to run two identical setups in parallel to see if they both misbehave when I saw something.

My nRF24 device is plugged into one of these - a YL-105

[Image: YL-105 adapter board]

I had these to hand as they came with the nRF24 devices, and I unthinkingly just used one because it made it easier to connect to the pins and stuff.

Well I also read early on that the nRF24s need 3.3v and to be careful about that, so I ended up connecting the YL-105 power pins to the Nucleo's 3.3v VCC.

It turns out that, when I measured this just now, the nRF device is seeing just 2.4v, and that might well lead to instability and possibly mess up the Nucleo's SPI peripheral... Also, the SPI clock and MOSI pins would have been at around 3.3v, exceeding the VCC on the nRF itself. I'm no expert, but this is the kind of thing that can lead to all sorts of odd behaviors...

I'm going to correct this, see how it pans out...
 

ApacheKid

nRF24L01+ supply voltage range is 1.9V-3.6V.
OK yes, that's true, but if it's powered at 2.4v it might cause oddities when the SPI clock, SPI MOSI and SPI CS are at 3.3v. I'm assuming this is bad anyway. So far, with limited stress testing, it seems better...
 

ApacheKid

Sorry, I didn't finish editing.
VDD should be between 2.7V and 3.3V.
Yes, and it seems the max digital input voltage should be VDD, so I was definitely operating out of spec.

The best test is for me to rig up another setup, with an identical Nucleo and so on, then run one at the correct voltage and the other at the low voltage and see if that one fails.
 

ApacheKid

I'm now of the opinion that what's been happening with these occasional failures is in fact neither the voltage supplied to the nRF device nor a mystery bug in the code. It seems this might all have been caused by some odd connection weirdness with the USB cable that attaches the Nucleo to my desktop.

I noticed during the past couple of days that if I move the cable/connector slightly, Windows beeps as if I had just inserted a USB device. It sometimes does it several times in quick succession too.

This is absolutely a problem as there is no way the USB connection should be interrupted in any way unless I do so intentionally.

I suspect then that just moving a mug of coffee or something might have moved the wire and caused some nasty glitches: very brief interruptions to the power or the JTAG signals between the board and Visual Studio.

I just recabled this and can move the wire freely without this beeping, so it is now stable.

I'll run some lengthy soak tests now...
 

nsaspook

Joined Aug 27, 2009
10,678
Another demonstration of why USB sucks for critical connections.
 