Annoying interrupt

Thread Starter

Futurist

Joined Apr 8, 2025
721
In playing around with NRF24 devices connected to Nucleos, I came across a bug of some kind yesterday.

The symptoms seem to be:

  1. The MCU gets and handles an interrupt corresponding to a received payload.
  2. The MCU clears the interrupt flags on the NRF's STATUS register.
  3. The Handler exits and the "main" code resumes and begins to read the payload.
  4. During that operation another interrupt arrives.
  5. The MCU app fails, because the main code is inside HAL reading the payload when the handler re-enters HAL to clear the flags.

This only happens sometimes; it will run fine for tens of minutes with messages flowing normally.

I had the NRF's number of auto-ack retries set to 5 but reduced that to 1 and the issue seems to have gone (or perhaps is now rarer).

After pondering I wondered if this is a scenario:

The TX sends a message, the RX gets it and automatically sends an Ack.

The TX does not get that Ack for some reason and so resends the message about 250 µs later.

The resent message arrives and the RX interrupt handler runs, while the main code is still in the process of reading data pertaining to the first copy of the message.

The odd thing is that if the TX really were missing Acks, it would report that (it's coded to do so), but it isn't...
 

crugorocks

Joined May 1, 2025
31
The NRF24L01+ uses SPI for all communication. Both your interrupt handler and main code probably use the same SPI handle through STM32's HAL layer. HAL is not reentrant unless explicitly protected. So if HAL_SPI_Transmit() or HAL_SPI_TransmitReceive() is in progress when the interrupt handler fires and also tries to use SPI, a collision may take place.
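
One common way to avoid that kind of collision is to do no SPI work in the handler at all: the handler only sets a flag, and the main loop performs every radio transaction. The following is a minimal host-runnable sketch of that pattern, not the thread starter's code. The names nrf_irq_isr, nrf_poll and spi_transactions are hypothetical, and the counter merely stands in for the real "read STATUS, read payload, clear flags" sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set in interrupt context, cleared by the main loop.
   volatile because it is shared between the two contexts. */
static volatile bool nrf_irq_pending = false;

/* Hypothetical stand-in for the SPI work (read STATUS, read payload,
   clear flags). On target this would be the HAL SPI calls. */
static uint32_t spi_transactions = 0;

/* Body for the EXTI callback (for example HAL_GPIO_EXTI_Callback).
   It touches no SPI, so it can never collide with the main code. */
void nrf_irq_isr(void)
{
    nrf_irq_pending = true;
}

/* Called repeatedly from the main loop.
   Returns true if a pending radio event was serviced. */
bool nrf_poll(void)
{
    if (!nrf_irq_pending) {
        return false;
    }
    /* Clear the flag before servicing, so an interrupt that arrives
       during the SPI work sets it again and is not lost. */
    nrf_irq_pending = false;
    spi_transactions++;
    return true;
}
```

Because a single flag cannot count events, the real servicing code would need to keep reading until the radio reports its RX FIFO empty, in case two payloads arrived before the main loop got around to polling.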
 

Thread Starter

Futurist

Joined Apr 8, 2025
721
Yes, that's exactly what I'm seeing. However, I don't understand how the interrupt can be triggered again such a short time after it was last triggered.

My test TX has set the auto-ack retry interval to 750 µs and the delay between successive transmitted messages is 50 ms.

So if the TX did not receive an Ack, it would be 750 µs before the next retry, yet it seems the RX gets an interrupt, begins to read the payload and then in the middle of that gets another interrupt.
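
For reference, the retry interval and retry count discussed here live in one register on the nRF24L01+: SETUP_RETR (address 0x04), with the delay (ARD) in the upper nibble in 250 µs steps and the retry count (ARC) in the lower nibble. The sketch below shows how such a value could be built. The function name and the clamping behavior are assumptions for illustration, not taken from the code in this thread.

```c
#include <stdint.h>

#define NRF_REG_SETUP_RETR 0x04u   /* register address per the datasheet */

/* Build a SETUP_RETR value (hypothetical helper).
   delay_us is clamped to 250..4000 and rounded down to a 250 us step;
   retries is clamped to 15. */
uint8_t nrf_setup_retr_value(uint32_t delay_us, uint8_t retries)
{
    if (delay_us < 250u) {
        delay_us = 250u;
    }
    if (delay_us > 4000u) {
        delay_us = 4000u;
    }
    if (retries > 15u) {
        retries = 15u;
    }
    /* ARD encoding: 0 means 250 us, 1 means 500 us, and so on. */
    uint8_t ard = (uint8_t)(delay_us / 250u - 1u);
    return (uint8_t)((ard << 4) | retries);
}
```

With the settings mentioned above, nrf_setup_retr_value(750, 5) gives 0x25, and reducing the retries to 1 gives 0x21.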

The "app" code should really never be inside HAL at the point when an interrupt gets triggered.

The TX used to stop (blink red LED forever) whenever it found HAL busy, as I regard that as a design fault. For now I've made the code that does HAL SPI loop while HAL is busy, and I have a breakpoint set for when this happens, hoping to learn more...
 

Thread Starter

Futurist

Joined Apr 8, 2025
721
Perhaps I need to actually measure how long reading the payload takes. I'm assuming it's a few microseconds, but that might be BS.
 

Thread Starter

Futurist

Joined Apr 8, 2025
721
I dug out this code, which in principle gives me the approximate time in microseconds. If I take a value before and after and subtract the two, it suggests that the two reads (the payload size, then the payload itself, 11 bytes) take about 400 µs or so.

C:
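// Approximate microsecond timestamp: the millisecond tick count plus the elapsed part of the current SysTick period.
uint32_t aft = HAL_GetTick() * 1000 + (SysTick->LOAD - SysTick->VAL) / (SystemCoreClock / 1000000);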
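
One caveat with a one-line timestamp like this is that the millisecond tick and SysTick->VAL are read at slightly different moments, so a reading taken right at a tick boundary can be off by up to 1 ms. A possible refinement, sketched below under the same assumptions as the original line (1 ms tick, SysTick clocked from the core clock, core clock of at least 1 MHz), is to re-read the tick until it is stable. The function names snapshot_to_us and micros are hypothetical. The target-only part is guarded by USE_HAL_DRIVER so the conversion can also be compiled and checked on a host.

```c
#include <stdint.h>

/* Convert one consistent snapshot to microseconds.
   SysTick counts down from `load` to 0, so (load - val) is the number
   of core cycles elapsed in the current millisecond. */
uint32_t snapshot_to_us(uint32_t ms, uint32_t load, uint32_t val,
                        uint32_t core_hz)
{
    uint32_t cycles_per_us = core_hz / 1000000u;
    return ms * 1000u + (load - val) / cycles_per_us;
}

#ifdef USE_HAL_DRIVER  /* defined in STM32Cube projects; skipped on a host */
/* Assumes the project's device/HAL header (e.g. main.h) is included first. */
uint32_t micros(void)
{
    uint32_t ms, val;
    do {
        ms  = HAL_GetTick();
        val = SysTick->VAL;
    } while (ms != HAL_GetTick());  /* retry if a tick landed in between */
    return snapshot_to_us(ms, SysTick->LOAD, val, SystemCoreClock);
}
#endif
```

The re-read only helps where the SysTick interrupt can preempt the caller. Inside a handler of equal or higher priority the tick count cannot advance, so a stamp taken there can still lag by up to one tick.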
 

Thread Starter

Futurist

Joined Apr 8, 2025
721
I just tweaked the interrupt handler. I now have an array of 32 slots where I record the timestamp (see the post above this one) every time the handler is invoked. It rotates, so when the last slot has been used I reset the index to 0.

This lets me see the previous 31 interrupt timestamps so I can investigate the issue.

Working normally (as it does for the first few minutes) I can see the stamps are about 61 ms apart, and that's consistent with the TX app, which sleeps for 50 ms between message sends.

I'll just wait now for the issue to recur and see what the timestamp delta is when this fails. The logic seems to imply that I'm seeing two successive interrupts very close together, but am I...
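
A rotating timestamp log of this kind can be sketched as follows. This is an illustration rather than the code actually used: the names stamps, next_slot, stamp_record and stamp_delta_us are hypothetical, and the timestamp is assumed to be a 32-bit microsecond value passed in by the handler.

```c
#include <stdint.h>

#define STAMP_SLOTS 32u

/* Written from the interrupt handler, inspected in the debugger. */
static volatile uint32_t stamps[STAMP_SLOTS];
static volatile uint32_t next_slot = 0;

/* Call from the handler with the current microsecond timestamp. */
void stamp_record(uint32_t now_us)
{
    stamps[next_slot] = now_us;
    /* Wrap back to slot 0 after the last slot has been used. */
    next_slot = (next_slot + 1u) % STAMP_SLOTS;
}

/* Time between `slot` and the slot recorded just before it.
   Unsigned subtraction stays correct if the 32-bit counter wraps. */
uint32_t stamp_delta_us(uint32_t slot)
{
    uint32_t prev = (slot + STAMP_SLOTS - 1u) % STAMP_SLOTS;
    return stamps[slot] - stamps[prev];
}
```

The delta is only meaningful when `slot` is not the oldest entry in the buffer, since the slot before the oldest one holds the newest timestamp.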
 

Thread Starter

Futurist

Joined Apr 8, 2025
721
Results:

[Attached image: 1747682951850.png, the recorded interrupt timestamps by slot]

See the highlighted lines. The event for slot 11 was about 60 ms after the event for slot 10, as expected.

But the event for slot 12 occurred just 509 µs after the event for slot 11, which is weird.

The TX's NRF is set up to send up to 5 retries, each one being sent 1 ms after the previous attempt...

What is going on...
 

Thread Starter

Futurist

Joined Apr 8, 2025
721
I did more tests and when I dug down I found that the interrupt was genuine and the payload was present.

I modified the TX too so that it sends an 8 byte payload which is just an incrementing 64 bit unsigned integer.

One message was stamped, say, 123456 and the other (the one that seems to have arrived 509 µs after the previous one) was stamped 123457.

So it really does look like two genuine messages are arriving but far too close together.
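
The sequence-number test described above can be sketched as below: the TX packs an incrementing 64-bit counter into the 8-byte payload, and the RX compares each received value with the previous one to tell a retransmitted duplicate from a genuinely new message. The function names and the little-endian byte order are assumptions, since the thread does not say how the counter was laid out.

```c
#include <stdint.h>

/* Pack a 64-bit sequence number into 8 payload bytes
   (little-endian, an assumed layout). */
void seq_pack(uint64_t seq, uint8_t out[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(seq >> (8 * i));
    }
}

/* Recover the sequence number from a received 8-byte payload. */
uint64_t seq_unpack(const uint8_t in[8])
{
    uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        value |= (uint64_t)in[i] << (8 * i);
    }
    return value;
}

typedef enum { SEQ_OK, SEQ_DUPLICATE, SEQ_GAP } seq_check_t;

/* Compare a received number with the previous one. */
seq_check_t seq_classify(uint64_t previous, uint64_t current)
{
    if (current == previous) {
        return SEQ_DUPLICATE;   /* same message seen twice */
    }
    if (current == previous + 1u) {
        return SEQ_OK;          /* next message in order */
    }
    return SEQ_GAP;             /* something was missed or reordered */
}
```

In the case reported here, 123456 followed by 123457 classifies as SEQ_OK, which is what rules out a simple duplicate.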

The TX itself pauses for 50 ms after sending a message...
 

Thread Starter

Futurist

Joined Apr 8, 2025
721
This is (apparently) fixed. The TX was at fault: it was not flushing the NRF's TX FIFO when a message send failed (when it never got an auto-ack), and so the presence of a failed send affected FIFO behavior.
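
A fix of that kind might look roughly like the sketch below. The command and bit values (FLUSH_TX = 0xE1, W_REGISTER = 0x20, STATUS at 0x07, TX_DS = bit 5, MAX_RT = bit 4) are from the nRF24L01+ datasheet, which also notes that a payload whose retries were exhausted stays in the TX FIFO. Everything else is hypothetical: nrf_finish_send, the nrf_spi_write_fn hook and the demo_spi_write recorder are stand-ins so the logic can be run on a host, not the thread starter's code.

```c
#include <stddef.h>
#include <stdint.h>

#define NRF_CMD_W_REGISTER 0x20u
#define NRF_CMD_FLUSH_TX   0xE1u
#define NRF_REG_STATUS     0x07u
#define NRF_STATUS_TX_DS   (1u << 5)  /* payload sent and acknowledged */
#define NRF_STATUS_MAX_RT  (1u << 4)  /* retries exhausted, no Ack */

/* Hypothetical transport hook: sends `len` bytes in one SPI frame. */
typedef void (*nrf_spi_write_fn)(const uint8_t *bytes, size_t len);

/* Host stand-in for the transport: records the bytes written. */
static uint8_t demo_log[8];
static size_t  demo_len = 0;
void demo_spi_write(const uint8_t *bytes, size_t len)
{
    for (size_t i = 0; i < len && demo_len < sizeof demo_log; i++) {
        demo_log[demo_len++] = bytes[i];
    }
}

/* Act on the STATUS value read after a send.
   Returns 1 on success, 0 on failure, -1 if the send is still pending. */
int nrf_finish_send(uint8_t status, nrf_spi_write_fn spi_write)
{
    if (status & NRF_STATUS_MAX_RT) {
        const uint8_t flush = NRF_CMD_FLUSH_TX;
        const uint8_t clear[2] = { NRF_CMD_W_REGISTER | NRF_REG_STATUS,
                                   NRF_STATUS_MAX_RT };
        spi_write(&flush, 1);  /* drop the payload that was never Acked */
        spi_write(clear, 2);   /* write 1 to MAX_RT to clear the flag */
        return 0;
    }
    if (status & NRF_STATUS_TX_DS) {
        const uint8_t clear[2] = { NRF_CMD_W_REGISTER | NRF_REG_STATUS,
                                   NRF_STATUS_TX_DS };
        spi_write(clear, 2);
        return 1;
    }
    return -1;
}
```

If the stale payload is left in the FIFO, it can go out together with the next message when the TX is next triggered, which would be one plausible explanation for two differently numbered messages arriving a few hundred microseconds apart.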
 