Deferring work on STM32

Thread Starter

Futurist

Joined Apr 8, 2025
748
I'm playing around with a basic program/library that manages the NRF24L01+ and so far it's gone well (I have a working TX and RX app that run well under debug).

Something has come up, though, that raises a design question: what general technique do people use for deferring work?

So in an interrupt handler there are "urgent" things to do (and various things one cannot do), but there is also application work that can/should be done outside the interrupt handler (to minimize the time that other interrupts are masked).

In an OS (like Windows) interrupt handlers can queue what's called a DPC (deferred procedure call), and the system will execute the code pointed to by the DPC after the interrupt has exited.

But without an OS how do people usually structure this? I can see that a simple volatile flag could work, but that's a bit limited, especially if there might be different work to do, different interrupts, etc.
 

MrChips

Joined Oct 2, 2009
34,807
As a rule, I don't use an RTOS. I prefer to have full control of my system.

You need to make a list of all processes and set execution times and priorities. Some events can be queued and postponed. Some events need to be handled within a certain time frame. Some events must be given top priority. All of this is likely obvious to you.

As an example, I have an ADC with a sampling rate of 50 Msps running continuously. The only way to handle this is with DMA. You want the hardware to handle as much of the work as possible: timer modules, communication interfaces, etc. Use data buffers and DMA as much as possible.

An ISR should do no more than transfer data and set a flag, taking no more than 1 μs on a fast MCU.

Execution in the main program loop has to be fast enough to handle all processes, otherwise you have a bottleneck.

In my case, storing 16K samples at 50 Msps takes just over 325 μs. Whatever needs to be done with the data must be completed in under 325 μs. I don't see any other way around this.
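The timing budget above can be checked with a little arithmetic. This is only a sketch, assuming "16K" means 16,384 samples; the helper name buffer_fill_time_ns is made up for illustration.

```c
#include <stdint.h>

/* Time, in nanoseconds, needed to fill a buffer of `samples` samples
 * at a rate of `rate_sps` samples per second.
 * Hypothetical helper, not part of any vendor library. */
uint64_t buffer_fill_time_ns(uint64_t samples, uint64_t rate_sps)
{
    return (samples * 1000000000ULL) / rate_sps;
}
```

With 16,384 samples at 50,000,000 samples per second this gives 327,680 ns, i.e. 327.68 μs, which is the window the main loop has before the next buffer is full.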
 

nsaspook

Joined Aug 27, 2009
16,321
No need for an RTOS for the vast majority of programming tasks on a controller.

If it's I/O-bound work, just transferring bytes to/from memory locations, use DMA with a little interrupt-driven state machine to handle the processing work. If you need to do non-time-critical processing, use a low-priority interrupt state machine as a task thread. The short, higher-priority interrupt processes will preempt the low-priority task processing, and the main thread will continue to run as long as you limit the low-priority task to non-blocking sequences decided by its internal state machine.
It's easily possible to do this without bottlenecking the main program loop if done correctly.

volatile on its own will not work. It only tells the compiler not to optimize away variables and code that aren't directly affected by the main program flow. You also need a memory barrier or locking method (a mutex or other thread-control method) to signal between processes/task threads. Message queues, buffers or other methods work just fine. Find a good book about multi-processing theory, read it, understand it and use it.

https://en.wikipedia.org/wiki/Mutual_exclusion
1745006729941.png
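As a rough illustration of the message-queue idea, here is a minimal sketch of a single-producer, single-consumer queue of deferred calls, a bare-metal stand-in for the DPC mechanism mentioned in the question. It assumes a C11 toolchain with <stdatomic.h>, exactly one interrupt acting as producer, and the main loop as the only consumer. All names (defer_work, run_deferred_work, handle_packet, WORK_SLOTS) are hypothetical and not from any vendor library.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*work_fn)(void *arg);

typedef struct {
    work_fn fn;
    void   *arg;
} work_item;

#define WORK_SLOTS 4u /* assumption: must be a power of two */

static work_item   work_ring[WORK_SLOTS];
static atomic_uint work_head; /* advanced only by the ISR (producer) */
static atomic_uint work_tail; /* advanced only by the main loop (consumer) */

/* ISR side: queue a call for later. Returns false if the ring is full. */
bool defer_work(work_fn fn, void *arg)
{
    unsigned head = atomic_load_explicit(&work_head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&work_tail, memory_order_acquire);
    if (head - tail == WORK_SLOTS)
        return false;
    work_ring[head & (WORK_SLOTS - 1u)] = (work_item){ fn, arg };
    /* release: the slot contents become visible before the new head */
    atomic_store_explicit(&work_head, head + 1u, memory_order_release);
    return true;
}

/* Main-loop side: run everything queued so far. Returns how many calls ran. */
unsigned run_deferred_work(void)
{
    unsigned ran  = 0;
    unsigned tail = atomic_load_explicit(&work_tail, memory_order_relaxed);
    while (tail != atomic_load_explicit(&work_head, memory_order_acquire)) {
        work_item item = work_ring[tail & (WORK_SLOTS - 1u)];
        /* free the slot before running the handler */
        atomic_store_explicit(&work_tail, ++tail, memory_order_release);
        item.fn(item.arg);
        ran++;
    }
    return ran;
}

/* Example handler: stands in for "process the received packet". */
void handle_packet(void *arg)
{
    (*(unsigned *)arg)++;
}
```

The ISR would call defer_work(handle_packet, &counter) and return; the main loop calls run_deferred_work() on each pass. If several interrupts at different priorities need to queue work, this single-producer version is not enough and each producer would need its own ring or a short critical section.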
 

Thread Starter

Futurist

Joined Apr 8, 2025
748
Thank you both, interesting to hear professionals' views on this subject.

So with the NRF we get an interrupt when a packet is received and we clear the interrupt flag by updating the NRF status register, but then should I read the data packet next or exit the ISR and read the data "back in" the application code?

The data is transferred using SPI, as you both know.
 

nsaspook

Joined Aug 27, 2009
16,321
It depends on how much data needs to be read and how critical the time allotted will be to the main task. For a few bytes it will be faster just to read it in the ISR, because of the controller stack and state pushes and pops for each interrupt execution. For longer sequences of bytes DMA works well, as it provides a transfer-complete interrupt that can signal the main task that the data is ready for processing, or that the transfer in/out of the DMA buffer array(s) has finished.
An example of mixed interrupt and DMA processing for SPI is shown here for a GLCD display update routine. The buffer is loaded in the main processing thread and the transfer is started in main. From then on, the transfer and its processing are handled by DMA and DMA interrupts until complete.

While all that I/O is happening the main is still processing other I/O with another SPI port to the IMU, converting that data to a program usable format and getting ready to stuff another buffer for the GLCD.
https://forum.allaboutcircuits.com/threads/debugging-with-rigol-dho804.206296/post-1978322
1745008267838.png
 

nsaspook

Joined Aug 27, 2009
16,321
For another similar sensor device that uses a Waveshare serial-to-Ethernet adapter module, I also use DMA for 460,800 bps TTL serial transfers to/from the module, running in parallel with the GLCD display DMA.
C:
static volatile bool uart1_dma_busy = false; // acts as a mutex lock

#ifdef USE_SERIAL_DMA
void UART1DmaChannelHandler_State(DMAC_TRANSFER_EVENT, uintptr_t);
void UART1DmaWrite(const char *, uint32_t);

/*
* end of uart buffer complete flag handler callback
* interrupt handler for the completion of buffer transfer.
*/
void UART1DmaChannelHandler_State(DMAC_TRANSFER_EVENT event, uintptr_t contextHandle)
{
    uart1_dma_busy = false;
}

/*
* DMA uart serial function
* triggers the DMA transfer and returns, only one interrupt happens at the end of transfer
*/
void UART1DmaWrite(const char * buffer, uint32_t len)
{
    while (uart1_dma_busy || U1STAbits.UTXBF) { // should never wait in normal operation
    };

    uart1_dma_busy = true; // in process flag
    DMAC_ChannelTransfer(DMAC_CHANNEL_7, (const void *) buffer, (size_t) len, (const void*) &U1TXREG, (size_t) 1, (size_t) 1);
}
#endif
It sets a volatile flag, uart1_dma_busy, to signal main that a transfer is running. The flag is a variable small enough to be written in a single atomic operation, so an interrupt can't change it partway through an update.
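The posted code checks the flag and then sets it in two separate steps, which is fine when only the main loop starts transfers. If more than one context could try to start one, a single test-and-set closes that gap. A minimal sketch of that variation, assuming a C11 toolchain with <stdatomic.h>; the names dma_try_acquire and dma_release are hypothetical and this is not the code used on the device above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Clear = channel free, set = transfer in progress. */
static atomic_flag dma_busy = ATOMIC_FLAG_INIT;

/* Try to claim the channel without waiting.
 * Returns true if the caller now owns it and may start a transfer. */
bool dma_try_acquire(void)
{
    /* test-and-set returns the previous value in one indivisible step */
    return !atomic_flag_test_and_set_explicit(&dma_busy, memory_order_acquire);
}

/* Call from the transfer-complete callback to free the channel. */
void dma_release(void)
{
    atomic_flag_clear_explicit(&dma_busy, memory_order_release);
}
```

A caller that gets false from dma_try_acquire() simply tries again on a later pass of its state machine instead of spinning.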

1745009044775.png
It also uses CANFD (that has a dedicated DMA controller) to talk to other sensors. Not used in this example.
Remote MQTT data updates from the device.
C:
void UART1DmaWrite(const char *, uint32_t);

/*
* modified to use TTL serial to Ethernet module
* uses DMA on UART1 to speed 460,800bps serial transfers in background
*/
ssize_t mqtt_pal_sendall(mqtt_pal_socket_handle fd, const void* buf, size_t len, int flags)
{
    enum MQTTErrors error = 0;
    size_t sent = 0;
    while (sent < len) {
        UART1DmaWrite((const char *) buf, len);
        ssize_t rv = len;
        if (rv < 0) {
            if (errno == EAGAIN) {
                /* should call send later again */
                break;
            }
            error = MQTT_ERROR_SOCKET_ERROR;
            break;
        }
        if (rv == 0) {
            /* is this possible? maybe OS bug. */
            error = MQTT_ERROR_SOCKET_ERROR;
            break;
        }
        sent += (size_t) rv;
    }
    if (sent == 0) {
        return error;
    }
    return(ssize_t) sent;
}
C:
        /* we're sending the message */
        {
            ssize_t tmp = mqtt_pal_sendall(client->socketfd, msg->start + client->send_offset, msg->size - client->send_offset, 0);
            if (tmp < 0) {
                client->error = (enum MQTTErrors)tmp;
                MQTT_PAL_MUTEX_UNLOCK(&client->mutex);
                return tmp;
            } else {
                client->send_offset += (unsigned long) tmp;
                if (client->send_offset < msg->size) {
                    /* partial sent. Await additional calls */
                    break;
                } else {
                    /* whole message has been sent */
                    client->send_offset = 0;
                }

            }

        }
Full MQTT send function.
C:
ssize_t __mqtt_send(struct mqtt_client *client)
{
    uint8_t inspected;
    ssize_t len;
    int inflight_qos2 = 0;
    int i = 0;

    MQTT_PAL_MUTEX_LOCK(&client->mutex);

    if (client->error < 0 && client->error != MQTT_ERROR_SEND_BUFFER_IS_FULL) {
        MQTT_PAL_MUTEX_UNLOCK(&client->mutex);
        return client->error;
    }

    /* loop through all messages in the queue */
    len = mqtt_mq_length(&client->mq);
    for (; i < len; ++i) {
        struct mqtt_queued_message *msg = mqtt_mq_get(&client->mq, i);
        int resend = 0;
        if (msg->state == MQTT_QUEUED_UNSENT) {
            /* message has not been sent to lets send it */
            resend = 1;
        } else if (msg->state == MQTT_QUEUED_AWAITING_ACK) {
            /* check for timeout */
            if (MQTT_PAL_TIME() > msg->time_sent + client->response_timeout) {
                resend = 1;
                client->number_of_timeouts += 1;
                client->send_offset = 0;
            }
        }

        /* only send QoS 2 message if there are no inflight QoS 2 PUBLISH messages */
        if (msg->control_type == MQTT_CONTROL_PUBLISH
        && (msg->state == MQTT_QUEUED_UNSENT || msg->state == MQTT_QUEUED_AWAITING_ACK)) {
            inspected = 0x03 & ((msg->start[0]) >> 1); /* qos */
            if (inspected == 2) {
                if (inflight_qos2) {
                    resend = 0;
                }
                inflight_qos2 = 1;
            }
        }

        /* goto next message if we don't need to send */
        if (!resend) {
            continue;
        }

        /* we're sending the message */
        {
            ssize_t tmp = mqtt_pal_sendall(client->socketfd, msg->start + client->send_offset, msg->size - client->send_offset, 0);
            if (tmp < 0) {
                client->error = (enum MQTTErrors)tmp;
                MQTT_PAL_MUTEX_UNLOCK(&client->mutex);
                return tmp;
            } else {
                client->send_offset += (unsigned long) tmp;
                if (client->send_offset < msg->size) {
                    /* partial sent. Await additional calls */
                    break;
                } else {
                    /* whole message has been sent */
                    client->send_offset = 0;
                }

            }

        }

        /* update timeout watcher */
        client->time_of_last_send = MQTT_PAL_TIME();
        msg->time_sent = client->time_of_last_send;

        /*
        Determine the state to put the message in.
        Control Types:
        MQTT_CONTROL_CONNECT     -> awaiting
        MQTT_CONTROL_CONNACK     -> n/a
        MQTT_CONTROL_PUBLISH     -> qos == 0 ? complete : awaiting
        MQTT_CONTROL_PUBACK      -> complete
        MQTT_CONTROL_PUBREC      -> awaiting
        MQTT_CONTROL_PUBREL      -> awaiting
        MQTT_CONTROL_PUBCOMP     -> complete
        MQTT_CONTROL_SUBSCRIBE   -> awaiting
        MQTT_CONTROL_SUBACK      -> n/a
        MQTT_CONTROL_UNSUBSCRIBE -> awaiting
        MQTT_CONTROL_UNSUBACK    -> n/a
        MQTT_CONTROL_PINGREQ     -> awaiting
        MQTT_CONTROL_PINGRESP    -> n/a
        MQTT_CONTROL_DISCONNECT  -> complete
         */
        switch (msg->control_type) {
        case MQTT_CONTROL_PUBACK:
        case MQTT_CONTROL_PUBCOMP:
        case MQTT_CONTROL_DISCONNECT:
            msg->state = MQTT_QUEUED_COMPLETE;
            break;
        case MQTT_CONTROL_PUBLISH:
            inspected = (MQTT_PUBLISH_QOS_MASK & (msg->start[0])) >> 1; /* qos */
            if (inspected == 0) {
                msg->state = MQTT_QUEUED_COMPLETE;
            } else if (inspected == 1) {
                msg->state = MQTT_QUEUED_AWAITING_ACK;
                /*set DUP flag for subsequent sends [Spec MQTT-3.3.1-1] */
                msg->start[0] |= MQTT_PUBLISH_DUP;
            } else {
                msg->state = MQTT_QUEUED_AWAITING_ACK;
            }
            break;
        case MQTT_CONTROL_CONNECT:
        case MQTT_CONTROL_PUBREC:
        case MQTT_CONTROL_PUBREL:
        case MQTT_CONTROL_SUBSCRIBE:
        case MQTT_CONTROL_UNSUBSCRIBE:
        case MQTT_CONTROL_PINGREQ:
            msg->state = MQTT_QUEUED_AWAITING_ACK;
            break;
        default:
            client->error = MQTT_ERROR_MALFORMED_REQUEST;
            MQTT_PAL_MUTEX_UNLOCK(&client->mutex);
            return MQTT_ERROR_MALFORMED_REQUEST;
        }
    }

    /* check for keep-alive */
    {
        mqtt_pal_time_t keep_alive_timeout = client->time_of_last_send + (mqtt_pal_time_t) ((float) (client->keep_alive));
        if (MQTT_PAL_TIME() > keep_alive_timeout) {
            ssize_t rv = __mqtt_ping(client);
            if (rv != MQTT_OK) {
                client->error = (enum MQTTErrors)rv;
                MQTT_PAL_MUTEX_UNLOCK(&client->mutex);
                return rv;
            }
        }
    }

    MQTT_PAL_MUTEX_UNLOCK(&client->mutex);
    return MQTT_OK;
}
1745009091327.png

All of that code is designed to be state machine sequences so they don't block.

An RTOS is designed to simplify that process by pretending there are separate tasks running concurrently, but that eventually adds complexity, slows processing and adds an extra layer of abstraction. That layer isn't needed if you understand the fundamentals of multi-processing on a uniprocessor machine with DMA and vectored interrupts, which most 32-bit controllers have. Some 8-bit controllers, like the PIC18F Q84 series, have them too, and I use these methods there as well to handle overlapping I/O and processing.
 

Thread Starter

Futurist

Joined Apr 8, 2025
748
This is very interesting, I will sift through your code more closely a bit later (but won't understand very much!) but I see that one would be a fool to not leverage DMA if it's available. I also like the idea of a state machine where state transitions drive what happens "next" and avoiding those work handlers from ever blocking, a recipe for sanity.


I guess at the simplest level one just has an infinite (idle) loop and when any IO event occurs an interrupt takes place and depending on state that is handled in some way, does its stuff and returns until some later interrupt arrives.
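The idle-loop-plus-interrupts structure described here can be sketched with one bit per event, which also gets around the "one flag is a bit limited" concern. This is an illustration only, assuming C11 atomics are available; the event names, post_event, take_events and main_loop_once are all hypothetical, and the counters stand in for real handlers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One bit per event source. */
enum {
    EVT_RADIO_RX = 1u << 0,
    EVT_TIMER    = 1u << 1,
};

static atomic_uint pending_events;

/* Called from an ISR: record the event and return immediately. */
void post_event(unsigned mask)
{
    atomic_fetch_or(&pending_events, mask);
}

/* Called from the main loop: take all pending events and clear them
 * in one atomic step, so an event posted in between is not lost. */
unsigned take_events(void)
{
    return atomic_exchange(&pending_events, 0u);
}

/* Counters standing in for the real application work. */
unsigned radio_rx_count;
unsigned timer_count;

/* One pass of the main loop; the application would call this forever. */
void main_loop_once(void)
{
    unsigned events = take_events();

    if (events & EVT_RADIO_RX)
        radio_rx_count++; /* e.g. read the payload over SPI here */
    if (events & EVT_TIMER)
        timer_count++;    /* e.g. run periodic housekeeping here */
}
```

Note that a bit only records that an event happened at least once: two interrupts of the same kind between passes collapse into one, so the handler has to drain everything available (for a radio, read until its receive FIFO is empty).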

I need to read up on DMA with SPI for my little NRF project, currently it reads a "packet" (Nordic's term) in this way:

C:
void read_bytes(NrfSpiDevice_ptr ptr, uint8_t bytes_in_ptr[], uint8_t count)
{
    ptr->status = HAL_SPI_Receive(&ptr->spi, bytes_in_ptr, count, HAL_MAX_DELAY);
 
    if (ptr->status != HAL_OK)
        pulse_led_forever(100); // crude error trap for now
}
This works as a proof of concept (my original goal was just to prove data xfer and that proved I had a handle on the configuration of each NRF).

Can you share: MQTT_PAL_MUTEX_UNLOCK ?

Oh, this is the interrupt handler for the NRF:


C:
void EXTI0_IRQHandler(void)
{
    uint8_t data[32];
    NrfReg_STATUS status;
   
    HAL_GPIO_EXTI_IRQHandler(GPIO_PIN_0);
   
    nrf24_package.Read.STATUS(&device, &status);
   
    if (status.RX_DR) // these three are "1 to clear"
    {
        status.TX_DS = 0;
        status.MAX_RT = 0;
        nrf24_package.Write.STATUS(&device, status, &status);
    }
   
    nrf24_package.Command.R_RX_PAYLOAD(&device, data, 32, &status);
   
    msg_received = 1;
}
 

nsaspook

Joined Aug 27, 2009
16,321
It's not really used, as there are no real processing threads in the MQTT protocol processing the way there would be in an RTOS. It's all loop state machine code. The only 'thread' MUTEX code (using atomic boolean variables) is in the DMA/interrupt/main interfacing code for direct I/O operations.
C:
/*
* No threads so no use for a MUTEX
* set to zero
*/
#define MQTT_PAL_MUTEX_INIT(mtx_ptr) 0
#define MQTT_PAL_MUTEX_LOCK(mtx_ptr) 0
#define MQTT_PAL_MUTEX_UNLOCK(mtx_ptr) 0
C:
#define __unix__ // use this as the basis for a PIC32MK config


    /* UNIX-like platform support for PIC32MK XC32 */
#if defined(__unix__) || defined(__APPLE__) || defined(__NuttX__)
#include <limits.h>
#include <string.h>
#include <stdarg.h>
#include <time.h>
#include <stdint.h>
#include "endian.h"
#include "definitions.h"                // SYS function prototypes

#define ssize_t        int32_t
#define SERIAL_BUF_SIZ    64

#define MQTT_PAL_HTONS(s) htons(s)
#define MQTT_PAL_NTOHS(s) ntohs(s)

#define MQTT_PAL_TIME() time(NULL)

    typedef time_t mqtt_pal_time_t;

    /*
     * No threads so no use for a MUTEX
     * set to zero
     */
#define MQTT_PAL_MUTEX_INIT(mtx_ptr) 0
#define MQTT_PAL_MUTEX_LOCK(mtx_ptr) 0
#define MQTT_PAL_MUTEX_UNLOCK(mtx_ptr) 0

#if !defined(MQTT_USE_CUSTOM_SOCKET_HANDLE)
#if defined(MQTT_USE_MBEDTLS)
    struct mbedtls_ssl_context;
    typedef struct mbedtls_ssl_context *mqtt_pal_socket_handle;
#elif defined(MQTT_USE_WOLFSSL)
#include <wolfssl/ssl.h>
    typedef WOLFSSL* mqtt_pal_socket_handle;
#elif defined(MQTT_USE_BIO)
#include <openssl/bio.h>
    typedef BIO* mqtt_pal_socket_handle;
#elif defined(MQTT_USE_BEARSSL)
#include <bearssl.h>

    typedef struct _bearssl_context {
        br_ssl_client_context sc;
        br_x509_minimal_context xc;
        br_sslio_context ioc;
        size_t ta_count;
        br_x509_trust_anchor *anchOut;
        int fd;
        int (*low_read)(void *read_context, unsigned char *buf, size_t len);
        int (*low_write)(void *write_context, const unsigned char *buf, size_t len);
    } bearssl_context;

    typedef bearssl_context* mqtt_pal_socket_handle;
#else
    typedef int mqtt_pal_socket_handle;
#endif
#endif
#elif defined(_MSC_VER) || defined(WIN32)
#include <limits.h>
#include <winsock2.h>
#include <windows.h>
#include <time.h>
#include <stdint.h>

    typedef SSIZE_T ssize_t;
#define MQTT_PAL_HTONS(s) htons(s)
#define MQTT_PAL_NTOHS(s) ntohs(s)

#define MQTT_PAL_TIME() time(NULL)

    typedef time_t mqtt_pal_time_t;
    typedef CRITICAL_SECTION mqtt_pal_mutex_t;

#define MQTT_PAL_MUTEX_INIT(mtx_ptr) InitializeCriticalSection(mtx_ptr)
#define MQTT_PAL_MUTEX_LOCK(mtx_ptr) EnterCriticalSection(mtx_ptr)
#define MQTT_PAL_MUTEX_UNLOCK(mtx_ptr) LeaveCriticalSection(mtx_ptr)


#if !defined(MQTT_USE_CUSTOM_SOCKET_HANDLE)
#if defined(MQTT_USE_BIO)
#include <openssl/bio.h>
    typedef BIO* mqtt_pal_socket_handle;
#else
    typedef SOCKET mqtt_pal_socket_handle;
#endif
#endif

#endif
Get your code running first in the NRF way and then see what can be done to optimize sections of the code. Most of the money code is hidden in the HAL, it seems. I'm not an STM ecosystem guy so I can't help you there, but C code is C code, even if some is C++ code in the low-level embedded world.

A quick look at the HAL source shows the SPI DMA source code. It looks OK from a quick look if you need to send/receive something like 16-byte or larger hunks of data at a time (because of the complexity of setup), all the time. I don't know if the setup complexity needed to conserve speed and processor resources is warranted in your application.
https://sourcevu.sysprogs.com/stm32/HAL/files/Src/stm32f4xx_hal_spi.c#tok9591
https://www.st.com/resource/en/appl...-series-dma-controller-stmicroelectronics.pdf

DMA controller description: The DMA is an AMBA advanced high-performance bus (AHB) module that features three AHB ports: a slave port for DMA programming and two master ports (peripheral and memory ports) that allow the DMA to initiate data transfers between different slave modules. The DMA allows data transfers to take place in the background, without the intervention of the Cortex-Mx processor. During this operation, the main processor can execute other tasks and it is only interrupted when a whole data block is available for processing. Large amounts of data can be transferred with no major impact on the system performance. The DMA is mainly used to implement central data buffer storage (usually in the system SRAM) for different peripheral modules. This solution is less expensive in terms of silicon and power consumption compared to a distributed solution where each peripheral needs to implement its own local data storage. The STM32F2/F4/F7 DMA controller takes full advantage of the multi-layer bus system in order to ensure very low latency both for DMA transfers and for CPU execution/interrupt event detection/service.
 

nsaspook

Joined Aug 27, 2009
16,321
Scope shots of the IO/main processing concurrency on this sensor device using DMA.

C:
        TP3_Toggle();
#ifdef SHOW_LCD
        if (TimerDone(TMR_HOST)) {
            io_now = true;
#ifndef HOST_MQTT  
            StartTimer(TMR_HOST, HOST_CANFD_UPDATE);
#else
            StartTimer(TMR_HOST, HOST_MQTT_UPDATE);
#endif
            OledUpdate();
        }
The white grabber clip is connected to TP3. When you design a board, always include several GPIO connected test points for debugging, testing and optimizing the software.
1745083221523.png

1745082339168.png
1745082282568.png
Yellow: TTL TX to the Ethernet module for MQTT packets from the sensor board to the remote server.
Green: TTL RX from the Ethernet module for remote server ACKs to MQTT packets from the sensor board.
Cyan: Toggles marking each pass of the main processing loop. The longer gap is the processing time for IMU sensor data, display graphics and text pushes to a memory buffer for later output to the GLCD device, MQTT protocol and housekeeping tasks. It's not a busy wait.
Blue: GLCD SPI clocks to the display for screen updates, using the data from the previous main loop's display memory buffer operations.
1745082781367.png
1745082801185.png

Several data collection and display cycles. There is plenty of time and processing resources left for additional functionality if needed.
1745082962677.png

1745084681652.png
Ethernet module TTL data stats.
1745084805738.png
Network connection details. I'm running the serial port at half speed with 2 stop bits to stop the module from overrunning during the DMA stream, which leaves no gaps between the bytes the UART sends.
 

Thread Starter

Futurist

Joined Apr 8, 2025
748
Now what about a common issue even I've encountered recently.

This is when an interrupt handler accesses the SPI channel while the interrupted code is - at that very instant - accessing that same SPI.

The HAL policy is to return error code HAL_BUSY but ideally we want to forcibly single "thread" such access - how do you guys manage this?
 

nsaspook

Joined Aug 27, 2009
16,321
That's what memory/resource barriers are for. I've not looked deeply (or much at all, because STM is not my bag) at the HAL code for their interrupt tasking and locking policy, but things like HAL_BUSY are perfectly valid in a bare-metal state-machine tasking model. You just stay in the checking state (non-blocking) until the all clear or a timeout happens.
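The "stay in the checking state until the all clear or a timeout" idea can be sketched as a small state machine that is stepped once per main-loop pass. This is only an outline under stated assumptions: XFER_OK and XFER_BUSY stand in for HAL_OK and HAL_BUSY, the bus access is passed in as a function pointer so nothing here depends on the real HAL, and every name (radio_reader, reader_start, reader_step, fake_try_xfer) is hypothetical.

```c
#include <stdint.h>

typedef enum { XFER_OK, XFER_BUSY } xfer_status; /* stand-ins for HAL_OK / HAL_BUSY */
typedef enum { ST_IDLE, ST_WAIT_BUS, ST_DONE, ST_TIMEOUT } read_state;

typedef struct {
    read_state state;
    uint32_t   started_ms; /* tick count when the attempt began */
    uint32_t   timeout_ms; /* give up after this many milliseconds */
} radio_reader;

/* Hypothetical hook: attempts the bus transfer once and returns at once. */
typedef xfer_status (*try_xfer_fn)(void);

/* Begin a new attempt at time now_ms. */
void reader_start(radio_reader *r, uint32_t now_ms, uint32_t timeout_ms)
{
    r->state      = ST_WAIT_BUS;
    r->started_ms = now_ms;
    r->timeout_ms = timeout_ms;
}

/* Advance by one step. Never blocks: while the bus is busy it stays in
 * ST_WAIT_BUS and the caller is free to do other work before stepping again. */
read_state reader_step(radio_reader *r, try_xfer_fn try_xfer, uint32_t now_ms)
{
    if (r->state != ST_WAIT_BUS)
        return r->state;

    if (try_xfer() == XFER_OK)
        r->state = ST_DONE;
    else if ((uint32_t)(now_ms - r->started_ms) >= r->timeout_ms)
        r->state = ST_TIMEOUT; /* unsigned subtraction survives tick wraparound */

    return r->state;
}

/* Fake bus for demonstration: busy for the first fake_busy_calls attempts. */
unsigned fake_busy_calls;

xfer_status fake_try_xfer(void)
{
    if (fake_busy_calls > 0) {
        fake_busy_calls--;
        return XFER_BUSY;
    }
    return XFER_OK;
}
```

On an STM32 the now_ms value would typically come from HAL_GetTick(), and the hook would wrap whichever SPI call returned HAL_BUSY. The main loop keeps calling reader_step() on each pass until it reports ST_DONE or ST_TIMEOUT.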
 