XC8: .asm to C (i.e., Lowering One's Expectations)

WBahn

Joined Mar 31, 2012
30,062
I was reading yesterday and had no idea until then that many processors these days include a "popcount" instruction. Doing that in hardware with a single instruction makes a great deal of sense.
Which underscores the value of being able to use assembly, either for the whole program or embedded within high-level source code, since few compilers stand a chance of recognizing the code pattern that implements this functionality, which is widely used in cryptography and some other fields.
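For readers who have not met the operation: a population count returns the number of 1 bits in a word. The following is a minimal portable sketch of one common software pattern (clearing the lowest set bit in a loop). The function name `popcount8` is hypothetical, and whether any particular compiler turns this loop into a hardware instruction is exactly the open question in this thread.

```c
#include <stdint.h>

/* Portable population count for one byte (hypothetical helper name).
 * Each pass clears the lowest set bit, so the loop body runs once
 * per 1 bit in x. */
unsigned popcount8(uint8_t x)
{
    unsigned count = 0;
    while (x) {
        x &= (uint8_t)(x - 1);  /* drop the lowest set bit */
        count++;
    }
    return count;
}
```

Note that the running time of this version depends on the data, which matters for the constant-time discussion later in the thread.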
 

ApacheKid

Joined Jan 12, 2015
1,610
Well, take a look at this:

[Attached image: 1668805962103.png]

 

WBahn

That's IF it is written in a way that the compiler can recognize the pattern. Minor, seemingly irrelevant changes can prevent that from happening.

When I was doing SDR work we had code that slowed down considerably after making some (presumably) inconsequential changes. After a bunch of head-scratching and analyzing, we discovered that the Xcode compiler had previously recognized a macro as implementing a rotate operation and generated the code as such, but after the change (which did NOT involve any changes to the macro), it no longer did. It became a real thorn in our side because we never did figure out the secret sauce needed to ensure the compiler recognized it. Even if we had, it wouldn't mean that people who took our code and compiled it would see the same results.
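For context, the rotate idiom being discussed usually looks something like the sketch below. This is not the macro from the SDR project (that code is not shown in the thread); `rotl32` is a hypothetical name, and whether a given compiler maps it to a single rotate instruction is not guaranteed.

```c
#include <stdint.h>

/* Rotate x left by n bits (hypothetical helper). Masking keeps both
 * shift counts in the range 0..31, which avoids the undefined behavior
 * of shifting a 32-bit value by 32 when n is 0. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```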
 

ApacheKid

I don't disagree. I did think it was interesting, though, that in that example the minimal code was generated; but as you say, that was rather specific, a heuristic I suppose.

A far better solution here would be to introduce a built-in language function or something that represents "population" and then let the code generator generate the ideal instructions for that depending on target platform.
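Some compilers already approximate this idea with builtins. The sketch below wraps one behind a single function so the caller never writes the bit-twiddling pattern. `population` is a hypothetical name; `__builtin_popcountl` is a GCC and Clang builtin, and nothing here assumes that XC8 or any other embedded toolchain provides it, which is why a plain loop is kept as the fallback.

```c
#include <stdint.h>

/* Hypothetical wrapper: lets the code generator pick the best
 * instruction sequence where a builtin exists, else falls back
 * to a portable loop. */
static inline unsigned population(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* unsigned long is at least 32 bits, so x converts without loss */
    return (unsigned)__builtin_popcountl((unsigned long)x);
#else
    unsigned count = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        count++;
    }
    return count;
#endif
}
```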

This is why I mention language extensibility so often: without extensibility it is very hard indeed to improve a language. Sure, C is extensible in the sense that we can add library functions and so on, but that's not very good for several reasons.

I recall an opposite problem many years ago, around 1988, on Stratus minis. We had an .obj file for some financial calculation but could not find the source. The code was not suspect, but it needed to be changed as part of an important legislation-driven change, and we had a hell of a time looking at the (68020) assembly code and trying to infer what the original source must have looked like; it took quite a few days. We could at least write our trial code, compile it, and see how it differed, so it was a process of gradual refinement. If I recall, the code was somewhat involved but came to no more than about 10 statements in the end.
 

Thread Starter

joeyd999

Joined Jun 6, 2011
5,283
Here's some more code for constructive criticism. It duplicates my .asm hysteretic rotary encoder code for the Bourns PEC12R, which I posted beginning at this post (you need to scroll through later posts to find the final code).

Again, the code fits into my master framework, is completely non-blocking, self-initializes, and only needs to be polled once per main program loop. It uses interrupt-on-change interrupts for the A and B quadrature signals.

For speed in .asm, I would use a 16-state state machine to decode the up/down increments from the encoder *very* quickly. State decoding in C is *very* slow by comparison, so it was replaced with the conditionals:

C:
if ((enc_capture == 0b0001)||(enc_capture == 0b0111)||(enc_capture == 0b1000)||(enc_capture == 0b1110))
and

C:
if ((enc_capture == 0b0010)||(enc_capture == 0b0100)||(enc_capture == 0b1011)||(enc_capture == 0b1101))
I don't like this solution because the execution time changes depending upon the state of the encoder.

This code works perfectly without any signal conditioning on the encoder pins, even though the outputs are noisy mechanical switches.

Here's the code:

rotary_encoder.h:
/*
 * File:   rotary_encoder.h
 * Author: JoeyD999
 *
 * Created on November 11, 2022, 10:58 AM
 */

#ifndef ROTARY_ENCODER_H
#define    ROTARY_ENCODER_H

#ifdef    __cplusplus
extern "C" {
#endif
    
//                                   Port B
//         +-------+-------+-------+-------+-------+-------+-------+-------+
//Bit      |   7   |   6   |   5   |   4   |   3   |   2   |   1   |   0   |
//         +-------+-------+-------+-------+-------+-------+-------+-------+
//Device   | ICSP  |T SENSE| ANASW |  LCD  |            ENCODER            |
//         +-------+-------+-------+-------+-------+-------+-------+-------+
//Function |  PGD  |PGC/TCS| ASW2  | LCDBL | ENCSW | ENCA  | ENCON | ENCB  |
//         +-------+-------+-------+-------+-------+-------+-------+-------+

#include "custom_types.h"

#define ENC_IOCBP_INIT 0b00000101
#define ENC_IOCBN_INIT 0b00000101   

#define ENCON   LATB1
#define ENCPORT PORTB
  
//volatile flags   
#define enc_ready_vf    enc_vflags1.bitv.b0
#define enc_up_vf       enc_vflags1.bitv.b1
#define enc_dn_vf       enc_vflags1.bitv.b2
    
//hysteretic flags   
#define _enc_up_f       enc_vflags2.bitv.b0
#define _enc_dn_f       enc_vflags2.bitv.b1

//userspace flags
#define enc_ready_f     enc_flags.bitv.b0
#define enc_up_f        enc_flags.bitv.b1
#define enc_dn_f        enc_flags.bitv.b2

extern uint_fast8_t enc_state;   
extern uint_fast8_t enc_timer;

volatile ct_bitmap8 enc_vflags1;
volatile ct_bitmap8 enc_vflags2;
ct_bitmap8 enc_flags;

uint_fast8_t enc_capture;

volatile int_fast8_t enc_minor_count;
volatile int_fast8_t enc_major_count;
int_fast16_t encoder_count;
    
void enc_poll(void);

#ifdef    __cplusplus
}
#endif

#endif    /* ROTARY_ENCODER_H */

rotary_encoder.c:
/*
 * File:   rotary_encoder.c
 * Author: JoeyD999
 *
 * Created on November 11, 2022, 10:58 AM
 */

#include <xc.h>
#include "interrupts.h"
#include "rotary_encoder.h"
#include "power.h"
#include "system_timer_definitions.h"
#include "encoder_callbacks.h"

uint_fast8_t enc_state=0;   
uint_fast8_t enc_timer=0;   

void __interrupt(irq(IRQ_IOC), base(IVT_Base)) IOC_ISR(void)
{
    IOCBF = 0;                          //clear interrupt flags
      
    enc_capture = ((enc_capture << 1) & 0b1010) | (ENCPORT & 0b0101);

    if ((enc_capture == 0b0001)||(enc_capture == 0b0111)||(enc_capture == 0b1000)||(enc_capture == 0b1110))
    {
        //Decrement count
        enc_minor_count--;
        
        //Process hysteresis
        if ((enc_minor_count == 2) && (!_enc_dn_f))
        {
            enc_major_count--;
            enc_vflags1.bytev = 0b101;
            enc_vflags2.bytev = 0b10;
        }
        
        if (enc_minor_count == 0)
        {
            enc_minor_count = 4;
            _enc_dn_f = FALSE;
        }
    }
    else if ((enc_capture == 0b0010)||(enc_capture == 0b0100)||(enc_capture == 0b1011)||(enc_capture == 0b1101))
    {
        //Increment count
        enc_minor_count++;
        
        //Process hysteresis
        if ((enc_minor_count == 6) && (!_enc_up_f))
        {
            enc_major_count++;
            enc_vflags1.bytev = 0b011;
            enc_vflags2.bytev = 0b01;
        }
        
        if (enc_minor_count == 8)
        {
            enc_minor_count = 4;
            _enc_up_f = FALSE;
        }
    }
}


//Idle with encoder off
void enc_state0(void)
{
    if (PPCTRL_ENC)                     //startup encoder if signaled by framework
    {
        IOCBP = ENC_IOCBP_INIT;         //set IOC bidirectional
        IOCBN = ENC_IOCBN_INIT;
        enc_timer = 1;                  //allow input to settle for at least 2 ms

        //initialize all regs and flags
        enc_minor_count = 4;            //minor counter
        enc_major_count=0;              //volatile
        enc_flags.bytev = 0;            //clear userspace flags
        enc_vflags1.bytev = 0;          //clear volatile flags
        enc_vflags2.bytev = 0;          //clear hysteretic flags
        encoder_count=0;                //userspace accumulator

        PPSTAT_ENC = TRUE;              //indicate on
        enc_state++;
    }
}

//startup encoder
void enc_state1(void)
{
    uint_fast8_t portb_cap = PORTB;     //first capture of port b
                                        //  to avoid false count
    
    enc_capture = (portb_cap & 0b0101);
    IOCBF = 0;                          //clear interrupt flags
    IOCIE = 1;                          //enable interrupts
    
    enc_state++;
}

//Normal Run State
void enc_state2(void)
{
    if (!PPCTRL_ENC)                    //shutdown encoder if signaled by framework
    {
        IOCIE = 0;                      //disable interrupts
        IOCBP = 0;                      //clear triggers
        IOCBN = 0;
        enc_state=0;                    //restart at state 0
        PPSTAT_ENC=0;                   //signal encoder off
        return;
    }
    
    if (!enc_ready_vf)
        return;
    
    //critical section
    
    IOCIE = 0;                          //disable interrupts
    encoder_count += enc_major_count;
    enc_flags = enc_vflags1;
    enc_major_count = 0;
    enc_vflags1.bytev = 0;
    IOCIE = 1;                          //enable interrupts
    
    //end critical section

    //Callbacks to the main app
    
    onEncoderChange(encoder_count);
    
    if (enc_up_f)
        onEncoderUp();
    else if (enc_dn_f)
        onEncoderDn();
}

void enc_poll(void)
{
    enc_flags.bytev = 0;

    static void (*state_function[])(void)=
        {enc_state0, enc_state1, enc_state2};   

    if (enc_timer)
    {
        if (TC2ms) enc_timer--;
        return;
    }
    else
        state_function[enc_state]();
}
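The capture update in `IOC_ISR` is easier to follow when pulled out as a pure function that can be run on a host PC. The sketch below is a model only, not part of the posted firmware: `enc_next_capture` is a hypothetical name, and `port_value` stands in for a read of `ENCPORT`. Hex masks are used instead of binary literals so that it also compiles with strict ISO C compilers.

```c
#include <stdint.h>

/* Host-side model of the capture update in IOC_ISR (hypothetical name).
 * Bit 2 carries ENCA and bit 0 carries ENCB, as in the Port B map;
 * all other port bits are masked off. */
uint8_t enc_next_capture(uint8_t prev_capture, uint8_t port_value)
{
    /* previous A/B move up to bits 3 and 1 (mask 0b1010);
     * fresh A/B land in bits 2 and 0 (mask 0b0101) */
    return (uint8_t)(((prev_capture << 1) & 0x0A) | (port_value & 0x05));
}
```

Starting from a capture of 0 and feeding port values 0b0100, 0b0101, 0b0001, 0b0000 in turn gives captures 0b0100, 0b1101, 0b1011, 0b0010, which are the four values the ISR treats as increments.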
 

WBahn

C:
if ((enc_capture == 0b0010)||(enc_capture == 0b0100)||(enc_capture == 0b1011)||(enc_capture == 0b1101))
I don't like this solution because the execution time changes depending upon the state of the encoder.
The execution time is so variable because of guaranteed short-circuit evaluation associated with logical-OR and logical-AND.

But bitwise operators do not short-circuit.

See what happens if you use

C:
if ((enc_capture == 0b0010)|(enc_capture == 0b0100)|(enc_capture == 0b1011)|(enc_capture == 0b1101))
Another thing that would even out the timing would be to not use an if-else, but rather use two independent if statements. That way both expressions are always evaluated.
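Both suggestions can be combined in one small sketch: evaluate the increment and decrement tests unconditionally with bitwise OR, then derive the step from the two results. `enc_step_bitwise` is a hypothetical name. The C standard only guarantees that `|` evaluates both operands; it says nothing about timing, and a compiler may still emit branches for the comparisons, so the generated code would need to be checked.

```c
#include <stdint.h>

/* Returns +1 for an increment code, -1 for a decrement code, 0 otherwise
 * (hypothetical helper). Each comparison yields 0 or 1, and | does not
 * short-circuit, so all eight comparisons are always evaluated. */
int8_t enc_step_bitwise(uint8_t c)
{
    uint8_t dec = (c == 0x1) | (c == 0x7) | (c == 0x8) | (c == 0xE);
    uint8_t inc = (c == 0x2) | (c == 0x4) | (c == 0xB) | (c == 0xD);
    return (int8_t)(inc - dec);
}
```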
 

Thread Starter

joeyd999

Thanks, @WBahn.

What I failed to mention is that -- as the code in question is running inside an interrupt handler -- average execution time is far more important than consistent execution time. So short-circuiting the evaluation of the conditional is a benefit.

In .asm, I can get both fast case selection and consistent execution time regardless of the value of the state variable:

Code:
    cjump    _enclst

    bra    eina        ;no action
    bra    eidec        ;decrement
    bra    eiinc        ;increment
    bra     eina        ;no action

    bra    eiinc        ;increment
    bra     eina        ;no action
    bra     eina        ;no action
    bra    eidec        ;decrement

    bra    eidec        ;decrement
    bra     eina        ;no action
    bra     eina        ;no action
    bra    eiinc        ;increment

    bra    eina        ;no action
    bra    eiinc        ;increment
    bra    eidec        ;decrement
    bra     eina        ;no action
Each selection above requires exactly 11 instruction cycles before the selected state code gets executed.

This is the kind of performance I'd like to (need to!) duplicate in C.
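One way to approach the same shape in C is a 16-entry data table indexed by the capture value, sketched below. The names `enc_delta` and `enc_step_table` are hypothetical, and no claim is made about how many instruction cycles XC8 would spend on the lookup; the hysteresis processing that follows the decode would also still need its own conditionals.

```c
#include <stdint.h>

/* Same 16 entries, in the same order, as the bra table above:
 * 0 = no action, -1 = decrement, +1 = increment. */
static const int8_t enc_delta[16] = {
     0, -1, +1,  0,
    +1,  0,  0, -1,
    -1,  0,  0, +1,
     0, +1, -1,  0
};

/* Hypothetical helper: one masked index, one load, no comparisons. */
int8_t enc_step_table(uint8_t capture)
{
    return enc_delta[capture & 0x0F];
}
```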
 

ApacheKid

You raise an interesting point, and something else to add to an MCU language wishlist: constant execution time. It would be helpful if we could somehow indicate that some code is to have a constant execution time, then let the code generator analyze the code and apply that as a transformation.

This could be done at a function level as an attribute on a function definition with a simple keyword like "tight" or "steady" or perhaps as a block delineator:

Code:
while (<blah blah blah>) steady
{


}
This is also a perfect example of where being able to add a new keyword without breaking existing code shows itself to be a huge asset.

I think your problem does indicate a serious drawback of C when used for hardware programming: there is just no way to convey that the generated code should have all execution paths take the same time. Of course, the execution paths would not all be the fastest, but they would all be identical, perhaps to the clock cycle. Doing that manually is frankly ridiculous and also error-prone; this is the kind of problem that computers were invented for!
 

ApacheKid

An array of function pointers could give you constant time:

Code:
encoder_handlers[1]  = do_decrement_steps;
encoder_handlers[7]  = do_decrement_steps;
encoder_handlers[8]  = do_decrement_steps;
encoder_handlers[14] = do_decrement_steps;

encoder_handlers[2]  = do_increment_steps;
encoder_handlers[4]  = do_increment_steps;
encoder_handlers[11] = do_increment_steps;
encoder_handlers[13] = do_increment_steps;
Then inside the handler:

Code:
void __interrupt(irq(IRQ_IOC), base(IVT_Base)) IOC_ISR(void)
{
    IOCBF = 0;                          //clear interrupt flags

    enc_capture = ((enc_capture << 1) & 0b1010) | (ENCPORT & 0b0101);

    encoder_handlers[enc_capture](arguments);
}
Because this is expected to always be a constant time, you could even create a structure that contains all your constant time functions across your entire code base. A global structure where each member is one of your constant time functions, then the code could be:

Code:
void __interrupt(irq(IRQ_IOC), base(IVT_Base)) IOC_ISR(void)
{
    IOCBF = 0;                          //clear interrupt flags

    enc_capture = ((enc_capture << 1) & 0b1010) | (ENCPORT & 0b0101);

    ConstantTimeFunctions.encoder_handlers[enc_capture](arguments);
}
Anyone maintaining that code will then be aware that this is a constant time function and must always be constant time. C does not have namespaces, so I do this a lot in some nRF24 code I've been writing, which lets me write code that looks like this:

Code:
    nrf24_package.GetRegister.ALL_REGISTERS(&device, &everything_before, &status);

    // Force all registers into their hardware reset state.

    nrf24_package.DeviceControl.PowerOnReset(&device);

    // Snapshot all registers

    nrf24_package.GetRegister.ALL_REGISTERS(&device, &everything_after, &status);
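The declarations behind a call like that are not shown in the thread, so the following is only a guess at the general shape: nested structs of function pointers, with one const instance acting as the "namespace". Every name starting with `demo_` is hypothetical, as are the device type and the reset value.

```c
#include <stdint.h>

/* Hypothetical device type and functions, invented for this sketch only. */
typedef struct { uint8_t status; } demo_device_t;

static uint8_t demo_get_status(const demo_device_t *dev) { return dev->status; }
static void demo_power_on_reset(demo_device_t *dev) { dev->status = 0x00; /* arbitrary */ }

/* Each nested struct acts as one "namespace" level of function pointers. */
struct demo_get_register   { uint8_t (*STATUS)(const demo_device_t *dev); };
struct demo_device_control { void (*PowerOnReset)(demo_device_t *dev); };

struct demo_package_t {
    struct demo_get_register   GetRegister;
    struct demo_device_control DeviceControl;
};

/* One const instance, filled in with C99 designated initializers. */
static const struct demo_package_t demo_package = {
    .GetRegister   = { .STATUS = demo_get_status },
    .DeviceControl = { .PowerOnReset = demo_power_on_reset },
};
```

A call then reads `demo_package.DeviceControl.PowerOnReset(&dev);`. Each call goes through a function pointer, so there may be a small indirection cost compared with calling the function directly.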
It also seems that in cases where enc_capture is none of the eight values, the handler does nothing. Must it take the same constant time in those cases too? Your implementation does not do that; it takes almost no time in those cases.

To ensure constancy for these as well, you could create a third "do nothing" function that takes the same time. One way to do that is to have the dummy function perform the same steps but on dummy, unused variables. It is a waste of time, but wasted time on certain execution paths is unavoidable if one seeks identical execution time on all possible paths.

Code:
encoder_handlers[0]  = do_nothing_steps;
encoder_handlers[1]  = do_decrement_steps;
encoder_handlers[2]  = do_increment_steps;
encoder_handlers[3]  = do_nothing_steps;
encoder_handlers[4]  = do_increment_steps;
encoder_handlers[5]  = do_nothing_steps;
encoder_handlers[6]  = do_nothing_steps;
encoder_handlers[7]  = do_decrement_steps;
encoder_handlers[8]  = do_decrement_steps;

// and so on...
So in every conceivable case that leads to that interrupt handler running, you can be confident it will always, always take the same time to handle.
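Filled out for all 16 capture values, the table idea might look like the sketch below. It is a host-testable model, not drop-in ISR code: `demo_minor_count` stands in for `enc_minor_count`, `encoder_dispatch` is a hypothetical wrapper, and the handlers omit the hysteresis steps. In particular, the empty `do_nothing_steps` here does not take the same time as the other two; padding it to match is left out.

```c
#include <stdint.h>

/* Stand-in for enc_minor_count; starts at the mid value used in the thread. */
static volatile int8_t demo_minor_count = 4;

static void do_nothing_steps(void)   { /* intentionally empty in this sketch */ }
static void do_decrement_steps(void) { demo_minor_count--; }
static void do_increment_steps(void) { demo_minor_count++; }

typedef void (*encoder_handler_t)(void);

/* All 16 slots are filled, so no capture value can index an unset pointer. */
static const encoder_handler_t encoder_handlers[16] = {
    do_nothing_steps,   do_decrement_steps, do_increment_steps, do_nothing_steps,
    do_increment_steps, do_nothing_steps,   do_nothing_steps,   do_decrement_steps,
    do_decrement_steps, do_nothing_steps,   do_nothing_steps,   do_increment_steps,
    do_nothing_steps,   do_increment_steps, do_decrement_steps, do_nothing_steps
};

void encoder_dispatch(uint8_t capture)
{
    encoder_handlers[capture & 0x0F]();  /* mask keeps the index in range */
}
```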
 

ApacheKid

I don't think that's part of any C standard, at least I've never seen it used. Sorry to bring up PL/I again but that language had a concept of label variables:

Code:
dcl my_place label;

goto attempt(I);

attempt(0):

attempt(1):

attempt(2):

// also

my_place = attempt(J);

goto my_place;
This is extremely basic as a programming construct and easy to implement, yet it is not available in standard C. C really doesn't offer very much at all for the hardware developer, with the exception of the bitwise operators.
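For comparison, GNU C (GCC and Clang) has a non-standard extension called "labels as values" that comes close to PL/I label variables: `&&label` takes a label's address and `goto *ptr` jumps to it. The sketch below assumes a compiler with that extension; it is not ISO C, and nothing here assumes XC8 supports it. `attempt_step` and the result values are hypothetical.

```c
/* GNU C "labels as values" extension; not ISO C. */
int attempt_step(int i)
{
    /* table of label addresses, indexed like attempt(I) in the PL/I example */
    static void *const attempt[] = { &&attempt0, &&attempt1, &&attempt2 };
    int result = -1;

    if (i < 0 || i > 2)
        return result;          /* out-of-range index: no jump */

    goto *attempt[i];           /* computed goto through the table */

attempt0:
    result = 10;
    goto done;
attempt1:
    result = 20;
    goto done;
attempt2:
    result = 30;
done:
    return result;
}
```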
 

ApacheKid

Some C implementations offer a "naked" keyword that can be used on a function definition to eliminate some of the standard stack frame prologue/epilogue operations ordinarily generated for functions. I guess it might be possible to use that keyword on these kinds of function-pointer functions, but it is all very fiddly, non-standard, and vendor-specific.
 

ApacheKid

Changed thread title to reflect content.

"Lowering One's Expectations": and they started off low to begin with.
It seems to me that many of the things you need to do could readily be supported by a high-level, machine-oriented language, but no such language really exists. C might be the only game in town, and in many respects, as we've seen, using it often amounts to trying to use a keyboard with boxing gloves on.
 

JohnInTX

Joined Jun 26, 2012
4,787
"Lowering One's Expectations": and they started off low to begin with.
LOL.

While I was pondering a little personal battery-powered project, this conversation prompted the realization that the tightness afforded by .ASM vs. XC8 means that, for equivalent functionality, there will likely be many fewer instructions to execute between sleeps, and maybe at a lower clock rate to boot. Less power consumed.

Good thread!
 

Thread Starter

joeyd999

That's the point! The only code I am posting is "auxiliary" functions. I'm not showing the meat of what's really going on underneath.

Forget sleep! I run an MCU at 50 to 100% instruction cycle utilization. If it's less, I can cut the clock speed by 1/2 and use 1/2 the power. If it's more (the default with C, it would seem), I've got to (at minimum) double the clock speed and consume correspondingly more power.

Also, I control some very intricate hardware -- normally firmware-bound sensors that require ns timing across multiple I/Os for best marketable performance.

Luckily, the newer PIC hardware has a lot of advanced peripherals that can take some of this over. It is for this reason alone that I am taking a shot at C. Otherwise, MPLAB 8.92 is still just fine for me (on WinXP!).
 

ApacheKid

Isn't the timing strategy you're using based on assumptions? For instance, isn't it the case that code in flash memory can encounter wait states that make NOP (and other) instructions take an unpredictable time to execute?
 

Thread Starter

joeyd999

Nothing better than "goto" to efficiently navigate the multi-laned, interleaved highway of life:

C:
        if (hold_timer) goto kkhold;
        if (rep_timer) goto kkrep;
        if (key_change.bytev)
        {
            hold_timer = HOLD_TIMER_INIT;
            goto kexitr;
        }
        else goto kexit;
    
    kkhold:
        if (!key_down.bytev) goto kkpr;
        if (--hold_timer) goto kexit;
        key_ph.bytev = key_down.bytev;
        goto kkrep1;
    
    kkrep:
        if (!key_down.bytev) goto kexitr;
        if (--rep_timer) goto kexit;
    
    kkrep1:
        key_rpt.bytev = key_down.bytev;
        rep_timer = REP_TIMER_INIT;
        goto kexit;

    kkpr:
        key_pr.bytev = key_last.bytev;
            
    kexitpr:
        hold_timer = 0;
    
    kexitr:
        rep_timer  = 0;
    
    kexit:
        key_rel.bytev = ~key_down.bytev & key_last.bytev;
        key_last.bytev = key_down.bytev;
 