Idealized microcontroller programming language - MPL

drjohsmith

Joined Dec 13, 2021
541
The title is informal, merely an attempt to convey the question - what could/should a programming language "look like" if it were being designed from scratch where the primary area of application is writing software for MCUs both "low end" and "high end".

This covers questions about grammar and semantics as well as specifics about data types and arithmetic as well as interop with other languages, preprocessor capabilities and any number of things deemed relevant by those who write for such devices.

Some interesting things have come up recently in my research. For example, fixed point arithmetic is mentioned a lot in posts and blogs as being very helpful in systems with no floating point hardware or systems that work with DSPs.
Ta.
I read it as a hardware language, such as VHDL or Verilog. Hence my confusion.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
I did some research recently and this led me to review completely how developers write numeric literals in a programming language. This includes how we express fixed and floating point numbers in decimal, binary and hexadecimal as well as octal. This includes the presence of separator characters to improve readability.

After some work this went well, and I now have the lexical analysis working for the following code fragment:

Code:
proc initialize (count)
{
   arg count bin(8,2); // fixed point binary, 6.2 format
   dcl register_mask bin(24,8);
   dcl temporary dec(15,3); // fixed point decimal, 12.3 format

   count = 110011.01:B;

   temporary = 23 000 000 000.25;

   register_mask = 2F03 D7F8.2:H;

   // Or equivalently

   register_mask = 10 1111 0000 0011 1101 0111 1111 1000.00100000:b;

   // base indicators are : followed by b, B, o, O, d, D or h, H - the decimal indicator is optional.

}
The separator can be an underscore or, as in this example, a space. Fixed point binary and hexadecimal come up rather a lot in some problem domains, and C23, for example, has some support for these bases and formats, including floating point binary/hexadecimal notation.
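For readers new to the "6.2 format" mentioned in the comments, here is a minimal Python sketch of how a fixed point binary value could be held as a scaled integer. The function names and the unsigned layout are my assumptions for illustration, not something the language defines yet.

```python
from fractions import Fraction

def to_fixed(value: Fraction, total_bits: int, frac_bits: int) -> int:
    """Encode value as an unsigned fixed point raw integer (assumed layout)."""
    raw = value * (1 << frac_bits)          # shift the binary point right
    if raw.denominator != 1:
        raise ValueError("value needs more fractional bits")
    raw = int(raw)
    if not 0 <= raw < (1 << total_bits):
        raise OverflowError("value does not fit in the declared width")
    return raw

def from_fixed(raw: int, frac_bits: int) -> Fraction:
    """Decode a raw integer back into its exact value."""
    return Fraction(raw, 1 << frac_bits)

# 110011.01 in binary is 51.25; as bin(8,2) it is stored as 0b11001101.
raw = to_fixed(Fraction(205, 4), total_bits=8, frac_bits=2)
print(bin(raw))            # 0b11001101
print(from_fixed(raw, 2))  # 205/4
```

The point is only that the binary point is a matter of interpretation: the stored bits are an ordinary integer.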

What I've tried to do is devise a way to express these that is unambiguous and symmetrical irrespective of the number base. There's no need therefore for a leading "0x" prefix or "0b" or the obligatory leading "0" to indicate octal or other such encumbrances.

The base is clear from the <base-indicator> and if there isn't one, then the number is regarded as base 10.

The use of spaces raised a subtle challenge: certain identifiers cannot be recognized unless the lexical analyzer includes a stack, something ordinary regex and most lexing tools don't support (although I suspect ANTLR does). I suspect that if one tried to incorporate this notation in C or C++ it would likely become unparseable.

Anyway, that's resolved now and I can robustly tokenize the above text correctly with no problems.
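To make the notation concrete, here is a rough Python sketch that evaluates an already tokenized literal of the form digits, optional fraction, optional ":" plus base letter. It is not the actual lexer (which has to decide where a literal ends); `parse_literal` and the `BASES` table are hypothetical names used only for illustration.

```python
from fractions import Fraction

# Hypothetical table: base indicator letter -> radix.
BASES = {"b": 2, "o": 8, "d": 10, "h": 16}

def parse_literal(text: str) -> Fraction:
    """Evaluate a literal such as '2F03 D7F8.2:H' to an exact value."""
    body, colon, indicator = text.partition(":")
    # No indicator means base 10.
    base = BASES[indicator.strip().lower()] if colon else 10
    # Separators (space or underscore) carry no value, so drop them.
    digits = body.replace(" ", "").replace("_", "")
    whole, _, frac = digits.partition(".")
    value = Fraction(int(whole, base)) if whole else Fraction(0)
    # Each fractional digit is worth digit / base**position.
    for position, digit in enumerate(frac, start=1):
        value += Fraction(int(digit, base), base ** position)
    return value

hex_form = parse_literal("2F03 D7F8.2:H")
bin_form = parse_literal("10 1111 0000 0011 1101 0111 1111 1000.00100000:b")
print(hex_form == bin_form)  # True
```

The same function handles every base the same way, which is the symmetry the trailing indicator is meant to give.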
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
I've been discussing this in some other forums, always interested to hear the views of other developers and engineers.

Some things are starting to take shape now; here are some of the ideas that are being given solid consideration:

1. Be able to read UTF-8 source files.
2. Allow flexible numeric literals (as mentioned above)
3. Dispense with both braces as block delimiters and semicolons as statement separators (see Julia language)
4. Support a coroutine keyword and interrupt as a procedure attribute.
5. Provide operators for arithmetic/logical shifting and rotates.
6. Support several specific Unicode code-pages for more exotic identifier naming.
7. Support rich fixed point decimal/binary types.
8. Support bit data types and bit strings.
9. Consider leveraging LLVM for backend code generation.
10. Consider rich preprocessing (see Zig language)
11. Leverage the lack of reserved words to provide support for other human languages (Spanish keywords for example)

The last point is very interesting and is basically a no-brainer here: it is just a matter of specifying a language code in the source file (or something similar), and the tokenizer will then simply use an alternate token definition file. The reliance on English in programming languages does come up in several areas, notably in teaching, where younger people can benefit from seeing terms in their own language.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
I've now established that Antlr is capable of handling a new language with the grammar I'm looking at. That tool is very powerful and generates a solid lexer and parser for me, just from the grammar. I'll post examples later today.
 

xox

Joined Sep 8, 2017
794
Antlr looks like a bit of a beast! Of course you could compile the spec on any machine. (I am assuming it can produce a pure C implementation?)

So is this still a strictly-academic exercise, or do you have plans to make this into a real programming language?
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
It's still an academic exercise but my confidence is rising gradually!

Thankfully my prior experience implementing a pretty solid PL/I compiler some years ago gives me a bit of an edge here. I'm very familiar with the problem domain (though not academically as much as I'd like). I was getting frustrated working with C when programming STM boards. Admittedly I'm very new to that world, but I understand enough about software to see that the language is like a pair of handcuffs in many ways: yes it works, yes it can do the job, but it's primitive in many ways and now there's a bewildering array of versions, standards, extensions etc.

The C ecosystem is frankly not something anyone would ever have designed, it has simply evolved in various directions as the needs of users changed so I wanted to "clear the whiteboard" and see how a new language might/could look with no assumptions about starting with a C like grammar.

PL/I is IMHO a very serious contender for the base grammar for two reasons: 1) No reserved words, so adding new keywords over time can never break backward compatibility. 2) It easily contends with C and exceeds it in numerous ways, with support for bit data types, strings, multiple storage attributes, nested procedures, computed goto, much cleaner syntax than C and so on.

So as you know I resurrected an unrelated recent C# project that was exploring an alternative syntax for C#, and started playing with that afresh. I got deeply immersed in the hand-coded lexer/parser until I forced myself to stop and seriously look at Antlr. The only way to get an idea of its suitability was to start working with it, and now I'm glad that I did; it has been a HUGE help.

Antlr

I read a lot about it, asked a bunch of questions, then just said "screw it, let's spend the time and see". It is very powerful and the effort it saves is huge: it generates lexing and parsing code (class files) in a choice of languages straight from a single grammar file. I'm generating C# but testing it with an Antlr test tool that uses generated Java classes.

I can edit the grammar, then gen a set of Java and C# classes that represent the lexing/parsing, I have a crude C# app that leverages these classes, much as a compiler would.

The Antlr test tool consumes the grammar file, the compiled Java files and a source input file, it then attempts to parse that file and renders a tree, the parse tree.

This simple source:

Code:
/* A simple test source */

counter = 100;

dcl counter dec based(counter);
dcl counter binary;
declare counter binary;
Generates this in the tool:

[Image: parse tree rendered by the Antlr test tool]

This is slick, one can tweak the grammar, incrementally enrich it and test as one goes, so the grammar file grows slowly and gets richer and richer as one works.

This was frankly close to impossible before. Hand crafting the lexer and parser is slow work; this tool eliminates that and I can focus on the linguistics of the work.

So, not sure yet that I will apply myself to producing a serious implementation, but at least I can see that it is feasible, I've done it before without any tools for front end or back end, yet today we have Antlr for the front and LLVM for the back, each of these is hugely capable.

The grammar has no reserved words, so this parses fine:

Code:
/* A simple test source */

counter = 100;

/* declare some variables named the same as keywords */

dcl dcl dec based(counter);  /* dcl is a keyword ! */
dcl counter binary;
declare goto binary; /* goto is a keyword ! */

goto call; /* call is a keyword ! */

call declare(); /* declare is a keyword */
[Image: parse tree for the source above]
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
The grammar for this potential language is developing steadily, Antlr makes it pretty easy to experiment too, one can tweak some rule to see if some idea is feasible, consistent with the rest of the grammar and just retest very easily.

Here's a sample test source file that parses without error:

Code:
procedure main(x)

    def mystruct  /* stuff like static and so on illegal for struct members */
        range bin(15),
        fixed string(32),  /* trailing comma optional. */
    end;

    dcl base_address pointer;


    // dcl Popcount intrinsic; // pos


    dcl counter(1:10,1:10,1:10) binary;
    dcl range fixed float dec (12,4);
    dcl volume fixed dec (if,then);
    dcl name string(256) varying;
    dcl mask(32) bit(8);

    dcl stuff mystruct;

    /* declare some variables named the same as keywords */

    dcl dcl dec based(counter);  /* dcl is a keyword ! */

    dcl realm binary;

    declare goto binary; /* goto is a keyword ! */

    base_address = 0000 1F00 C000 0000:h ;

    if count > 100 then
       call do_something();
    elif count < 0 then
       call do_something_else();
       call do_whatever();
    elif count > max then
       call where_done();
       if we_are_still_here then
          return;
       elif we_did_not_crash then
          call crash();
       else
          call stop();
       end;
    else
       return;
    end;


    call reset;

retry :

    counter = 100;

    goto call; /* call is a keyword ! */

    loop
        call sleep;
    end;

    loop until (a > b)
        call sleep;
    end;

    loop while (a > b)
        call sleep;
    end;

    loop while (a > b) until (eof)
        call sleep;
    end;

    call declare(); /* declare is a keyword */

    call declare (1,2,3);

    return (123);

    if rate = 100:b then
       call reboot(0);
       go to end;
    else
       depth = 44;
       end = then + else;
       return (123:o);
    end;

    if count > 100 then
       call do_something();
    elif count < 0 then
       call do_something_else();
    elif count > max then
       call where_done();
    end;

    proc reboot(a,b,c) coroutine returns string(32)

        arg a(*,*) binary;
        argument b decimal;

    end;

end;
This is based on the PL/I subset G but I'm revising it as I go, you can see that "elif" has been added to the "if" statement grammar. I also added support for numeric literals that can contain embedded space or underscore and this works like a dream, parses correctly with a trailing base designator for octal, hex, binary and (optionally) decimal.

The '*' character has a meaning closer to a wildcard, with no association with pointers. It appears where it does because PL/I lets you pass arrays with differing bounds into other code; that code declares the array with '*' bounds, meaning the actual runtime bounds are those of the passed argument.

Block delimiters are gone too. They are 'do;' and 'end;' in PL/I and '{' and '}' in C; these are replaced by each block-oriented statement now having its own 'end'.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
FYI - The language has a name now, it has been named "Imperium" after the Latin term for "control", source files are suffixed .ipl - Imperium Programming Language.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
Should anyone be curious, the GitHub repository for this work is here - imperium repository.

The formal grammar and tooling now facilitate support for multiple cultures. It is now trivial to create keyword sets for other cultures, provided tokens can be defined for them in that language. There's obviously English and also, recently, Dutch (kindly contributed by the lead engineer of the Citrine programming language); French is something I expect to add very soon too.

I'm also very interested in some of the stuff from earlier "Wirthian" languages, specifically the concept of "SETS" as seen in Oberon and earlier languages that have recently been brought to my attention.

Oberon itself is very interesting. Here's a 2015 in-depth technical article from Xilinx about an FPGA implementation of Oberon by Niklaus Wirth for those interested (jump to page 30).
 

nsaspook

Joined Aug 27, 2009
10,668
SETS is a useful but not critical bitmap abstraction. I've used it in Modula-2 in the distant past.

Code:
  (* BIOS variables : These can only be accessed with the 68000 in supervisor
     mode. The Modula-2 language allows you to fix the location of variables *)

  HDBPB     [0472H] : ADDRESS ;       (* hard disk get Bios Parameter Block *)
  HDRWAbs   [0476H] : ADDRESS ;       (* hard disk read/write abs   *)
  HDMediaCh [047EH] : ADDRESS ;       (* hard disk media change     *)
  DriveBits [04C2H] : SET OF [0..31]; (* disk drives present map    *)
...
  INCL(DriveBits,drvnr) ;             (* set new drive A *)
  INCL(DriveBits,drvnr+1) ;           (* set new drive B *)
  INCL(DriveBits,drvnr+2) ;           (* set new drive C *)
  INCL(DriveBits,drvnr+3) ;           (* set new drive D *)
  INCL(DriveBits,drvnr+4) ;           (* set new drive E *)
  INCL(DriveBits,drvnr+5) ;           (* set new drive F *)
Some old Modula-2 code for the Atari-ST
https://raw.githubusercontent.com/nsaspook/vcan/qdrive/MX2/NETWORK.MOD


INCL
PROCEDURE INCL(SetVar : SetType; Element : ElementType);

Includes an element in a SET or PACKEDSET variable. SetVar must be a variable of a set type, and Element must be an expression resulting in a value of the element type of the set. The specified element is added to the value of the set variable.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
I do like the notation, very compact: setting a bit "on" is adding its number to a "set", whereas setting it "off" is removing it from the set. Rather neat.
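For anyone who hasn't used Modula-2 sets, the idea maps directly onto bit operations. A minimal Python sketch, assuming the set is held in a plain integer; `incl`, `excl` and `contains` are illustrative names modelled on Modula-2's INCL, EXCL and IN, not part of any proposed grammar.

```python
def incl(bits: int, element: int) -> int:
    """Add element to the set, i.e. switch bit number `element` on."""
    return bits | (1 << element)

def excl(bits: int, element: int) -> int:
    """Remove element from the set, i.e. switch bit number `element` off."""
    return bits & ~(1 << element)

def contains(bits: int, element: int) -> bool:
    """Membership test, i.e. is bit number `element` on?"""
    return (bits >> element) & 1 == 1

drive_bits = 0
for drive in (2, 3):                 # mark two drives as present
    drive_bits = incl(drive_bits, drive)
print(bin(drive_bits))               # 0b1100
print(bin(excl(drive_bits, 2)))      # 0b1000
print(contains(drive_bits, 3))       # True
```

So the set notation costs nothing at runtime; it just names the bit twiddling.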
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
The grammar has started to get very solid now, multiple cultures are now passing tests well, each of these parses fine and they generate an identical (logical) parse tree:

Code:
procedure English (X)

    declare counter binary(15);

    if counter > 0 then
       call retour;
    else
       return;
    end;

    loop while (a > b)
       go to place;
    end;

end;
and

Code:
procédé French (X)

    déclarer counter binaire(15);

    si counter > 0 ensuite
       appeler retour;
    autre
       retour;
    fin;

    boucle tandis que (a > b)
       aller à place;
    fin;

fin;
We simply "tell" the parser what culture ("fr" or "en" in these examples) to use and it just does its stuff. Note how keywords can consist of multiple terms too, so one term for WHILE in English "while" yet two terms for WHILE in French "tandis que". This is all defined in a dictionary JSON file that I use to auto generate part of the Antlr grammar file.

If you're wondering why I'm doing this I'll explain.

The nature of the original grammar (PL/I) is that it has no reserved words; any word (even a keyword) can be used as an identifier. This is primarily to allow new language keywords to be added over time and never break backward compatibility. Well that's achieved, it is an inherent aspect of the grammar's core structure, designed to have this property.

However, this property also means we can change keywords into another language and never break backward compatibility! That's it. It's easy to support multiple cultures because it's trivial to take English code and convert its keywords into French, Dutch, German etc. and be 100% confident that it will compile into exactly the same result.

Note the English fragment above: it has "call retour;". Well, "retour" is "return" in French; that is, the English code (perhaps unwittingly) used a French keyword as an identifier for some callable procedure. But it matters not, the grammar is unaffected by that!

I can literally feed in code written in any (supported) language and automatically convert it into another (supported) language, it's trivial! Try doing this in C, C++, Ada, Rust or Java!
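Here is a small Python sketch of why the conversion is safe. It assumes the parser has already classified each word as a logical keyword token or an identifier; only keyword tokens get re-spelled. The JSON layout and the `respell` function are my own illustration, not the actual dictionary file or tooling, and only spellings that appear in the examples above are used.

```python
import json

# Hypothetical layout: culture -> logical token name -> spelling.
KEYWORDS = json.loads("""
{
  "en": {"CALL": "call", "RETURN": "return", "WHILE": "while", "GOTO": "go to"},
  "fr": {"CALL": "appeler", "RETURN": "retour", "WHILE": "tandis que", "GOTO": "aller à"}
}
""")

def respell(tokens, target):
    """Re-spell keyword tokens for the target culture.

    `tokens` is a list of (kind, text) pairs as a parser would classify them;
    anything whose kind is not a known keyword passes through unchanged.
    """
    table = KEYWORDS[target]
    return " ".join(table.get(kind, text) for kind, text in tokens)

# English source "call retour": the parser sees a CALL keyword and an identifier.
english = [("CALL", "call"), ("IDENT", "retour")]
print(respell(english, "fr"))   # appeler retour
```

Because the decision "keyword or identifier" is made by the grammar and not by the spelling, the identifier retour survives untouched even though it is spelled like a French keyword.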
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
Here is some test source code used to help me test the grammar of the new language. This is just to give an idea of how code would look once one started to use it. You'll see a bunch of things in here, using @ for labels and singly/doubly quoted string literals that can contain quoted substrings and span multiple lines, also you'll see subscripted labels too and (optional) keywords following some "end" statements. Most of the comments are fictitious, just to give the code a real world appearance.

The "$" symbol is used here only because VAX VMS and Stratus VOS (on which most system software was written in PL/I) used dollar prefixes to designate operating system entry points; it is just a convention that was used.

This code parses correctly using the latest grammar definition:

Code:
procedure main (arg);

    /* This is sample code */

    dcl name string(32);
    dcl I bin(15);
    dcl J bin(15);
    dcl S bin(31);
    dcl A(1024) bit(8); // 1 KByte
    dcl root ptr;
    dcl status bin(15);

    type_a = "string literals can be single quoted like this";

    type_b = ""or double quoted like this, with embedded single quoted text "like this" see?"";

    name = ""demonstration_network_controller"" ;

    title = ""This is the "best in class"
  
    language system"" ;

    //call sys$announce_device(name, status);

    if status ~= 0 then
       return;
    end;

    call sys$install_platform(title);

    root = sys$get_system_bootroot(name);

    call sys$validate_authority(root);

    goto cycle_loop;

@cycle_loop

    I = 100;

    if I > 0 then
       goto cycle_loop;
    end;
  
    I = sys$get_updated_count(J);

@setpoint(0)

    loop while (I >= J)
       I = get_updated_count(J);
    end loop;

    if I = 123 & J = 321 then
       I = 0;
       goto setpoint(I); // never pass non-zero !!
    elif J + I > J * I then
       goto cycle_loop;
    elif J = sqrt(100) then
       return;
    end if;

    call get_latest_faulting_stack(S);

    if S ~= 0 then
       call sys$stack_crawler(S);
    end;

    /* set S to the sentinel for the next operation */

    S = F5D3 03A2:H; // we need to ensure this sentinel is not allocated to any other device types

    call sys$reinitialize_firmware_table(S);

    call sys$sleep (1024);

    goto cycle_loop ;

/* Crawl the stack chain looking for the designated handler */
procedure sys$stack_crawler (handler_id) recursive;

    arg handler_id bin(31);

end;

/* Only call this if we know there's no active services still running */
procedure sys$reinitialize_firmware_table(table);

    arg table bin(31);

end proc;

procedure sys$timer_callback (context) interrupt;

    arg context pointer;

end proc;

end proc;
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
I've drafted a document that compares language features of C, PL/I and the hypothetical IPL, where the features are conducive to writing an operating system in that language. For those interested, just visit the page in GitHub.

This arose from discussions in a different thread. The impetus was my view that a language good for MCU work must also be suitable for developing an operating system, any language claiming to be "good" at hardware programming must also be good for OS development.

To my knowledge few languages have a track record of delivering real operating systems; PL/I and C are the most well known, I think.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
The concept of a namespace has now been added to the grammar. A namespace can contain code or other namespaces and a single compilation unit can contain multiple (that is, not nested) namespaces.

Neither PL/I nor C had this feature, and it's a huge help in organizing code.

Code:
namespace test;

    dcl root_ptr pointer;

    procedure outer (arg);

        dcl counter bin(15);
        dcl window bin(31);

        if counter = 123 then
           return;
        end;

    end procedure;

    procedure rerun (arg);

        call s$system.memory.allocate_virgin(10 0000:H); // 1,048,576 in dec FYI

    end procedure;

end;

namespace utility.sundry;
    
    // an empty namespace

end;

namespace s$system;

    namespace memory;

        procedure allocate_virgin(size);

            arg size bin(31);
        end;
    end;
end;
Another useful concept that seems much better to have than not, is comparison operator "chaining":

Code:
if a > b > c >= d < e then
   // do stuff
This is logically equivalent to:

Code:
if a > b && b > c && c >= d && d < e then
   // do stuff
Both Perl and Raku do this and I have to say it is pretty neat. That syntax is already legal in C, PL/I and other languages, but it rarely means what the reader expects: in C, a > b > c compares the 0-or-1 result of a > b with c, and in languages with a distinct boolean type it is usually a type error. Even the C# language team are looking to do this now too.
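The expansion is mechanical, which is why it is cheap to add to a grammar. A small Python sketch of the rule (each adjacent pair is compared and the results are and-ed together, stopping at the first failure); `chained` is a hypothetical helper for illustration, and Python itself already chains comparisons natively, so the last line cross-checks the helper against the language.

```python
import operator

def chained(first, *pairs):
    """Evaluate `first op1 x1 op2 x2 ...` as a conjunction of adjacent comparisons."""
    left = first
    for compare, right in pairs:
        if not compare(left, right):
            return False          # short-circuit, like && in the expanded form
        left = right              # the right operand becomes the next left operand
    return True

a, b, c, d, e = 9, 7, 5, 5, 6
result = chained(a, (operator.gt, b), (operator.gt, c), (operator.ge, d), (operator.lt, e))
print(result)                      # True
print(result == (a > b > c >= d < e))  # True
```

Note that each operand appears once in the chained form, so an operand with side effects would be evaluated once rather than twice as in the hand-expanded version.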

Finally, I'm starting to look at the world of "enum". PL/I never had this feature, so if anyone has ideas or thoughts, please mention them here...
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
What's the distinction between a namespace and a procedure?
A namespace is nothing more than a means of defining a naming hierarchy.

Without a namespace, a function called reset_device must be called literally as reset_device, and another function that resets a different kind of device has to be given a different name, like reset_other_device.

With a namespace you can give these a prefix and if the prefix is different then the functions can in fact have an identically spelled name, this shows you the principle:

Code:
namespace Hardware
   namespace ADC
      procedure reset_device (device_ptr);
         // code
      end;
   end;

   namespace GPS
      procedure reset_device (device_ptr);
         // code
      end;
   end;
end;

procedure main(device_ptrs);

   dcl device_ptrs(2) pointer;

   call Hardware.ADC.reset_device(device_ptrs(ADC_DEVICE));

   call Hardware.GPS.reset_device(device_ptrs(GPS_DEVICE));

end;
There's really nothing more to them than that: rather simple but hugely helpful. C has no namespace capability, and although one can "simulate" it, that requires all kinds of boilerplate code and structuring. Having them designed into the language is far better, simpler and cheaper.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,082
I started to add the grammar for user defined types. These are not strictly supported in the original PL/I language, but almost, so a small adaptation enables this without loss of backward compatibility.

Anyway, here's an example of how we can create an enum type:

Code:
      type baud_rates enum bit(8),
           first  = 1010 0011:b,
           second = 1110 1101:b,
           third  = 1101 0111:b,
           fourth = 0010 1100:b
      end;
To declare an instance of this, we just write:

Code:
dcl rates as baud_rates;

if rates = baud_rates.second then
   // do whatever
else
   // do something else
end;
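For comparison, here is roughly the same enum in Python, as a sketch of the semantics I have in mind: named members with explicit bit patterns and a check that every member fits the declared width. The `fits_in_bits` helper is hypothetical; in the real language that check would presumably be done by the compiler against the bit(8) declaration.

```python
from enum import IntEnum

class BaudRates(IntEnum):
    # Same bit patterns as the enum above; Python uses '_' as the separator.
    FIRST = 0b1010_0011
    SECOND = 0b1110_1101
    THIRD = 0b1101_0111
    FOURTH = 0b0010_1100

def fits_in_bits(enum_cls, width: int) -> bool:
    """True if every member's value is representable in `width` unsigned bits."""
    return all(0 <= member.value < (1 << width) for member in enum_cls)

rates = BaudRates.SECOND
if rates == BaudRates.SECOND:
    print("second selected")       # second selected
print(fits_in_bits(BaudRates, 8))  # True
```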
 