Idealized microcontroller programming language - MPL

Gorbag

Joined Aug 29, 2020
13
OK so I have a draft initial list of bullet points that a new hardware oriented language could consider based on this thread and a few others:
I would carefully consider which of these actually help address hardware issues (e.g. bit data types) and which are just your own idea of a nice programming language (e.g. braces for block delimiters, lack of reserved words...).

Note that even "pointers" presupposes a particular kind of memory model which may not be correct for all hardware (e.g. content addressable memory).
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,040
I agree, the list is purely a summary of the kinds of issues and ideas that have come up. The absence of reserved words is quite important though, as it presents no real challenges and does guarantee the ability to add new keywords and attributes in the future with no backward compatibility concerns.

As for content addressable memory, are you referring to something that we do sometimes encounter in MCU development?

One could perhaps also distinguish between a pointer and an address, where a pointer is opaque, used only as a way to dereference some data or structure etc., whereas an address could be a little different, less opaque. But as I said, I am mostly thinking aloud with this whole subject.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,040
Some further things for consideration: some languages have predefined, "built in" functions that are considered "part of" the language. However, these are sometimes available only by linking in additional static libraries. Some implementations of C generate code for some language functions dynamically at compile time, to avoid library linking.

It strikes me as preferable for all built-in language functions to be generated at compile time; this way only the functions actually called will contribute code to the final binary.

Furthermore, in keeping with the "no reserved words" goal that facilitates backward compatibility as the language grows, built in functions would use square [ brackets ] to wrap argument lists; this would be the only permitted use of square brackets.

All other parenthesizing (user function arguments, array subscripting, numeric size attributes etc) can be fully implemented using conventional round ( parentheses ).

The implementation of language functions can then be via pre written source code, "invisible" to the user of the language.

If a user's code contained one of their own functions like this:

Code:
a = popcount(bit_array); // use Tony's popcount function.
then the code would compile and execute without problems, even if a new language version supported a new built in function with the same name:

Code:
a = popcount(bit_array); // use Tony's popcount function.

// use new language function

a = popcount[bit_array]; // this causes no collision...
The use of [ and ] would be illegal for user written code.

A second point also comes up with respect to argument passing. In C all args are passed "by value" (even pointers are), which causes arguments to be copied prior to a function being invoked. For large arguments this is not as good as passing by reference: when a language passes args by reference, no copies of the data are made. In such a model every argument is passed by its address and there is no need to make that explicit in the invocation.

All that's needed in such a model is a means of sometimes forcing an argument to be copied, in cases where we do not want changes made by a callee to be visible to the caller. This is easy and is done in a few other languages by simply wrapping the argument in parentheses:

Code:
// Here, callee, could make changes to the arg struct
call system_init(setup_parameters);


// Here, callee would only make changes to a copy.
call system_init((setup_parameters));
C# (but not Java) offers additional parameter modifiers which are very helpful too, namely "ref" and "out"; something similar could be adopted in a new language.
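For comparison, here is a rough C sketch of the two behaviours discussed above. The struct `setup_parameters_t`, its fields and both function names are made up for illustration. In C the by-reference form has to be spelled out with a pointer, whereas the proposal would make it the default and use `((arg))` to force a copy.

```c
#include <stdint.h>

/* Hypothetical parameter block; the field names are assumptions for illustration. */
typedef struct {
    uint32_t clock_hz;
    uint8_t  retries;
} setup_parameters_t;

/* By value: the callee works on a copy, so the caller's struct is untouched. */
void system_init_by_value(setup_parameters_t p)
{
    p.retries = 0;
}

/* By reference (emulated with a pointer in C): only an address is passed,
   and the callee's change is visible to the caller. */
void system_init_by_ref(setup_parameters_t *p)
{
    p->retries = 0;
}
```

Calling `system_init_by_value(s)` leaves `s.retries` unchanged, while `system_init_by_ref(&s)` clears it in the caller's struct.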
 

xox

Joined Sep 8, 2017
789
The syntax is a bit problematic though. Kind of hard to verify by sight which is which. Why not use namespaces instead?

Code:
fun popcount(bits) {
// ...
}

bit_array = [0, 1, 0, 0, 1];

a = popcount(bit_array); // default: use Tony's popcount function

a = std.popcount(bit_array); // use the built-in function
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,040
That's certainly possible too, but it does require "std" in that example to be reserved, and it might tend to lead to cluttered code if numerous built-in functions are used. I agree though that the square brackets aren't really clear and are a little contrived; a special kind of syntax just for this is, with hindsight, a bit ugly.

Namespaces are certainly a way to do it, but could lead to inelegant looking code in expressions:

Code:
a = std.cos(std.sqrt(x) / (std.floor(y) + std.ceil(z)));
But this is hardly any better really:

Code:
a = validate(cos[sqrt[x] / (floor[y] + ceil[z])]);
It's a small but interesting problem. We want to be able to cleanly compile code that contains a user function whose name was never part of the language in v1.0 but has been added as a built-in in v1.1, given that the code compiled cleanly in v1.0.

The problem only arises in expressions. Anywhere else the grammar can parse fine with identifiers named after keywords, but expressions use a different grammar (operator precedence). I don't think PL/I solved this; they likely couldn't add new built-in functions.
 

xox

Joined Sep 8, 2017
789
Assuming your library is properly scoped, there wouldn't likely be too many issues.

Code:
let tony = import("tony");
using std;

let [x, y, z] = [23, 101, 13];
let a = cos(tony.sqrt(x) / (floor(y) + ceil(z)));
print(sqrt(z)); // prints 3.6055

using tony;
print(sqrt(z)); // prints 3.605551275463989293119221267470496


Adding new functions to the language is always a possibility, but if the language is properly designed in the first place, there won't be many API changes.

My concern with no keywords is just that it sort of nullifies the benefits of efficient tokenizing, which in turn could affect parsing efficiency. An integer token type allows lookup tables to be used by a parsing state-machine. Without keywords, everything is an "identifier". Doesn't make the job impossible of course, just a little more tedious.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,040
Actually I was wrong about PL/I and name clashes for built in functions. They do solve the problem: the attribute keyword "builtin" is used to declare a builtin function if one is using it.

So in order to use a built-in function the developer must declare it as "builtin" e.g.

Code:
dcl sqrt builtin; // No need for details, the language knows arg types, return type and so on.

a = sqrt(2.0);
So the principle is that there can be no clash with older code using a newly introduced builtin, because there's no way that older code could have declared it as builtin.
 

xox

Joined Sep 8, 2017
789
Precisely.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,040
I must have expressed myself badly. There are keywords, absolutely there are keywords, but they are not reserved. Any word, even an existing keyword, can be used as an identifier (name of a variable, type, function etc.) with no problems.

In the parser, identifiers are recognized as the source is tokenized, and IF they also match a known keyword, a keyword flag is set for that token. That assists later on in the parsing.
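A minimal C sketch of that flagging step might look like the following. The `token_t` layout, the `make_word_token` name and the keyword list are all assumptions for illustration; they are not taken from the actual compiler source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token record: every word is an identifier, and
   is_keyword is only a hint for the later parse. */
typedef struct {
    const char *text;
    bool        is_keyword;
} token_t;

/* Illustrative keyword list, not the real language's list. */
static const char *const keywords[] = { "if", "then", "else", "call", "dcl" };

/* Build a token for one word, setting the flag when it matches a known keyword. */
token_t make_word_token(const char *word)
{
    token_t t = { word, false };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(word, keywords[i]) == 0) {
            t.is_keyword = true;
            break;
        }
    }
    return t;
}
```

The point of the design is that the flag never forbids anything: a token for `if` is still an ordinary identifier token, and only the statement-level parse decides which role it plays.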

The actual parse is not impacted by this much at all; there's no discernible cost to having no reserved words. Yes, it is very easy to parse a language that does have reserved keywords, but the cost difference is likely immeasurable in reality. The C guys could have done this but they did not, and their grammar design now prohibits it even if we wanted it. This was, basically, laziness: they wanted a simple, crude, easy parser.

The parser parses statements (or attempts to) sequentially. As it begins the next statement it applies a test like IsThisAnAssigmnentStmt(...). If the test succeeds it parses the statement as an assignment; if not, it does a keyword match.

It's actually simple but far from obvious when one first thinks about it.

An assignment is recognizable because it must satisfy the grammar rule:

<reference> = <expression> ;

That's the most general rule. Of course a <reference> itself can contain expressions, recursively, like:

table[X].inner[counters.batch].element[GetLocation(Z)] = CalcRoot(table.control.region[X]) ;

and so on and so forth.

But there are a finite number of rules that define a <reference> and they are easy to parse once we know how to parse the expressions (which involves operator precedence parsing).

The parser saves its state when it calls IsThisAnAssigmnentStmt so that it can then choose whether to attempt the next parse as either an assignment or keyword. The rule list is like this; it has a few more entries and this is a bit simplified, but this is basically it:

<reference> := <identifier> | <identifier>.<reference> | <identifier>[<expression>] | <identifier>(<commalist>) ...

It is very fast indeed to recognize an assignment, even a very complex one. Remember we're just parsing it (consuming a few tokens); we don't (yet) care whether it's legal, whether the names are declared and so on, only that it satisfies the syntax of an assignment.

That example assignment above contains only about 30 tokens.
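As a rough sketch of how such a lookahead test could work, here is a simplified C version. It assumes the tokens arrive as a NULL-terminated array of strings, and it skips bracketed groups by counting brackets instead of doing real operator-precedence parsing, so it is an approximation of the idea and not the actual implementation. The name `is_assignment_stmt` and the helpers are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the token looks like an identifier (letter or '_' first). */
static bool is_identifier(const char *t)
{
    return t != NULL && (isalpha((unsigned char)t[0]) || t[0] == '_');
}

/* Skip a bracketed group that starts at tok[*i]; false if it never closes.
   Simplification: brackets are counted, the contents are not parsed. */
static bool skip_group(const char *const *tok, size_t *i,
                       const char *open, const char *close)
{
    int depth = 0;
    for (; tok[*i] != NULL; (*i)++) {
        if (strcmp(tok[*i], open) == 0) {
            depth++;
        } else if (strcmp(tok[*i], close) == 0) {
            if (--depth == 0) {
                (*i)++;              /* step past the closing bracket */
                return true;
            }
        }
    }
    return false;
}

/* Pure lookahead: it only reads the token array, so the caller's position
   is unchanged and the parser can fall back to keyword matching. */
bool is_assignment_stmt(const char *const *tok)
{
    size_t i = 0;
    if (!is_identifier(tok[i])) return false;
    i++;
    while (tok[i] != NULL) {
        if (strcmp(tok[i], ".") == 0) {
            if (!is_identifier(tok[i + 1])) return false;
            i += 2;
        } else if (strcmp(tok[i], "[") == 0) {
            if (!skip_group(tok, &i, "[", "]")) return false;
        } else if (strcmp(tok[i], "(") == 0) {
            if (!skip_group(tok, &i, "(", ")")) return false;
        } else {
            break;                   /* end of the <reference> part */
        }
    }
    return tok[i] != NULL && strcmp(tok[i], "=") == 0;
}
```

With this sketch, the token sequence `if = 1 ;` is recognized as an assignment to a variable named `if`, while `if x then` is not, so the parser would go on to treat `if` as a keyword there.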
 

xox

Joined Sep 8, 2017
789
That makes sense. So you could still use a lookup-table for the start of a production rule for a keyword, then during the validation phase, the token is only queried for its "identifier" flag, which could of course be compactly embedded directly within the type information:

Code:
enum {
    slot_return,
    slot_while,
    slot_for,
    slot_else,
    //...
    mask_id = 0x10, // flag bit: the token may also serve as an identifier
    type_return = slot_return | mask_id,
    type_while = slot_while | mask_id,
    type_for = slot_for | mask_id,
    type_else = slot_else | mask_id,
    //...
};
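To make the intent concrete, here is a small self-contained C sketch of how such a packed type could be queried. The helper names (`token_slot`, `token_may_be_id`), `mask_slot`, `slot_count` and the `slot_name` table are additions for illustration only, and the layout assumes fewer than 16 keyword slots.

```c
#include <stdbool.h>

/* Low bits select the keyword slot; mask_id marks that the token
   may also be used as an identifier. */
enum {
    slot_return,
    slot_while,
    slot_for,
    slot_else,
    slot_count,                 /* number of slots, must stay below mask_id */
    mask_id    = 0x10,
    mask_slot  = mask_id - 1,   /* 0x0F: extracts the slot number */
    type_while = slot_while | mask_id
};

/* Lookup table indexed by slot, as a parsing state machine might use. */
const char *const slot_name[slot_count] = { "return", "while", "for", "else" };

/* Recover the table index from a packed token type. */
int token_slot(int type)
{
    return type & mask_slot;
}

/* Query only the identifier flag, ignoring which keyword it is. */
bool token_may_be_id(int type)
{
    return (type & mask_id) != 0;
}
```

Here `token_slot(type_while)` gives back `slot_while` for the table lookup, while `token_may_be_id(type_while)` reports that the same token can still be treated as a plain identifier.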
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,040
This is the C token type. It's pretty simple: the parser, as you know, simply sees a stream of these structs and never sees the source file.

https://github.com/Steadsoft/PLI-2000/blob/main/Compiler Source/TOKEN.H
 

drjohsmith

Joined Dec 13, 2021
506
Uhm.
No keywords? I might be missing something here.
That sort of worries me in terms of long-term maintainability.
How do I know that any word will do the same thing in the future?

When I type if then else, I know it will do what I want.

I also worry about HDL usage.
The current round of HDL languages are predictable.
HDL differs fundamentally from C-type software in two very important ways.
a. Hardware is highly multi-threaded. If you have 20 million registers clocking, that's effectively 20 million threads.
b. HDL needs to encode the timing. If I design a process that I want to give an answer every 2 ns, with a delay of say 100 ns, then the language needs to understand that.
In, say, Verilog, this is inherent in the register design. For timing, I also need to tell the tool how fast I want the I/O to be. There is no point in a great design that cannot use the properties of the chip, such as clock gating and IOB registers, and that makes the design unable to be placed.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,040
I never advocated no keywords, that's something that I might possibly have mis-conveyed but I never actually advocated.

What I advocated was no reserved words, that's different. It means that there are no limits on what a programmer can name things and that means that we can introduce a new keyword that might be the same as some variable name in his code and the code will still compile under the latest compiler.

It's a simple idea but one that is lost as soon as a grammar like C is adopted.

When I type if then else, I know it will do what I want.
Right, and it will always do the same thing. Creating variables named "if" or "then" or "else" has no effect on how if-then-else is processed by the compiler; it is able to distinguish 100% between keywords and variable names at all times.

In some programming language someone could create types or variables named "yield" or "await" for example. Now if that language was C or C++ or Rust, those languages cannot add a new keyword "yield" or "await" without breaking code that was previously valid and compiled fine. But in a language like PL/I, it is trivial because the grammar is designed to prevent the two ever being confused.

Languages with reserved words face a big problem, they must decide on all of their keywords from the outset and tell us what they are and thereafter never add a new keyword. That's frankly ridiculous in the grand scheme of things because one cannot foresee the future.

I also worry about HDL usage.
The current round of HDL languages are predictable.
HDL differs fundamentally from C-type software in two very important ways.
a. Hardware is highly multi-threaded. If you have 20 million registers clocking, that's effectively 20 million threads.
b. HDL needs to encode the timing. If I design a process that I want to give an answer every 2 ns, with a delay of say 100 ns, then the language needs to understand that.
In, say, Verilog, this is inherent in the register design. For timing, I also need to tell the tool how fast I want the I/O to be. There is no point in a great design that cannot use the properties of the chip, such as clock gating and IOB registers, and that makes the design unable to be placed.
Such considerations are fine, but nothing to do with grammar as such.
 

xox

Joined Sep 8, 2017
789
All of this talk makes me want to start a new project! :p
Seriously though, how about a programming language that could be both compiled AND interpreted? No trivial task, for sure, but if it were to be done (even in a somewhat limited way) it might prove to be useful to SOMEONE.

Personally, I would love to work in an environment that provides REAL SECURITY to both users and developers. Because currently there are really no offerings that can guarantee such a thing.

Actually, I attempted to do just that with a recent JavaScript project. But then pretty much everyone in THAT community that I reached out to has basically told me that my suggested "solution" will likely not be 100% portable due to an inherent lack of restrictions on JS implementations. (It does however seem to work perfectly in Node.js, fingers crossed!) What I would like is a language that can ensure that third-party libraries can only do what the user allows.

And both the compiled and interpreted versions should allow for dynamic loading (aka: live scripting). Maybe the whole thing could be written in C, and then either generate/transpile-to C (easy) OR assembly (hard, but would result in much faster code in the long run). Either way, an interpreter should be available to run any external code in a sandbox, if necessary.
 

drjohsmith

Joined Dec 13, 2021
506
A language that can be compiled and interpreted?
BASIC.
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,040
OK, name an implementation that does both of those across all (or at least most) platforms. I don't think one exists.
I think there is no longer a crystal clear distinction these days, with stuff like IL and JIT and so on.

I guess one could look at language features that are hard to support in an interpreter and hard to support in a compiler and see if there's a decent middle ground.

I've never written an interpreter, they do - obviously - need a runtime environment so that's going to consume resources.

In an interpreter one can easily generate source code and execute it, that's much harder in a compiled language ordinarily.

Tell me, what's the appeal of the interpreter in your situation?
 

xox

Joined Sep 8, 2017
789
Suppose one were to implement a compiler. Not only do you have to generate CPU-specific instructions, but also OS-specific executable and object code formats. An interpreter on the other hand could be implemented in ISO C and run on basically any machine right out of the box. Provide an interface to the interpreter via a simple #include and now the user can do all of the heavy-lifting/tight-loops/etc from the C side, while embedding most of the complex program logic within their main script.

Another problem with pre-compiled code is that it is almost inherently unsafe. You basically have to blindly trust that the developer is neither malicious nor fallible (as a bad design choice could possibly introduce a vulnerability). Interpreted code could be sandboxed and/or audited, and the user could control access to certain resources if they so choose.

But yes, interpreters are rarely as efficient as their compiled counterparts. I don't think that is as big of an issue today as it used to be though. Python is currently one of the most popular languages for scientific computing, and yet it often runs about 100X slower than the C equivalent.

I was actually doing some testing on a possible candidate for the opcode format (not generated by a parser in this case, but "hand-crafted") and wrote a program to loop one billion times, decrement a counter, then call an empty function. It took roughly 15 seconds as opposed to the C version, which did the same in just 1.5 seconds. So that was encouraging (although I would love to bring that down even further). The size of the code was really pleasing - less than a dozen bytes to encode the entire program, do-nothing function and all!
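The opcode format itself isn't shown here, so the following is purely a hypothetical sketch in C of what such a test could look like: a made-up one-byte opcode set, a tiny dispatch loop, and a six-byte program that calls an empty function and decrements a counter until it reaches zero. None of the names or encodings come from the actual project.

```c
#include <stdint.h>

/* Hypothetical one-byte opcodes; the real format is not shown in the post. */
enum { OP_CALL, OP_RET, OP_DEC_JNZ, OP_HALT };

/* Runs the program with the counter preset to count (must be >= 1)
   and returns how many calls were executed. */
uint64_t run(const uint8_t *code, uint32_t count)
{
    uint32_t counter = count;
    uint64_t calls = 0;
    uint8_t  ret_stack[8];          /* this program only nests one level */
    int      sp = 0;
    uint8_t  pc = 0;

    for (;;) {
        switch (code[pc]) {
        case OP_CALL:               /* CALL addr: push return address, jump */
            ret_stack[sp++] = (uint8_t)(pc + 2);
            pc = code[pc + 1];
            calls++;
            break;
        case OP_RET:                /* return to the saved address */
            pc = ret_stack[--sp];
            break;
        case OP_DEC_JNZ:            /* counter--, jump to addr while nonzero */
            pc = (--counter != 0) ? code[pc + 1] : (uint8_t)(pc + 2);
            break;
        case OP_HALT:
        default:
            return calls;
        }
    }
}

/* Six bytes in total: the loop, the halt and the do-nothing function. */
const uint8_t program[] = {
    OP_CALL, 5,       /* 0: call the empty function at offset 5 */
    OP_DEC_JNZ, 0,    /* 2: decrement, loop back to 0 while nonzero */
    OP_HALT,          /* 4: stop */
    OP_RET            /* 5: the empty function */
};
```

Calling `run(program, 1000000000u)` would reproduce the shape of the billion-iteration test; the timing will of course depend on the machine and compiler.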
 

Thread Starter

ApacheKid

Joined Jan 12, 2015
1,040
I'd like to ask:
What do you mean by the thread title?
I think I misunderstood you.
The title is informal, merely an attempt to convey the question - what could/should a programming language "look like" if it were being designed from scratch where the primary area of application is writing software for MCUs both "low end" and "high end".

This covers questions about grammar and semantics as well as specifics about data types and arithmetic as well as interop with other languages, preprocessor capabilities and any number of things deemed relevant by those who write for such devices.

Some interesting things have come up recently in my research, for example fixed point arithmetic is mentioned a lot in posts and blogs as being very helpful in systems with no floating point hardware or systems that work with DSPs.
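As a small illustration of what fixed point support has to provide, here is a sketch of Q16.16 arithmetic in C, which is roughly what a language-level fixed point type would generate. The `q16_16` type and the `q_*` helper names are made up for this example, and overflow handling is deliberately left out.

```c
#include <stdint.h>

/* Q16.16: 16 integer bits and 16 fractional bits in a 32-bit integer. */
typedef int32_t q16_16;

#define Q_ONE ((q16_16)65536)   /* the value 1.0 in Q16.16 */

/* Convert a small integer to Q16.16 (no overflow check in this sketch). */
q16_16 q_from_int(int32_t n)
{
    return (q16_16)(n * Q_ONE);
}

/* Multiply: widen to 64 bits, then scale back down by 2^16.
   Division is used instead of a shift so negative values behave portably. */
q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) / Q_ONE);
}

/* Divide: pre-scale the dividend by 2^16 so the quotient stays in Q16.16. */
q16_16 q_div(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * Q_ONE) / b);
}
```

For example, `q_mul(q_from_int(3), Q_ONE / 2)` yields the Q16.16 encoding of 1.5 using only integer instructions, which is the appeal on parts without floating point hardware.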
 