Can plain English language be adequate as a program language?

ApacheKid

Joined Jan 12, 2015
1,762
Is it the case that the compiler guy needs to know both the source code & the target code, whilst the source code guy (writing apps, such as browsers, etc.) only needs to learn the source code (like C)?
Not always. Today many compilers contain little information about specific targets; instead, a generic and imaginary target can be used (called an 'intermediate representation').
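To make the "generic and imaginary target" concrete, here is a minimal sketch of a front end lowering `a + b * c` into three-address intermediate code. The tuple-based AST and the `t0 = b * c` IR format are assumptions made up for illustration. The point is that nothing in it mentions a real CPU's registers or instructions; the temporaries t0, t1, ... are unlimited virtual registers that a separate back end would later map onto real hardware.

```python
from itertools import count

def lower(node, code, temps):
    """Emit IR for `node` into `code` and return the name holding its value."""
    if isinstance(node, str):        # variable reference: already has a name
        return node
    if isinstance(node, int):        # integer constant
        return str(node)
    op, left, right = node           # e.g. ("+", "a", ("*", "b", "c"))
    l = lower(left, code, temps)     # operands are lowered first (post-order)
    r = lower(right, code, temps)
    t = f"t{next(temps)}"            # fresh virtual register, no hardware limit
    code.append(f"{t} = {l} {op} {r}")
    return t

def to_ir(expr):
    """Lower a whole expression; return the IR lines and the result name."""
    code = []
    result = lower(expr, code, count())
    return code, result

code, result = to_ir(("+", "a", ("*", "b", "c")))
print("\n".join(code))   # t0 = b * c, then t1 = a + t0
```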
 

WBahn

Joined Mar 31, 2012
32,895
You raise some interesting points. But I must take issue with "Once the code interfaces with the real world" because software never interfaces with the real world. In an MCU, surely the software interfaces with very formally defined peripheral registers; there's no ambiguity there, as I'm sure you'll agree.
There's plenty of potential ambiguity in hardware: glitches, critical races, metastability, undefined state values, setup/hold time violations, register upset, power supply rejection, EMI, and those are just a few.
 

WBahn

Joined Mar 31, 2012
32,895
Not always. Today many compilers contain little information about specific targets; instead, a generic and imaginary target can be used (called an 'intermediate representation').
The only difference is that the "target" is not the native hardware, but rather the virtual machine. But the issue is still the same: the compiler guy has to know both the source code world and the target code world, even if the target is an intermediate-level language.
 

MrChips

Joined Oct 2, 2009
34,842
Is it the case that the compiler guy (creating compiler code) needs to know both the source language & the target language, whilst the source code guy (writing apps, such as browsers, etc.) only needs to learn the source language (like C, etc.)? Of course, the compiler guy would also be a source code guy. I think.
Imagine you built the first machine and there was no compiler for it. Where do you start? Who writes the first compiler for a machine that hasn't even been invented yet? Which comes first, the chicken or the egg? We have come a long way since then: we can now use current languages and compilers to create the next language and compiler. Back in the 1970s, new MCUs were popping up like mushrooms in the garden, faster than the compilers could follow. To keep up, I wrote my own assemblers for the MOS Technology 6502, the Motorola 6800, 6802, 6805, 6809, and 68HC11, the Atmel AVR, and others as each new development appeared.

It used to be that the machine was created first and the assembler/compiler followed. That is the bottom-up approach, as used in Microchip PIC development. This changed with the Atmel AVR instruction set: its designers optimized the instruction set first, then built the machine to execute that instruction set.

Similarly, you can design the high-level language (HLL) first to match an optimized p-code instruction set, then design the machine to be optimized for that virtual machine. That is the top-down engineering approach.
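A software interpreter is the easiest way to see what "design the instruction set first" means. The sketch below assumes a tiny, invented p-code (the opcode names PUSH, LOAD, ADD, MUL, STORE are hypothetical, not UCSD p-code or any other real set). In the top-down approach described above, the loop in `run` is what the hardware would eventually implement directly.

```python
def run(program, env):
    """Execute a list of (opcode, operand) pairs on a stack machine.

    `env` maps variable names to values and is updated by STORE.
    """
    stack = []
    for op, arg in program:
        if op == "PUSH":            # push a constant
            stack.append(arg)
        elif op == "LOAD":          # push a variable's current value
            stack.append(env[arg])
        elif op == "ADD":           # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":           # pop two operands, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "STORE":         # pop the top of stack into a variable
            env[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return env

# p-code for: x = a + b * c
prog = [("LOAD", "a"), ("LOAD", "b"), ("LOAD", "c"),
        ("MUL", None), ("ADD", None), ("STORE", "x")]
print(run(prog, {"a": 2, "b": 3, "c": 4})["x"])  # prints 14
```

Because every opcode takes its operands from the stack, the instruction encoding stays small and uniform, which is the kind of property you would choose first and then build hardware around.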
 

ApacheKid

Joined Jan 12, 2015
1,762
The only difference is that the "target" is not the native hardware, but rather the virtual machine. But the issue is still the same: the compiler guy has to know both the source code world and the target code world, even if the target is an intermediate-level language.
Everything you said is true, but I wanted to show how "target" is fluid, not necessarily some actual real CPU. A compiler writer in the old days really had to know, in intimate detail, everything about the target machine's registers, addressing modes, and various idiosyncrasies. This is now uncommon; LLVM, for example, insulates the compiler developer from the specifics of, say, ARM or x64 or whatever.

LLVM, though, is quite complex, tough to use if you're not willing to work in C++, and notoriously idiosyncratic. It strikes me that this could become a service: just send a request to a web service and get the CPU-specific translation back...
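As a rough illustration of that insulation, a front end does not have to link against LLVM's C++ API at all; it can emit LLVM's textual IR and leave instruction selection to LLVM. The sketch below is a minimal example under that assumption; the function name `add` and the helper `emit_add_function` are made up for illustration.

```python
def emit_add_function(name: str) -> str:
    """Return textual LLVM IR for: int name(int a, int b) { return a + b; }"""
    return "\n".join([
        f"define i32 @{name}(i32 %a, i32 %b) {{",
        "entry:",
        "  %sum = add i32 %a, %b",   # target-neutral 32-bit integer add
        "  ret i32 %sum",
        "}",
        "",                          # trailing newline
    ])

ir = emit_add_function("add")
print(ir)
```

Assuming the LLVM tools are installed and this text is saved as add.ll, commands along the lines of `llc -mtriple=aarch64-unknown-linux-gnu add.ll -o add_arm64.s` and `llc -mtriple=x86_64-unknown-linux-gnu add.ll -o add_x64.s` should produce ARM64 and x64 assembly from the same input. The front end never needed to know either instruction set.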
 

WBahn

Joined Mar 31, 2012
32,895
Everything you said is true, but I wanted to show how "target" is fluid, not necessarily some actual real CPU. A compiler writer in the old days really had to know, in intimate detail, everything about the target machine's registers, addressing modes, and various idiosyncrasies. This is now uncommon; LLVM, for example, insulates the compiler developer from the specifics of, say, ARM or x64 or whatever.

LLVM, though, is quite complex, tough to use if you're not willing to work in C++, and notoriously idiosyncratic. It strikes me that this could become a service: just send a request to a web service and get the CPU-specific translation back...
I still don't see the major distinction. Before, the compiler writer had to know the fine details of the target's instruction set and behaviors, which could be quite subtle and notoriously idiosyncratic. How is that any different if the target happens to be LLVM?

The use of virtual machines is not new. Many (certainly not all) compilers were written to leverage this concept. An early step compiled the source code into some kind of stack-oriented virtual machine language, and a later step translated the virtual machine commands into processor-specific ISA instruction sequences. If you had a compiler for some language targeting one processor and wanted to target a different processor, you didn't rewrite the whole compiler; you rewrote only the translator. Similarly, the person or team writing the front end of the compiler might know very little about the eventual target processor; they only needed to know the syntax and semantics of the virtual machine language.
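Here is a minimal sketch of that split, assuming a tuple-based AST and two invented target instruction sets (neither is real processor syntax). One front end emits stack-VM code; retargeting means writing another small translator while leaving the front end untouched. Target B's translator assumes there are enough registers to hold the deepest VM stack.

```python
VM_OPS = {"+": "add", "*": "mul"}

def front_end(node, out):
    """Compile a tuple AST like ("+", "a", ("*", "b", "c")) to stack-VM code."""
    if isinstance(node, str):                 # variable reference
        out.append(("push", node))
    else:
        op, left, right = node
        front_end(left, out)
        front_end(right, out)
        out.append((VM_OPS[op], None))        # operator consumes two stack slots
    return out

def translate_for_stack_cpu(vm_code):
    """Hypothetical target A: has hardware PUSH/POP, so the mapping is direct."""
    asm = []
    for op, arg in vm_code:
        if op == "push":
            asm.append(f"PUSH [{arg}]")
        else:
            asm += ["POP R1", "POP R0", f"{op.upper()} R0, R1", "PUSH R0"]
    return asm

def translate_for_register_cpu(vm_code):
    """Hypothetical target B: no hardware stack; VM stack slot n lives in rn."""
    asm, depth = [], 0
    for op, arg in vm_code:
        if op == "push":
            asm.append(f"ld r{depth}, {arg}")
            depth += 1
        else:
            depth -= 1                        # two operands in, one result out
            asm.append(f"{op} r{depth - 1}, r{depth - 1}, r{depth}")
    return asm

vm = front_end(("+", "a", ("*", "b", "c")), [])
print(translate_for_register_cpu(vm))
```

The front end only knows the VM's five-or-so operations; each translator only knows the VM and its own processor.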

What I've always found interesting (well, 'always' once I learned about it, anyway) is how many widely used and referenced works on C talk about the heap and the stack as though they were part of the language standard (some even state so explicitly), when nothing in the standard even hints that either concept is expected, let alone required. In fact, neither term appears anywhere in the standard.
 

Ya’akov

Joined Jan 27, 2019
10,253
In the '70s and later there was an effort to develop so-called fourth- and later fifth-generation languages. It was a bit of a fad with lots of hype, and anyone older than 50 may well remember it.
Strangely enough, 4GLs are still with us, but now they are called LCDPs (Low-Code Development Platforms) and usually rely on a graphically oriented interface (see Node-RED for an accessible example).

Rational ROSE was a product that used UML (Unified Modeling Language) to generate programs in Java. In theory this is great: the people who most clearly understand the business process document it graphically, in a language with impressive descriptive power, and computer programs interpret the description and produce code.

In practice, as you can imagine, the code produced was… not optimal. It worked… for very small values of work. Real systems were built using it. In the end, it might not have been any worse than the huge teams of Java programmers, each given isolated functions to write, that were being used at the time.

As mentioned elsewhere, ambiguity is the problem with any “conversational language” programming. Conversations use multiple communication channels (including expressions and gestures), a great deal of compression (in the form of context and convention), and error correction based on asking clarifying questions.

Computer languages deal with these things by strictly limiting implied communication to declared items such as variables and libraries. References to these items are made exclusively through precise naming and parameterization. Conversation is often preceded by a kind of handshaking that establishes a context, if one is required.

This won’t work for programming, since the variables, states, and conditions that programs depend on to avoid race conditions, infinite loops, and other failure modes can’t be allowed to remain ambiguous, and the program can’t use any “judgement” to clear them up.
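A tiny illustration of the point, using a made-up English instruction, "add 3 to x times 2". A listener would resolve it from context or ask; a program has no such option, so each reading has to be written out explicitly, and the two readings give different answers.

```python
def reading_1(x):
    return 3 + (x * 2)     # "add 3 to (x times 2)"

def reading_2(x):
    return (3 + x) * 2     # "(add 3 to x), times 2"

x = 5
print(reading_1(x), reading_2(x))   # prints 13 16
```

A formal language removes the choice by rule: in Python, `3 + x * 2` always means the first reading because operator precedence is part of the grammar.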

Natural Language Programming (NLP) is something that is actually used today, but the input is constrained, and the power to create conditional arrangements is limited. An example of something akin to, but not quite, NLP is the rationalized “plain language” used by AppleScript.

In AppleScript, a combination of paradigms is mashed together to make programming “easier” for a non-programmer. This includes ordinary procedural programming, object-oriented programming, and NLP. The NLP ties the other parts together, allowing the user to create the context I mentioned above in a more formalized syntax and then manipulate that context in a more familiar one.

But take note: it is not arbitrary, and the user must understand how to say what she wants in order to use it. It is still “easier” than learning a more traditional language because the concepts are encoded in very familiar language rather than an arbitrary scheme intended to improve on that familiar one. For example:
AppleScript Code Snippet:
display alert "Hello, world!" buttons {"Rudely decline", "Happily accept"}
set theAnswer to button returned of the result
if theAnswer is "Happily accept" then
    beep 5
else
    say "Piffle!"
end if
Having said all this, at least one person has pursued the idea of “plain English programming” to an impressive extent, and information can be found here.
 

ApacheKid

Joined Jan 12, 2015
1,762
The use of virtual machines is not new. Many (certainly not all) compilers were written to leverage this concept. An early step compiled the source code into some kind of stack-oriented virtual machine language, and a later step translated the virtual machine commands into processor-specific ISA instruction sequences. If you had a compiler for some language targeting one processor and wanted to target a different processor, you didn't rewrite the whole compiler; you rewrote only the translator. Similarly, the person or team writing the front end of the compiler might know very little about the eventual target processor; they only needed to know the syntax and semantics of the virtual machine language.
Yes, this is my understanding too. When I worked on a PL/I compiler, I gathered as much material as I could (this was years before the internet), and this paper was a huge help. Freiburghouse was one of the first compiler engineers to use an abstract processor and, later, a table-driven system for the back end.
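A table-driven back end keeps the code generator generic and moves the target-specific knowledge into data. The sketch below only illustrates that general idea; the IR operations and both template tables are invented and are not taken from Freiburghouse's system. Supporting a new machine means adding another table, not another code generator.

```python
TARGETS = {
    # Hypothetical two-address machine: ADD overwrites its first operand.
    "two_address": {
        "load":  ["MOV {dst}, [{src}]"],
        "add":   ["MOV {dst}, {a}", "ADD {dst}, {b}"],
        "store": ["MOV [{dst}], {src}"],
    },
    # Hypothetical three-address load/store machine.
    "three_address": {
        "load":  ["ld {dst}, {src}"],
        "add":   ["add {dst}, {a}, {b}"],
        "store": ["st {src}, {dst}"],
    },
}

def generate(ir, target):
    """Expand each (op, operands) IR tuple using the target's template table."""
    table = TARGETS[target]
    asm = []
    for op, operands in ir:
        for template in table[op]:
            asm.append(template.format(**operands))  # fill in register/variable names
    return asm

# x = a + b, with registers already chosen
ir = [
    ("load",  {"dst": "R1", "src": "a"}),
    ("load",  {"dst": "R2", "src": "b"}),
    ("add",   {"dst": "R3", "a": "R1", "b": "R2"}),
    ("store", {"dst": "x", "src": "R3"}),
]
for target in TARGETS:
    print(target, generate(ir, target))
```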

He supplied compilers to many companies, including DEC, Data General, Honeywell, Wang, and others, and he talks about the compiler world in this oral history (he later co-founded Stratus Computers):

But that's all in the intermediate language. Then, you're still looking at intermediate language, which is independent of the target computer. Optimization phase went over that and found common sub expressions and eliminated them. I have a paper, which I published, which talks about how to do that. It also talks about how to allocate registers in the code generation stage, in a very efficient manner. So that paper shows what the intermediate language looks like. But it's really a paper that's talking about allocating scarce resources, namely machine registers. Based on the usage information that you could collect from the optimization stage. Anyway, that paper has been referenced a gazillion times since I wrote it. I wrote it in 1974.
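The common-subexpression step he mentions can be illustrated with a textbook-style local pass over one basic block of three-address IR. This is a generic sketch, not the method from his paper. The `(dst, op, a, b)` tuple format is assumed, and commutative matches such as `b*c` versus `c*b` are deliberately not detected.

```python
def eliminate_cse(block):
    """Replace recomputed expressions in one basic block with copies.

    block: list of (dst, op, a, b) tuples meaning dst = a op b.
    A reused result is emitted as (dst, "copy", source, None).
    """
    available = {}                     # (op, a, b) -> name already holding it
    out = []
    for dst, op, a, b in block:
        key = (op, a, b)
        source = available.get(key)
        if source is not None:
            out.append((dst, "copy", source, None))
        else:
            out.append((dst, op, a, b))
        # dst now holds a new value: forget facts that mention its old value
        available = {k: v for k, v in available.items()
                     if dst not in (k[1], k[2]) and v != dst}
        # record the new fact unless dst overwrote one of its own operands
        if dst not in (a, b):
            available[key] = dst
    return out

block = [("t1", "*", "b", "c"), ("t2", "+", "a", "t1"),
         ("t3", "*", "b", "c"), ("t4", "+", "t2", "t3")]
for instr in eliminate_cse(block):
    print(instr)   # t3 becomes a copy of t1
```

The invalidation step matters: if `b` is reassigned between two `b * c` computations, the second one must not reuse the first result.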
Here's his paper, "Register Allocation via Usage Counts." The Stratus programming languages (C, PL/I, Pascal) all shared a common back end and a common optimizer. Freiburghouse really did have deep insights into this, and he seems to have been an important contributor to the subject.
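For readers who haven't seen the paper, the core intuition can be shown with a heavily simplified sketch: count how often each value is referenced, give the scarce machine registers to the most heavily used values, and leave the rest in memory. This is only an illustration of the general usage-count idea, not the algorithm from the 1974 paper, which works from usage information gathered during optimization. The tuple IR format and the function name are assumptions.

```python
from collections import Counter

def allocate_by_usage(block, registers):
    """Assign registers to the most frequently referenced names.

    block: list of (dst, op, a, b) three-address tuples (hypothetical format).
    registers: available register names, e.g. ["R0", "R1"].
    Returns (assignment, spilled): names mapped to registers, and the
    names left in memory, most-used first.
    """
    refs = Counter()
    for dst, _op, a, b in block:
        for name in (dst, a, b):
            if isinstance(name, str):   # skip empty operand slots (None)
                refs[name] += 1
    # Highest count first; ties broken by name so the result is deterministic.
    ranked = sorted(refs, key=lambda v: (-refs[v], v))
    assignment = dict(zip(ranked, registers))
    spilled = ranked[len(registers):]
    return assignment, spilled

block = [("t1", "*", "b", "c"), ("t2", "+", "a", "t1"),
         ("t3", "+", "t2", "t1"), ("x", "+", "t3", "b")]
print(allocate_by_usage(block, ["R0", "R1"]))
```

A real allocator would also consider where in the block each value is live, so that one register can serve several short-lived values; this sketch ignores that to keep the usage-count idea visible.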

What I've always found interesting (well, 'always' once I learned about it, anyway) is how many widely used and referenced works on C talk about the heap and the stack as though they were part of the language standard (some even state so explicitly), when nothing in the standard even hints that either concept is expected, let alone required. In fact, neither term appears anywhere in the standard.
The C standard (like many programming language standards) is really (or ideally should be) a metalanguage document, and the metalanguage is the place to describe how a C program executes on some abstract machine. I don't know how formally this is done for C.
 