A few questions around Source and Target language (wrt compilers)

Thread Starter

richard3194

Joined Oct 18, 2011
179
Hi. I started a thread, which is proving very helpful. But, if I post on that thread the things I now want to, I'll be moving rather off-topic, so I've started this new one.

I just want to examine source and target languages wrt compiling by posting some related questions. I'm trying to build up a sort of vision of understanding. As it stands I see a human at a keyboard typing source code in a very structured language (built on English), which is fed to the compiler software; that code is a function of the skill & knowledge of the programmer. He/she is writing (say) in C. The software produces an executable file to be run on a machine that executes instructions; here I mean the instructions contained in the executable file.

OK, as far as I know, a human at the keyboard (as opposed to some language construction or translation by machine) can also write:

* assembly language
* object code
* machine code
I think machine code is object code, not entirely sure.

I'm assuming that any executable program made by writing in C, using compiler software, can also be made by writing in any of the above three languages (Are they all languages, I ask).

Questions:
Regarding all three languages above:
#1: Is the guy writing (possibly compiler guy) using compiler software?
#2: Is my assumption correct, that any executable file created, written in C, can also be written - through the keyboard - in any of the above three languages?

I am assuming that the three languages in the above list, are creations of chip manufacturers. Like INTEL, etc. Thanks.
 

Ya’akov

Joined Jan 27, 2019
9,236
A compiler compiles a high level language (like C) into a lower level representation, which might be anything from:

P Code (“Portable Code”), a platform-agnostic representation of the program which can execute in a VM (Virtual Machine) running on many different architectures, to

Object Code, which is a much lower level representation than P Code and targets just one architecture. Unlike machine code, it can include symbolic content: references to variables and code not present in the source (e.g. libraries), which are resolved by linking those libraries after compilation.

An assembler is the functional equivalent of a compiler but acts on Assembly Language. Unlike higher level programming languages, assembler is very much target specific being only one level above machine language—the native language of the processor.

Assembly language (also called “assembler”, but not the same as the assembler that acts on the user generated code) consists of very simple statements nearly the same as the op codes of machine language but designed to be written by humans and so with some abstraction (e.g.: names for the operations, symbolic references, &c.).

Op codes, or “operation codes”, are the direct instructions to the processor that make up machine language. They are the lowest level representation of a program and, unlike assembler, have no symbolic content: everything is directly stated with an op code and its operands (values, addresses, &c.) in binary form.

Any program written in C can in principle be written in assembler, but practically speaking, the prospect of writing complicated programs in assembler is dodgy. Assembler is used for “inner loops”, that is, parts of a program or system that must execute in as few instructions as possible to provide the performance needed for the system to function.

In practice, modern optimizing C compilers can do most of these inner loop tasks, but for some things, more or less assembler is common. Device drivers are an example where you might find only C being used or C plus some assembly language.

While there is nothing inherently preventing a person from writing programs directly in machine language, the lack of abstraction means anything more than a very simple program would become such a complicated task that practical considerations prevent it.
 

ApacheKid

Joined Jan 12, 2015
1,658
It's difficult to answer some of these questions because I get the distinct impression you're trying to ask deeper questions but aren't quite sure how to ask them.

Here's some randomly chosen C. What you see here is all it is, text you can copy and paste:

Code:
void SetupSysTick()
{

    SCB_AIRCR_Reg aircr;
   
    bus.syst_ptr->CTRL.ENABLE = 0;
   
    // Must be written as a single 32-bit write.
   
    aircr = bus.scb_ptr->AIRCR;
    aircr.VECTKEY = SCB_AIRCR_VECTKEY;
    aircr.PRIGROUP = 3;
    bus.scb_ptr->AIRCR = aircr;
    bus.scb_ptr->SHPR.PRI_15 = 15; // NVIC_SetPriority
   
    bus.syst_ptr->LOAD.ALLBITS = 16000;
    bus.syst_ptr->VAL.ALLBITS = 0;
    bus.syst_ptr->CTRL.TICKINT = 1;
    bus.syst_ptr->CTRL.CLKSOURCE = 1;
    bus.syst_ptr->CTRL.ENABLE = 1;
}
That is C. It is just text, nothing more. You can open Notepad and write C just like the above, and no compiler is involved when you do that. C, Pascal or Java source is just text.

Assembler is also just text and can be written in Notepad. Here's some random Z80 assembler:
Code:
  ld hl,$1000                    ; Pointer to the data
  ld ix,$2000                    ; Pointer to the non-negative list
  ld iy,$3000                    ; Pointer to the negative list
  ld b,200                       ; Loop counter
Repeat:
  ld a,(hl)                      ; Getting and checking the sign of the current element
  inc hl
  cp $80
  jr nc,Negative
  ld (ix),a                      ; Storing a non-negative value
  inc ix
  jr Continue
Negative:
  ld (iy),a                      ; Storing a negative value
  inc iy
Continue:
  djnz Repeat
That Z80 assembler text can be fed into a Z80 assembler. If you do, you get this output, called a listing:
Code:
0000   21 00 10               LD   hl,$1000   ; Pointer to the data
0003   DD 21 00 20            LD   ix,$2000   ; Pointer to the non-negative list
0007   FD 21 00 30            LD   iy,$3000   ; Pointer to the negative list
000B   06 C8                  LD   b,200   ; Loop counter
000D                REPEAT:  
000D   7E                     LD   a,(hl)   ; Getting and checking the sign of the current element
000E   23                     INC   hl  
000F   FE 80                  CP   $80  
0011   30 07                  JR   nc,Negative  
0013   DD 77 00               LD   (ix),a   ; Storing a non-negative value
0016   DD 23                  INC   ix  
0018   18 05                  JR   Continue  
001A                NEGATIVE:  
001A   FD 77 00               LD   (iy),a   ; Storing a negative value
001D   FD 23                  INC   iy  
001F                CONTINUE:  
001F   10 EC                  DJNZ   Repeat
That too is a text file. It is the original text plus more detail, such as the machine instructions in raw numeric (hex) form. This output contains the machine code (object code) but is not itself machine code. This, however, is:

Code:
210010DD210020FD21003006C87E23FE803007DD7700DD231805FD7700FD2310EC
That latter text is how the object code looks when you display it on a screen. To display such files you need a tool that shows binary files as hex text; there are many out there.
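As a rough idea of what such a display tool does, here is a minimal C sketch. The function name `bytes_to_hex` is made up for illustration; it simply turns each raw byte into two hex characters, which is all the hex text above really is.

```c
#include <stddef.h>

/* Hypothetical helper: write each byte of buf as two uppercase hex digits.
   out must have room for 2 * len + 1 characters (including the final '\0'). */
void bytes_to_hex(const unsigned char *buf, size_t len, char *out)
{
    static const char digits[] = "0123456789ABCDEF";

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[buf[i] >> 4];    /* high four bits */
        out[2 * i + 1] = digits[buf[i] & 0x0F];  /* low four bits  */
    }
    out[2 * len] = '\0';
}
```

Fed the first four bytes of the object code (0x21, 0x00, 0x10, 0xDD), it produces the text 210010DD.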

If you look at that object code you'll see the same kinds of numbers that are in the listing, like DD 21 00 20 (the spaces are just for ease of reading, not part of the object code).
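One detail in the listing is worth working through: the byte after a JR or DJNZ op code is not the label's address but a signed distance, counted from the address of the following instruction. The C sketch below reproduces that arithmetic; the function name and the -1 error convention are my own choices, not part of any assembler.

```c
#include <stdint.h>

/* Z80 relative jumps (JR, DJNZ) occupy 2 bytes, and the CPU adds the signed
   offset to the address of the NEXT instruction.
   Returns the encoded offset byte (0..255), or -1 if the target is out of
   the -128..+127 range such a jump can reach. */
int relative_offset(uint16_t instr_addr, uint16_t target_addr)
{
    int diff = (int)target_addr - ((int)instr_addr + 2);

    if (diff < -128 || diff > 127)
        return -1;
    return diff & 0xFF;   /* two's complement byte, e.g. -20 becomes 0xEC */
}
```

With the addresses from the listing, `relative_offset(0x0011, 0x001A)` gives 0x07 (the `30 07` line) and `relative_offset(0x001F, 0x000D)` gives 0xEC (the `10 EC` line).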

The Z80 assembler program cannot be used by any other processor like ARM or x86 or 6502 or 68000; it is specific to the Z80. The C code is different: it is not specific to a Z80 or ARM or anything. With a C compiler we can turn that C into Z80 object code or 68000 object code or ARM object code; we just need to use the right kind of compiler, or tell the compiler that we want it to make code for Z80 or ARM.

Assembler is a text language JUST for some CPU but C code is a text language for ANY CPU.
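To make that concrete, here is roughly what the Z80 loop above does, written as C. This is only a sketch: the function and parameter names are invented, and where the assembler uses fixed addresses ($1000, $2000, $3000) and a fixed count of 200, this version takes them as arguments.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy each element of data into one of two lists depending on its sign.
   Returns how many non-negative values were stored and writes the number
   of negative values to *negative_count. Both output arrays must be able
   to hold count elements. */
size_t split_by_sign(const int8_t *data, size_t count,
                     int8_t *non_negative, int8_t *negative,
                     size_t *negative_count)
{
    size_t n_pos = 0, n_neg = 0;

    for (size_t i = 0; i < count; i++) {
        if (data[i] < 0)
            negative[n_neg++] = data[i];      /* top bit set, i.e. >= $80 */
        else
            non_negative[n_pos++] = data[i];
    }
    *negative_count = n_neg;
    return n_pos;
}
```

Nothing in this text mentions hl, ix or iy. A compiler for whichever CPU you target decides which registers and instructions to use.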
 

MrChips

Joined Oct 2, 2009
30,946
We gather that you are just expanding your knowledge and experience in the sphere of computer programming and that is perfectly ok.

Machine code is a collection of computer instructions that are represented as zeros and ones. This is the program that is downloaded and executed by the computer processor. (Note that some people think that machine code is written in hexadecimal notation. This is categorically false. Hexadecimal notation was invented because humans get tired of writing out long sequences of zeros and ones, which is also prone to transcription errors.) Machine code can be transported from one machine to another using hexadecimal notation but this is simply a representation of binary code, zeros and ones.

Assembly code is machine code in a human readable form. Again, humans are lazy. Who is going to learn and remember that the machine instruction 10101111 is used to stop processor execution? Instead, we make up English language words such as STOP to convey what we want the processor to do.
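A small C sketch may help show that hexadecimal and binary are two spellings of the same value. The helper name `byte_to_binary` is made up for this post; it writes out the eight zeros and ones that a two-digit hex value abbreviates.

```c
/* Hypothetical helper: spell out one byte as eight '0'/'1' characters,
   most significant bit first. out needs room for 9 chars including '\0'. */
void byte_to_binary(unsigned char value, char out[9])
{
    for (int bit = 7; bit >= 0; bit--)
        out[7 - bit] = ((value >> bit) & 1) ? '1' : '0';
    out[8] = '\0';
}
```

For example, the value written 0xAF in hexadecimal comes out as 10101111. The bits stored in the machine are the same either way.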

The assembler is a computer program that converts a text file that contains our English language words into machine code. The assembler can be written in any programming language. Since the assembler (like all code translation programs) has to manipulate a lot of words and phrases, it is best to use a programming language that has a rich set of instructions and/or library functions optimized for manipulating characters and character arrays (strings of text).
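Stripped to its core, that conversion is a table lookup from text to numbers. The C sketch below is a deliberately tiny illustration with invented names (`assemble_line`, `table`). It handles only four Z80 instructions that take no separate operand bytes, matched exactly as spelled; a real assembler also deals with operands, labels, spacing and letter case.

```c
#include <stddef.h>
#include <string.h>

struct mnemonic {
    const char   *text;    /* what the human writes  */
    unsigned char opcode;  /* what the Z80 executes  */
};

/* A few single-byte Z80 instructions. */
static const struct mnemonic table[] = {
    { "nop",       0x00 },
    { "inc hl",    0x23 },
    { "halt",      0x76 },
    { "ld a,(hl)", 0x7E },
};

/* Returns 1 and stores the op code if the line is in the table, else 0. */
int assemble_line(const char *line, unsigned char *opcode)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(line, table[i].text) == 0) {
            *opcode = table[i].opcode;
            return 1;
        }
    }
    return 0;  /* unknown instruction */
}
```

Calling `assemble_line("inc hl", &op)` sets `op` to 0x23, while a word that is not in the table, such as "stop", is rejected.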

Yes, the creator of the assembler has to have an intimate knowledge of the target processor as well as the design and development of the assembler in the programming language platform.

The same methodology can be extended to all HLL (high level language) compilers, interpreters, p-code processors, virtual machines, etc. The process is the same. A programmer has to select a programming language and create a program which converts text A to text B. Another program is created to convert text B to text C, and so on. The final output does not have to be machine code. Text C can be in a form which a virtual machine can recognize and execute.
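For the virtual machine case, here is a minimal sketch of a program that "recognizes and executes" a final text C. The byte codes (OP_PUSH and so on) are invented for this example and do not belong to any real VM; the point is only that the executor is itself an ordinary program.

```c
#include <stddef.h>

/* Invented byte codes for a toy stack machine; not any real VM's format. */
enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2, OP_MUL = 3 };

#define STACK_SIZE 16

/* Runs the program and returns the value on top of the stack at OP_HALT,
   or -1 if the program is malformed (this sketch does not distinguish a
   genuine result of -1 from an error). */
int run_vm(const unsigned char *code, size_t len)
{
    int stack[STACK_SIZE];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next byte to execute       */

    while (pc < len) {
        unsigned char op = code[pc++];
        switch (op) {
        case OP_PUSH:                      /* operand byte follows */
            if (pc >= len || sp >= STACK_SIZE) return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
        case OP_MUL:                       /* combine the top two values */
            if (sp < 2) return -1;
            sp--;
            if (op == OP_ADD) stack[sp - 1] += stack[sp];
            else              stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : -1;
        default:
            return -1;                     /* unknown byte code */
        }
    }
    return -1;                             /* ran off the end without OP_HALT */
}
```

The byte sequence PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT computes (2 + 3) * 4 and returns 20, on whatever processor the VM itself was compiled for.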

Object code can be whatever the user defines as object code. Object code can be text A, B, or C, etc.
Source code is commonly accepted to mean text A, the first stage of program development.
 