C program compilation process

Thread Starter

Kittu20

Joined Oct 12, 2022
463
I haven't written a lot of code, but I have written some programs, and I am still confused about how the compiler translates source code into an executable file.

Currently I am reading the following web page to understand the compilation process
https://www.javatpoint.com/compilation-process-in-c

I use an editor to create or edit C source programs, and I save each program file with a .c extension (for example: main.c, temp.c, etc.).

Preprocessor stage

I think a preprocessor first processes the files and replaces all the #include and #define directives (and other preprocessor commands) with their values.

I think preprocessing doesn't allocate any memory; it only substitutes the value for the name.
 

dl324

Joined Mar 30, 2015
16,846
What you're referring to as compilation is actually several phases: preprocessing, compiling to object code, and linking with libraries to create an executable. There are other commands for creating and maintaining libraries.
 

WBahn

Joined Mar 31, 2012
29,979
If you really want to get a good understanding of the core (certainly not everything) of how programs are created and run, check out the Nand2Tetris project. There you will start with nothing but 2-input Nand gates and DFF chips and will build an entire computer (sans a few things) including the CPU, memory, and the control logic. This will all be in emulation. You will then write an assembler, write a stack-based virtual machine translator, write a compiler for a high-level object-oriented programming language, and write the operating system libraries. While it sounds gargantuan, it is actually extremely filtered down to something that is very tractable -- it's normally taught as a one-semester college course or a one-year high school course.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
463
What you're referring to as compilation is actually several phases: preprocessing, compiling to object code, and linking with libraries to create an executable. There are other commands for creating and maintaining libraries.
I want to confirm that the compiler does not allocate memory in the preprocessor stage; it just replaces the text with the value. Is this a misconception, or am I right?
 

dl324

Joined Mar 30, 2015
16,846
I want to confirm that the compiler does not allocate memory in the preprocessor stage; it just replaces the text with the value. Is this a misconception, or am I right?
That's correct. The preprocessor is a separate program that runs before compilation, and compilation is the stage that allocates memory for variables.
 

BobTPH

Joined Jun 5, 2013
8,813
That page should not be taken literally. Compilers generally do not go through those phases producing intermediate files at each stage.

Compilers I have worked on can do lexical analysis, preprocessing, and syntactic analysis all concurrently. The result is an intermediate representation in memory.

That representation is then passed through the optimizers and the code generator which creates an intermediate representation of the instructions and finally the object file.

The object files are combined into the final executable program by a linker.

But all of this is just the usual way of doing it; it has been done many other ways.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
463
If you really want to get a good understanding of the core (certainly not everything) of how programs are created and run, check out the Nand2Tetris project.
I really appreciate your advice, but right now I am mainly focusing on programming. Along with that, I want to have a good fundamental knowledge of the UART, SPI, I2C, and CAN protocols. When I feel that I have learned enough C, I would like to learn one object-oriented language, such as C++.
 

WBahn

Joined Mar 31, 2012
29,979
I want to confirm that the compiler does not allocate memory in the preprocessor stage; it just replaces the text with the value. Is this a misconception, or am I right?
You need to make clear what YOU mean about allocating memory. If you mean the actual allocation of physical memory in the machine for the variables in a program, that doesn't happen until the program actually runs. If you mean generating the code responsible for causing the allocation of memory when the code runs, that's something else. If you mean the point in the source code where that code will eventually get executed, that's yet something else again.

But as far as what you are saying with regards to the preprocessor, conceptually, that's correct.

In principle you could have the preprocessor process the source code files and produce output files that have all of the preprocessor directives taken care of and then send those files to the compiler for compilation. That could also be broken down into discrete steps, such as lexical analysis, syntax analysis, and code generation. But these are normally implemented in a pipelined manner so that as soon as the preprocessor processes a line of code, it sends that line of code to the lexical analyzer (tokenizer), which sends the token to the syntax analyzer (parser) as it produces them, which then parses the code and sends it to the code generator (or whatever processing block comes next in that particular tool chain).

A common way of doing it is in the reverse order. The code generator produces object code from whatever it currently has from the parser until it runs out; it then tells the parser to give it another chunk to work with. The parser then runs just long enough to do that, but it might have to ask the tokenizer for some more stuff. If so, the tokenizer only runs long enough to produce another token, but it might have to invoke the preprocessor to give it some more processed source code.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
463
But as far as what you are saying with regards to the preprocessor, conceptually, that's correct.
The header file contains the declaration of the variable and the prototype of the function.

temp.h
C:
extern int x;
void foo();
temp.c
C:
int x;
void foo() {
  x++;
}
main.c
C:
#include <stdio.h>
#include "temp.h"
 
int main() {
  x = 1;
  printf("%d\n", x);
  foo();
  printf("%d\n", x);
 
  return 0;
}
What happens with the header file temp.h in the preprocessor phase?
 

BobTPH

Joined Jun 5, 2013
8,813
It depends on the compiler. The program behaves as if the #include directive were replaced by the text of the include file. How the compiler accomplishes this does not matter.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
463
Okay. After the preprocessor phase, the compiler converts the code into assembly code.

After that, the assembly code is converted into object files.

After that, the linker puts those object files together into the final executable.

What is a compilation unit in all of this? How many compilation units does my code have?

Is the object file itself the compilation unit?
 

ApacheKid

Joined Jan 12, 2015
1,533
Okay. After the preprocessor phase, the compiler converts the code into assembly code.

After that, the assembly code is converted into object files.

After that, the linker puts those object files together into the final executable.

What is a compilation unit in all of this? How many compilation units does my code have?

Is the object file itself the compilation unit?
Programming languages have a syntax that is defined by a "grammar", a set of rules written in another language. The grammar defines a set of hierarchical rules; each rule is defined in terms of other rules, and the whole grammar represents a tree of rules.

The root rule of that tree is often called "compilation unit", but not always. In your case "compilation unit" is the highest-level rule in the C language's grammar. It's not a name used in C itself; it's just a name that means a syntactic block that is not contained inside another.

There's a rule for every language keyword: for example, there's an "if statement" rule, a "while loop" rule, a "goto" rule, a "function definition" rule, and so on.

As you know, ifs and whiles can contain other ifs and whiles, but they can't contain function definitions because the rules don't allow for that.

But function definitions are contained in a compilation unit. Every code construct is contained inside some other construct, but the compilation unit is not contained in anything; there is nothing that can "contain" a compilation unit.

So it's just a name used to represent that idea: the thing that contains everything else but is not itself contained in anything.

Here's an example of a formal (machine-readable) grammar for the C11 language. You can see it has a rule named compilation_unit; it is with that rule that the compiler begins to parse your source code. If we renamed that rule to "outermost_rule" or anything else, it would still work. The C language itself has no concept of "compilation unit"; only the grammar has that, and the grammar is not written in C.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
463
You need to make clear what YOU mean about allocating memory. If you mean the actual allocation of physical memory in the machine for the variables in a program, that doesn't happen until the program actually runs.
There are steps like the preprocessor, assembler, and linker.

In which step of compilation is the memory for a variable allocated?

Can I assume that the compiler does not allocate space for any variable until the final program is executed on the machine?
 

WBahn

Joined Mar 31, 2012
29,979
There are steps like the preprocessor, assembler, and linker.

In which step of compilation is the memory for a variable allocated?

Can I assume that the compiler does not allocate space for any variable until the final program is executed on the machine?
It's still hard for me to tell what you are really getting at. The compiler runs once and is done and finished before the final program is ever executed. So there is no way for the compiler to allocate memory when the final program is executed.

What the compiler does is generate code that causes space to be allocated for variables when the final program is executed.
 

WBahn

Joined Mar 31, 2012
29,979
Okay. After the preprocessor phase, the compiler converts the code into assembly code.

After that, the assembly code is converted into object files.

After that, the linker puts those object files together into the final executable.

What is a compilation unit in all of this? How many compilation units does my code have?

Is the object file itself the compilation unit?
I think you are referring to "translation units". If so, for nearly all practical purposes, each source file is a translation unit. To be specific, a source code file, together with all of the other files that are referenced by #include directives, constitutes one preprocessing translation unit and, after the preprocessing is done, the result of that is a translation unit.

The idea is that translation units can be compiled separately and in any order and then later linked to produce the final executable program.
 