Compiler Memory Allocation

Thread Starter

Kittu20

Joined Oct 12, 2022
431
Hello Everyone,

I'm using a Windows 11 operating system and have installed the GCC compiler on my laptop. I've written a C program that successfully compiles and runs. However, I'm trying to gain a better understanding of how variables are stored in the memory of my laptop and the compiler's role in this process. Terms like 'compile time,' 'run time,' 'static memory allocation,' and 'dynamic memory allocation' are confusing to me.

I've attached my C code along with the generated assembly output.

C:
#include <stdio.h>
#include <stdlib.h>

/* Global initialized variable in the Data Segment */
int global_initialized_var = 42;

/* Global uninitialized variable in the Data Segment (BSS) */
int global_uninitialized_var;

/* Static variable in the Data Segment */
static int static_var = 30;

void stack_example(void) {
    /* Local variable in the Stack */
    int local_var = 10;
    printf("Local Variable in Stack: %d\n", local_var);
}

int main(void) {
    /* Text Segment: CPU-executable code */
    printf("Hello from the Text Segment!\n");

    /* Accessing Initialized Data in Data Segment */
    printf("Initialized Global Variable: %d\n", global_initialized_var);

    /* Accessing Uninitialized Data in Data Segment (BSS) */
    printf("Uninitialized Global Variable: %d\n", global_uninitialized_var);

    /* Accessing Static Data in Data Segment */
    printf("Static Global Variable: %d\n", static_var);

    /* Calling a function to demonstrate Stack usage */
    stack_example();

    /* Dynamic memory allocation in the Heap */
    int *dynamic_var = (int *)malloc(sizeof(int));
    *dynamic_var = 20;
    printf("Dynamic Variable in Heap: %d\n", *dynamic_var);

    free(dynamic_var);

    return 0;
}
Program Output
Code:
Hello from the Text Segment!
Initialized Global Variable: 42
Uninitialized Global Variable: 0
Static Global Variable: 30
Local Variable in Stack: 10
Dynamic Variable in Heap: 20
Assembly output
Code:
    .file    "hello.c"
    .globl    _global_initialized_var
    .data
    .align 4
_global_initialized_var:
    .long    42
    .comm    _global_uninitialized_var, 4, 2
    .align 4
_static_var:
    .long    30
    .section .rdata,"dr"
LC0:
    .ascii "Local Variable in Stack: %d\12\0"
    .text
    .globl    _stack_example
    .def    _stack_example;    .scl    2;    .type    32;    .endef
_stack_example:
LFB14:
    .cfi_startproc
    pushl    %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    subl    $40, %esp
    movl    $10, -12(%ebp)
    movl    -12(%ebp), %eax
    movl    %eax, 4(%esp)
    movl    $LC0, (%esp)
    call    _printf
    nop
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
LFE14:
    .def    ___main;    .scl    2;    .type    32;    .endef
    .section .rdata,"dr"
LC1:
    .ascii "Hello from the Text Segment!\0"
    .align 4
LC2:
    .ascii "Initialized Global Variable: %d\12\0"
    .align 4
LC3:
    .ascii "Uninitialized Global Variable: %d\12\0"
LC4:
    .ascii "Static Global Variable: %d\12\0"
LC5:
    .ascii "Dynamic Variable in Heap: %d\12\0"
    .text
    .globl    _main
    .def    _main;    .scl    2;    .type    32;    .endef
_main:
LFB15:
    .cfi_startproc
    pushl    %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp
    call    ___main
    movl    $LC1, (%esp)
    call    _puts
    movl    _global_initialized_var, %eax
    movl    %eax, 4(%esp)
    movl    $LC2, (%esp)
    call    _printf
    movl    _global_uninitialized_var, %eax
    movl    %eax, 4(%esp)
    movl    $LC3, (%esp)
    call    _printf
    movl    _static_var, %eax
    movl    %eax, 4(%esp)
    movl    $LC4, (%esp)
    call    _printf
    call    _stack_example
    movl    $4, (%esp)
    call    _malloc
    movl    %eax, 28(%esp)
    movl    28(%esp), %eax
    movl    $20, (%eax)
    movl    28(%esp), %eax
    movl    (%eax), %eax
    movl    %eax, 4(%esp)
    movl    $LC5, (%esp)
    call    _printf
    movl    28(%esp), %eax
    movl    %eax, (%esp)
    call    _free
    movl    $0, %eax
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
LFE15:
    .ident    "GCC: (MinGW.org GCC-6.3.0-1) 6.3.0"
    .def    _printf;    .scl    2;    .type    32;    .endef
    .def    _puts;    .scl    2;    .type    32;    .endef
    .def    _malloc;    .scl    2;    .type    32;    .endef
    .def    _free;    .scl    2;    .type    32;    .endef
Memory layout categories:

Text Segment:
Read-only.
Contains CPU-executable machine code instructions.

Data Segment:
Initialized Data: Holds initialized global/static variables.
Uninitialized Data (BSS): Stores uninitialized global/static variables, which are set to zero before main() runs.

Stack:
Stores local (automatic) variables, plus call bookkeeping such as return addresses and saved registers.

Heap:
Utilized for dynamic memory allocation.

My understanding is that the laptop's processor reads binary instructions from program memory to perform tasks. Program memory stores the instructions that make up a program, and these instructions guide the processor in executing specific tasks.

This GeeksforGeeks article, https://www.geeksforgeeks.org/difference-between-static-and-dynamic-memory-allocation-in-c/, mentions that static memory is allocated at compile time.

So, when we talk about memory allocation at compile time versus run time, what exactly does that mean? My confusion arises from the fact that during compilation the program isn't actually executed on hardware. So, I'm trying to understand how memory allocation can occur at compile time when there's no physical execution of the program.
 

MrChips

Joined Oct 2, 2009
30,494
You are correct. At compile time there is no physical allocation of memory. The compiler only knows that certain pieces of code and data should be allocated to certain areas of memory.

In a large system with a controlling operating system, memory will be allocated at run time.

With smaller embedded systems using small microcontrollers, the programmer can define exact physical locations in his/her program code. The compiler can respect these requests and memory allocation will be made by the linker.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
431
The compiler only knows that certain pieces of code and data should be allocated to certain areas of memory.
I appreciate your explanation. However, I would like to go deeper into the concept of 'certain pieces of code' in relation to memory allocation. In my C code, I have global variables. Could you explain what exactly the compiler does with these global variables?

From what I understand, the compiler translates the source code into assembly and the assembler converts that into machine code. Does the compiler generate machine instructions that tell the processor to allocate memory locations for global variables when the program is running on the actual hardware?
 

BobTPH

Joined Jun 5, 2013
8,665
You are conflating multiple definitions of “memory allocation”.

This is a highly simplified explanation, so those who know more, please don’t come back and say “what about DLLs and position independent code.” I am ignoring these for now. And this is specifically how it is done on Windows. Other systems will be similar but different in the details.

The compiler “allocates” static and global memory by creating memory sections in the object file. Variables declared statically are placed in a section of memory in the object file. The compiler refers to these in the object code by a section number and offset. Each source file listed on the command line creates its own object file. Program code is also assigned to sections, either one per source file or one per function.

The linker then combines the multiple object files into an executable file. It combines all the sections from the object files into larger sections that contain code or data with similar attributes (i.e. read, write, and execute permissions).
The sections in an executable are assigned to virtual addresses, starting at a specific virtual memory address, at link time. The code in the executable can refer to these addresses directly; the virtual address remains the same no matter where in physical memory the data is actually placed.

When a program is executed (in Windows), actual physical memory is allocated by page, only as needed. The OS hands out the pages, and the hardware transparently translates the virtual addresses used by the code into physical addresses used by the memory. And even that is not the end of it. The actual DRAM memory is far too slow, so it is mirrored in cache memory, which is faster. I think modern CPUs have 3 levels of cache.

The more you know about all this, the more it seems like a miracle that it works.
 

MrChips

Joined Oct 2, 2009
30,494
In a Windows operating system, the programmer and the executable object code have no control over absolute memory allocation. All memory is allocated from the application heap. All addresses are relative to the starting location of the memory space allocated. Local and global variable allocation have a different meaning.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
431
Below is a simplified representation of the memory allocation ranges for different segments of a C program:

[Image: simplified memory layout showing address ranges for the segments of a C program]

Please note that these ranges are simplified and may not reflect the actual memory layout on specific systems, as memory allocation depends on the architecture and operating system. The ranges are presented in hexadecimal format for clarity, and actual memory addresses can vary widely.

I want to know what the compiler does during compile time, link time, and run time. How does it affect my program at each of these stages?
 

BobTPH

Joined Jun 5, 2013
8,665
I want to know what the compiler does during compile time, link time, and run time. How does it affect my program at each of these stages?
I told you that in my post.

The details of this are way complicated, and you do not have to be concerned with them; you are wasting your time. I only know it because I wrote compilers for a living for over 50 years.

Your example sheds no light on anything, it is artificial and lacks most of what is actually happening.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
431
I only know it because I wrote compilers for a living for over 50 years.
I'm genuinely interested in your thoughts on memory allocation for variables and the compiler's role in the process.

I know it's a difficult topic, but I'd appreciate hearing your understanding of what terms like 'compile time,' 'link time,' and 'run time' mean in discussions about memory allocation, and of the compiler's role, both for code running under an OS on a PC and for bare-metal code without an OS.
 

BobTPH

Joined Jun 5, 2013
8,665
I have no interest in teaching a graduate course in object file formats, linking, and program loading on Windows or any other system, which is what you are asking for.

Can you tell me what you hope to get out of it? Unless you are writing a compiler, assembler, linker, or OS, most of it is of no practical use.

Can you tell me what you would expect to learn that would help you in any way? Then maybe I could address the specifics related to that.
 

nsaspook

Joined Aug 27, 2009
12,806
Can you tell me what you would expect to learn that would help you in any way? Then maybe I could address the specifics related to that.
In the old days we had these things called books we needed to read and understand to get a passing grade in a thing called school.
There is no spoon.
 

WBahn

Joined Mar 31, 2012
29,867
I want to know what the compiler does during compile time, link time, and run time. How does it affect my program at each of these stages?
That is a very big and very ill-defined request. As with nearly any complicated thing, the process can be described at various levels of detail and abstraction, and any given choice is going to be too shallow for some, too deep for others, and just right for a lucky few.

I think (and it's all a guess) that, at the level you are (or perhaps should be) trying to understand things, "allocation" doesn't strictly refer to when a given variable is actually assigned to a specific location in physical memory (something that has no meaning until the program is actually running), but rather to when all of the information needed is known and applied in order to fix what that mapping will be once the program is executed.

Usually (never say 'always' or 'never' when talking about different compilers for different languages targeting different architectures), when a static variable is compiled, the compiler has enough information to embed its address (or a suitable proxy) in the code directly. For a small microcontroller, the compiler might assign the variable fred to address 1234, and then whenever the code refers to fred, the generated code uses 1234. More generally, the compiler assigns fred to a memory location that is, say, 1234 bytes beyond the start of some reference location, such as the start of the code block. This allows the loader to put the entire program at an arbitrary location in memory (within whatever constraints it operates under), and the generated code is hard-coded to be able to find the actual location of fred. A common way of doing this is to calculate the address as an offset relative to the current value of the instruction pointer.

So has the compiler "allocated" the memory for fred when it does this? At the level of abstraction that is relevant for understanding how a compiler works to the degree that it has any bearing on the vast majority of programmers, yes, this is a reasonable interpretation of what it means to allocate memory.

Dynamically allocated memory is going to entail information that the compiler simply doesn't know at compile time. Instead, that is going to have to be handled at run time.
 

MrChips

Joined Oct 2, 2009
30,494
The other thing to note is that a long time ago Intel introduced the mechanism of segment registers in order to work around the limitations of 16-bit addresses. This meant that code could only access 64K bytes at a time without having to change a segment register.

A flat memory model with 32-bit or 64-bit addressing does not have this problem. The compiler has to be able to work with these different memory models.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
431
Can you tell me what you would expect to learn that would help you in any way? Then maybe I could address the specifics related to that.
I was trying to understand how the linker identifies variable addresses when the compiler compiles two separate source files. While researching on the internet and reading books, I came across various terms like compile time, link time, run time, static memory allocation, and dynamic memory allocation. I was attempting to relate all these terms in the context of memory allocation for variables, but I found it challenging to consolidate them into a single picture of memory allocation and the compiler's role in allocating memory on physical hardware.

However, I think the term 'memory allocation' might not be the most suitable when discussing compile time; perhaps it's better to say that the compiler assigns memory locations at compile time.
 

MrChips

Joined Oct 2, 2009
30,494
You are trying to understand a complex topic using a complex model, i.e. Windows OS.

First try to understand memory location on a simple MCU, for example, one that has 256 bytes of RAM and 1024 bytes of flash memory. That is a good place to start.
 

Papabravo

Joined Feb 24, 2006
21,008
I was trying to understand how the linker identifies variable addresses when the compiler compiles two separate source files. During my research on the internet, I came across various terms like compile time, link time, run time, static memory allocation, and dynamic memory allocation. I was attempting to relate all these terms in the context of memory allocation for variables, but I found it challenging to consolidate them into a single picture of memory allocation and the compiler's role in allocating memory on physical hardware.
A compiler uses "scope rules" to assign each identifier to one of a small set of storage classes. The compiler's "symbol table" has information about where a variable is defined and where it is used. This information is saved for the linker so that it can combine variables from the same class into sections of memory. This process should be conceptually easy to visualize although the details may currently be beyond your grasp.

Here is a hint for your further study. Ask your compiler for detailed compile-time listings of the generated code and symbol table. Then ask the linker for a listing of the detailed link map. Finally, open up the final executable in a debugger to see if you can confirm that you understand how the compiler and the linker produced the result that you can see with your own lyin' eyes.

ETA: This recommendation goes for any class of processor or operating system including all Harvard and von Neumann architectures.
 

Thread Starter

Kittu20

Joined Oct 12, 2022
431
First try to understand memory location on a simple MCU, for example, one that has 256 bytes of RAM and 1024 bytes of flash memory. That is a good place to start.
I have found the AT89S52, which nearly matches the specifications you mentioned - it has 256 bytes of RAM and 8k bytes of flash memory. This seems like a good place to start for understanding memory location on a simple MCU.

The AT89S52 provides the following standard features: 8K bytes of Flash, 256 bytes of RAM
 

MrChips

Joined Oct 2, 2009
30,494
I have found the AT89S52, which nearly matches the specifications you mentioned - it has 256 bytes of RAM and 8k bytes of flash memory. This seems like a good place to start for understanding memory location on a simple MCU.
Good. Now ask your questions and let’s see if we can provide you with some clear concrete answers.
 

BobTPH

Joined Jun 5, 2013
8,665
I was trying to understand how the linker identifies variable addresses when the compiler compiles two separate source files
Simple explanation:

Each source file compiles into an object file. The compiler puts the name of the variable in the object file. The linker places variables with the same name at the same location.

What actually happens, for gcc:

The following variables are all treated differently:

int a;
int b = 3;
static int c;
static int d= 5;
extern int e;
extern int f = 2;
 

Thread Starter

Kittu20

Joined Oct 12, 2022
431
Good. Now ask your questions and let’s see if we can provide you with some clear concrete answers.
@MrChips

So the AT89S52 microcontroller executes programs from its flash memory by fetching, decoding, and executing the instructions stored there. It uses a program counter to keep track of the next instruction to execute, allowing it to run the program sequentially.

The opcode is the part of an instruction that defines the operation to be performed (e.g., 'MOV' for move), while the operand specifies the data or location involved (e.g., '#5' for a constant). Machine code is the binary representation of these instructions stored in the microcontroller's flash memory, which the CPU directly executes.

SFR (Special Function Register) addresses, as detailed in Datasheet Table 1 on Page 5, are memory-mapped locations that provide access to specific functions and configurations. Microcontroller programmers utilize these SFR addresses to configure I/O pins, set up timer/counters, control interrupts, and interact with other hardware modules.

I can find SFR address information in the header file.
https://www.keil.com/dd/docs/c51/atmel/regx51.h

I think these addresses are known at compile time, so when the compiler generates the executable file, they are hard-coded into it and remain the same when the program is running on the microcontroller.
 


Thread Starter

Kittu20

Joined Oct 12, 2022
431
I think that in the AT89S52, global and static variables should have fixed memory locations, and the linker assigns those locations at link time.
 