compilation and linking process

Thread Starter

King2

Joined Jul 17, 2022
163
I think I don't understand very well how the compiler and linker work. This could be the reason why I am stuck on this code:


C:
#include<stdio.h>

int X;

void foo(){
  int X = 42;
  printf("X = %d \n", X);
  {
    extern int X;
    ++X;
    printf("X  = %d \n", X); // Why does this line print 101, I was expecting X = 43
  }
  printf("X = %d \n", X);
}

int main(){
  X = 100;
   printf("X = %d \n", X);
  foo();
   printf("X = %d \n", X);
  return 0;
}
Output generated by program

C:
X = 100
X = 42
X  = 101
X = 42
X = 101
Why does line 11 print 101? I was expecting X = 43.
 

Papabravo

Joined Feb 24, 2006
21,159
This is because X is both a global variable defined outside of any function AND it is also a local variable to function foo(). Doing things this way can be confusing and you should avoid doing this until your knowledge level increases. I can't see the rest of your main() function to help you further. Just because you have the extern statement inside foo() does not mean that the local variable X does not take precedence.
 

Thread Starter

King2

Joined Jul 17, 2022
163
This is because X is both a global variable defined outside of any function AND it is also a local variable to function foo(). Doing things this way can be confusing and you should avoid doing this until your knowledge level increases.
Variable X is declared in three different ways. I am trying to understand how the compiler identifies each of them.

I can't see the rest of your main() function to help you further. Just because you have the extern statement inside foo() does not mean that the local variable X does not take precedence.
The main function is on lines 16 to 22. foo is called in the main function.
 

Papabravo

Joined Feb 24, 2006
21,159
Possibly, but the extern int X; declaration inside function foo is meaningless. Normally it would be used outside of any function to declare a global variable defined in a separately compiled module. You may think your post and your description is complete but that is not the case. Correct your original post. Inside function foo() the local variable X will take precedence.
 

WBahn

Joined Mar 31, 2012
29,978
I think I don't understand very well how the compiler and linker work. This could be the reason why I am stuck on this code:

Why does line 11 print 101? I was expecting X = 43.
You have a variable, X, that has file scope and therefore has external linkage. Every variable with external linkage (within a program) refers to the same variable and the definition of that variable can only appear once.

Thus, any extern modifier you use will refer to your externally-linked instance of that identifier.
 

WBahn

Joined Mar 31, 2012
29,978
Possibly, but the extern int X; declaration inside function foo is meaningless. Normally it would be used outside of any function to declare a global variable defined in a separately compiled module. You may think your post and your description is complete but that is not the case. Correct your original post. Inside function foo() the local variable X will take precedence.
It's not meaningless. Whether it is useful is a different matter, but it is legal and it DOES have an effect.

Consider the following program, in which the three foo_?() functions differ only in how X within the inner block is declared:

C:
#include<stdio.h>

int X;

void foo_none()
{
  int X = 42;
  printf("-- foo() with no declaration inside inner block--------------\n");
  printf("X@[%p] = %d within foo()\n", &X, X);
  {
    ++X;
    printf("X@[%p] = %d within foo()'s block\n", &X, X); // no new declaration: same X as the rest of foo_none()
  }
  printf("X@[%p] = %d within foo()\n", &X, X);
}

void foo_auto(){
  int X = 142;
  printf("-- foo() with normal declaration inside inner block----------\n");
  printf("X@[%p] = %d within foo()\n", &X, X);
  {
    int X; // distinct block-scope X, deliberately left uninitialized
    ++X;   // reads an indeterminate value, so the printed number is arbitrary
    printf("X@[%p] = %d within foo()'s block\n", &X, X);
  }
  printf("X@[%p] = %d within foo()\n", &X, X);
}

void foo_extern(){
  int X = 242;
  printf("-- foo() with external declaration inside inner bloack ------\n");
  printf("X@[%p] = %d within foo()\n", &X, X);
  {
    extern int X;
    ++X;
    printf("X@[%p] = %d within foo()'s block\n", &X, X); // extern names the file-scope X, so this prints 101
  }
  printf("X@[%p] = %d within foo()\n", &X, X);
}

int main(){
  X = 100;
  printf("-------------------------------------------------------------\n");
  printf("X@[%p] = %d within main()\n", &X, X);
  printf("-------------------------------------------------------------\n");
  foo_none();
  printf("-------------------------------------------------------------\n");
  printf("X@[%p] = %d within main()\n", &X, X);
  printf("-------------------------------------------------------------\n");
  foo_auto();
  printf("-------------------------------------------------------------\n");
  printf("X@[%p] = %d within main()\n", &X, X);
  printf("-------------------------------------------------------------\n");
  foo_extern();
  printf("-------------------------------------------------------------\n");
  printf("X@[%p] = %d within main()\n", &X, X);
  printf("-------------------------------------------------------------\n");
  return 0;
}
When you run this, you get:

Code:
-------------------------------------------------------------
X@[0000000000407970] = 100 within main()
-------------------------------------------------------------
-- foo() with no declaration inside inner block--------------
X@[000000000065FDEC] = 42 within foo()
X@[000000000065FDEC] = 43 within foo()'s block
X@[000000000065FDEC] = 43 within foo()
-------------------------------------------------------------
X@[0000000000407970] = 100 within main()
-------------------------------------------------------------
-- foo() with normal declaration inside inner block----------
X@[000000000065FDEC] = 142 within foo()
X@[000000000065FDE8] = 20 within foo()'s block
X@[000000000065FDEC] = 142 within foo()
-------------------------------------------------------------
X@[0000000000407970] = 100 within main()
-------------------------------------------------------------
-- foo() with external declaration inside inner bloack ------
X@[000000000065FDEC] = 242 within foo()
X@[0000000000407970] = 101 within foo()'s block
X@[000000000065FDEC] = 242 within foo()
-------------------------------------------------------------
X@[0000000000407970] = 101 within main()
-------------------------------------------------------------

--------------------------------
Process exited after 0.03806 seconds with return value 0
Press any key to continue . . .
This shows three things. With no declaration in the inner block, X refers to the X that is defined at the level of foo(). With a plain declaration (no storage-class specifier), X refers to a variable that exists only within the inner block and is NOT the same as the one defined at the level of foo(); it was never initialized, so the value printed for it (20 here) is arbitrary. When it is declared 'extern', X refers to the X that is defined with file scope.
 

Thread Starter

King2

Joined Jul 17, 2022
163
You have a variable, X, that has file scope and therefore has external linkage. Every variable with external linkage (within a program) refers to the same variable and the definition of that variable can only appear once.

Thus, any extern modifier you use will refer to your externally-linked instance of that identifier.
Thank you both of you. I appreciate your help.

What is the meaning of linkage in the C standard?

I think it is a challenge for a compiler to know where a variable is declared. I think the linker can tell whether the variable is in the same scope, in a different scope, or declared in another source file.
 

WBahn

Joined Mar 31, 2012
29,978
In most programming language contexts, "linkage" means binding an identifier to a storage location.

In C, there are three kinds of linkage: external, internal, and none. All declarations of a particular identifier having external linkage refer to the same memory location. Within a given file, all declarations of a particular identifier having internal linkage refer to the same memory location. All declarations with no linkage refer to distinct memory locations.

Typically, when a file is compiled, information about the type of linkage and the identifier for each declaration is noted in a symbol table that is made part of the object code file. When these files are linked by the linker, that information is used to resolve the references in one file that need to refer to definitions (as opposed to declarations) in another file.
 

WBahn

Joined Mar 31, 2012
29,978
This is because X is both a global variable defined outside of any function AND it is also a local variable to function foo(). Doing things this way can be confusing and you should avoid doing this until your knowledge level increases.
I'd go further than that -- avoid doing this, period. If you ever do, be sure you have a damn good reason and document it thoroughly.

Just because you have the extern statement inside foo() does not mean that the local variable X does not take precedence.
Actually, it does. X is declared as an extern within a compound statement and that shadows any declaration made in any enclosing context. Now, if this compound statement had another compound statement enclosed in it that declared an automatic variable of the same name, then THAT would shadow the external variable (and any others) in any enclosing contexts.
 

Thread Starter

King2

Joined Jul 17, 2022
163
All declarations of a particular identifier having external linkage refer to the same memory location.

Within a given file, all declarations of a particular identifier having internal linkage refer to the same memory location.

All declarations with no linkage refer to distinct memory locations.
The concept of linkage in the C language seems a bit difficult to understand. Please explain it a bit more.

A project can contain multiple files, and variables can be declared in the same file or in any other file.
 

Papabravo

Joined Feb 24, 2006
21,159
The concept of linkage in the C language seems a bit difficult to understand. Please explain it a bit more.

A project can contain multiple files, and variables can be declared in the same file or in any other file.
A simple example to refer to might be helpful. The concept of linkage and linkage editors arose the first time somebody decided to split a single program across two different and separately compiled files. At the beginning of software time all programs had been confined to a single compilation unit (file, deck of punch cards, or punched paper tape). All addresses could be allocated and assigned at compile time. Once you were allowed to split a program across two compilation units, a new post processing step was required to resolve the location of variables defined in one of the compilation units but used in the other. The following rules are necessary for the linkage process to be completed successfully:

  1. Any variable can be DEFINED, and have space reserved for it, in one and only one of the compilation units.
  2. To allow the use of a DEFINED variable in another unit it must be marked as GLOBAL in the unit where it is defined.
  3. A GLOBAL variable can be used in the compilation unit in which it is defined.
  4. To use a variable DEFINED in another unit it must be marked as EXTERN.
The words that I capitalized are generic concepts that may be represented by other keywords and semantics in different languages.
 

Thread Starter

King2

Joined Jul 17, 2022
163
A simple example to refer to might be helpful.
Will declaring the variable know what kind of linkage it is?

If a local variable is declared in the code then what type of linkage is it ? No linkage

If a global variable is declared in the code then what type of linkage is it ? Internal linkage

If a local static variable is declared in the code then what type of linkage is it ?

If a global static variable is declared in the code then what type of linkage is it ?

If a extern variable is declared in the one file then what type of linkage is it ? External linkage
 

Papabravo

Joined Feb 24, 2006
21,159
Will declaring the variable know what kind of linkage it is?

If a local variable is declared in the code then what type of linkage is it ? No linkage

If a global variable is declared in the code then what type of linkage is it ? Internal linkage

If a local static variable is declared in the code then what type of linkage is it ?

If a global static variable is declared in the code then what type of linkage is it ?

If a extern variable is declared in the one file then what type of linkage is it ? External linkage
I'm not sure I have precise definitions for the terms you are using, but I will try my best.

Will declaring the variable know what kind of linkage it is? YES

If a local variable is declared in the code then what type of linkage is it ? A local variable, defined inside a function, is allocated on the stack, at a fixed offset from the beginning of the function's "stack frame". As such the linkage editor does not need to worry because the variable storage is created and released at runtime when the function is called and when it returns.

If a global variable is declared in the code then what type of linkage is it ? A global variable defined in a compilation unit is assigned space in the data segment for that compilation unit. The offset from the beginning of the data segment is noted in the compiler's symbol table. A subsequent compilation unit with the variable name declared as EXTERN can access the same data segment location and offset as was assigned in the module where the variable was defined.

If a local static variable is declared in the code then what type of linkage is it ? A static variable defined in the particular compilation unit is allocated space in the data segment for that compilation unit and is accessible only to functions included in the compilation unit. Access is via an offset to the data segment for the compilation unit.

If a global static variable is declared in the code then what type of linkage is it ? A global static variable is assigned an offset in the data segment for the compilation unit. It can be accessed from functions in other compilation units if defined in those units via an EXTERN statement.

If a extern variable is declared in the one file then what type of linkage is it ? You cannot define a variable as both GLOBAL and EXTERN in a single file. You can define it as GLOBAL, but for a single file the linker has nothing to do.

External linkage is what happens when you have a definition in one compilation unit and a reference in another compilation unit. The unit with the definition knows the offset in the data segment for that unit but does not know the absolute address of the variable at run time. The unit with the EXTERN statement knows neither the offset nor the address of the data segment of the variable location. This is partially resolved by the Linkage Editor and can be further modified by the LOADER which maps the executable into physical memory. There may also be virtual memory mapping hardware that further obscures where data is physically located.
 

WBahn

Joined Mar 31, 2012
29,978
Will declaring the variable know what kind of linkage it is?
The compiler will know the linkage of the variable when either declaring or defining it (provided the code is valid).

The exception is if an identifier has both internal and external linkage in the same translation unit. In this case, the behavior is undefined. Many implementations will have the identifier with internal linkage shadow the one with external linkage, but this should not be relied upon.

If a local variable is declared in the code then what type of linkage is it ? No linkage
Correct. This will be an automatic variable. Most implementations will allocate this on the stack within the function's stack frame (and each call to the function will have its own stack frame). However, this is not absolutely required -- the language specification says nothing about stacks (the word "stack" literally does not appear anywhere within the C99 standard).

If a global variable is declared in the code then what type of linkage is it ? Internal linkage
There is no "global" variable declaration and throwing around the word "global" is asking for trouble. A C program, in essence, has two types of "global" variables. It has variables with external linkage, which are accessible from any code within the program (unless they are shadowed), and variables with internal linkage, which are only visible within the single file in which they are defined. For variables outside of any function, if you define them, then they have external linkage. If you use the 'static' modifier, then they have internal linkage. If you declare them with the 'extern' modifier, then they have external linkage (but are only declared, not defined).

If a local static variable is declared in the code then what type of linkage is it ?
By "local", I'm going to assume you mean a static variable that is defined within a compound statement (usually a function body).

In this case, the linkage is none.

If a global static variable is declared in the code then what type of linkage is it ?
If by "global" you mean defined outside of any function (and thus having file-scope), then it has external linkage UNLESS the 'static' modifier is used, in which case it has internal linkage.

If a extern variable is declared in the one file then what type of linkage is it ? External linkage
What is "the one file"? You mean if your program consists of only a single file? It still has external linkage. However, if it is ONLY declared in that file and not defined, then there will be a linker error because no definition will be found at link time.



The mechanics and semantics vary from language to language. In C, a variable can have one of three linkages: external, internal, or none.

C does not have an explicit means of declaring a variable to be global (i.e., there's no "global" keyword or equivalent). It is done implicitly by how you declare/define the object. This is as good a time as any to point out that one of the major criticisms of C is that its namespace management sucks.

It is important to remember the distinction between declaring and defining. When you declare an object, you are merely telling the compiler that an object with that identifier exists (or will exist, as the case may be). When you define an object, you are allocating memory for it and binding that identifier to that memory location (which may be deferred until runtime). An object can be declared many times, but can be defined at most once. If a declared object is never used, then it does not necessarily have to be defined.

Consider the following program, consisting of three files:

Code:
// ==================================
// main.c
// ==================================

#include <stdio.h>
#include <stdlib.h>

// These prototypes would normally be in header files
void f_fred(void);
void f_sue(void);

int fred;
extern int sue;

int main(void) 
{
    f_fred();
    f_sue();
    printf("[%p] %10i main:fred\n", &fred, fred);
    printf("[%p] %10i main:sue\n", &sue, sue);
    return 0;
}

// ==================================
// fred.c
// ==================================

#include <stdio.h>

extern int fred;

void f_fred(void)
{
    fred += 10;
    printf("[%p] %10i f_fred:fred\n", &fred, fred);
}

// ==================================
// sue.c
// ==================================

#include <stdio.h>

int sue;

void f_sue(void)
{
    sue += 10;
    printf("[%p] %10i f_sue:sue\n", &sue, sue);
}
In main.c, the variable fred is defined with, implicitly, external linkage. The variable sue, on the other hand, is declared (not defined) with, explicitly, external linkage.

No other file in the program may attempt to define fred with external linkage (though external declarations are fine). However, because the variable sue is used in the code (last printf() statement), sue MUST be defined (not merely declared) with external linkage exactly once, in some file in the program.

This compiles with no errors or warnings, producing:

Code:
[0000000000407970]         10 f_fred:fred
[0000000000407974]         10 f_sue:sue
[0000000000407970]         10 main:fred
[0000000000407974]         10 main:sue
As can be seen from the addresses, both uses of fred refer to one memory location and both uses of sue refer to another.

If we change the declaration in fred.c from

extern int fred; // external declaration

to

static int fred; // internal definition

and run it, we again get no errors or warnings, but the results are different:

Code:
[0000000000407030]         10 f_fred:fred
[0000000000407994]         10 f_sue:sue
[0000000000407990]          0 main:fred
[0000000000407994]         10 main:sue
Now we can see that fred within fred.c is NOT the same as the fred in main.c. It has file scope within fred.c, meaning that all functions within this file can see and access it, but functions outside this file can't.

There is no way (except by passing pointers and dereferencing them) for code anywhere within the program to access both of these variables. The code within fred.c can see the static variable defined there while the code in all other files can, at most, see the fred defined in main.c.

So consider the following modifications to the program:

Code:
// ==================================
// main.c
// ==================================

#include <stdio.h>
#include <stdlib.h>

// These prototypes would normally be in header files
void f_fred(void);
void f_fred2(void);
void f_fred3(void);
void f_sue(void);

int fred = 12;
extern int sue;

int main(void) 
{
    f_fred();
    f_fred2();
    f_fred3();
    f_sue();
    printf("[%p] %10i main:fred\n", &fred, fred);
    printf("[%p] %10i main:sue\n", &sue, sue);
    return 0;
}

// ==================================
// fred.c
// ==================================

#include <stdio.h>

static int fred = 42;

void f_fred(void)
{
    extern int fred;
    fred += 10;
    printf("[%p] %10i f_fred:fred\n", &fred, fred);
}

void f_fred2(void)
{
    static int fred = 142;
    fred += 10;
    printf("[%p] %10i f_fred2:fred\n", &fred, fred);
}

void f_fred3(void)
{
    fred += 10;
    printf("[%p] %10i f_fred3:fred\n", &fred, fred);
}

// ==================================
// sue.c
// ==================================

#include <stdio.h>

int sue;

void f_sue(void)
{
    sue += 10;
    printf("[%p] %10i f_sue:sue\n", &sue, sue);
}
Which "fred" is being used in each of the three functions defined within fred.c?

Try to figure it out without running it, then run it and see if you are correct.
 