This article discusses linking only between relocatable object files.  C is used as the programming language in discussions and examples.

I use GCC 4.8.5 in all illustrations.

What are COMMON symbols?

Common symbols are a feature that allows a programmer to ‘define’ several variables of the same name in different source files.  This is in contrast with the more usual approach, where you define a variable once in one source file and reference it from the other source files using extern.  When common symbols are used, the linker merges all symbols of the same name into a single memory location, whose size is that of the largest of the individual common symbol definitions.  For example, if fileA.c defines an uninitialized 32-bit integer myint, and fileB.c defines an uninitialized 8-bit char myint, then in the final executable, references to myint from both files will point to the same memory location (the common location), and the linker will reserve 32 bits for that location.
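
One way to picture this sharing inside a single file is a union.  This is only an analogy, not what the linker literally does, and the names shared_myint, shared_size and write_char_changes_int are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Analogy only: one storage location seen through both definitions of myint. */
union shared_myint {
    int32_t as_int;   /* how fileA.c would see myint */
    char    as_char;  /* how fileB.c would see myint */
};

/* File-scope object without an initializer, so it starts out as zero. */
static union shared_myint myint;

/* Size of the shared location: at least as large as the largest member. */
size_t shared_size(void) {
    return sizeof(union shared_myint);
}

/* Store through the char view, then report whether the int view changed. */
int write_char_changes_int(void) {
    myint.as_int = 0;
    myint.as_char = 'A';
    return myint.as_int != 0;
}
```

With merged common symbols the effect is similar: a write made by the code of one file is visible to the code of the other, because both names resolve to the same address.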

COMMON symbols are contained only in relocatable object files, and not in executable object files.  They are generated by the compiler/assembler when creating an object file from a single source file.  Later, the linker will need to interpret these symbols.  Remember that ELF reserves a special section header table index for referring to a ‘COMMON’ section: the index SHN_COMMON, which readelf displays as COM (just as we have the special indices ABS and UND; none of these sections physically exists in the file).  Therefore common symbols in the symbol table of a relocatable object file have their section index member set to COM.
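
As a small sketch of how these reserved indices map to the labels readelf prints, the snippet below hard-codes the three values from the ELF specification.  The SKETCH_ prefix and the helper ndx_label are my own names for illustration; on Linux the real constants live in <elf.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reserved section indices from the ELF specification
   (Linux <elf.h> defines SHN_UNDEF, SHN_ABS and SHN_COMMON with these values). */
enum {
    SKETCH_SHN_UNDEF  = 0x0000,  /* undefined symbol, shown as UND */
    SKETCH_SHN_ABS    = 0xfff1,  /* absolute value,   shown as ABS */
    SKETCH_SHN_COMMON = 0xfff2   /* common symbol,    shown as COM */
};

/* Hypothetical helper: render a symbol's section index the way the
   Ndx column of readelf -s does. */
const char *ndx_label(uint16_t shndx, char *buf, size_t buflen) {
    assert(buf != NULL && buflen > 0);
    switch (shndx) {
    case SKETCH_SHN_UNDEF:  return "UND";
    case SKETCH_SHN_ABS:    return "ABS";
    case SKETCH_SHN_COMMON: return "COM";
    default:
        /* An ordinary index refers to a real entry in the section header table. */
        snprintf(buf, buflen, "%u", (unsigned)shndx);
        return buf;
    }
}
```

Calling ndx_label(0xfff2, buf, sizeof buf) yields the string COM, while an ordinary index such as 3 is simply printed as a number.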

From what I gather from the internet, common symbols are present only for backward compatibility with old source files that did not use extern.  The recommended practice nowadays is to have exactly one definition of a variable, and to use extern in all other source files that reference it.  Common symbols actually first appeared as a feature of the FORTRAN language.
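
The recommended pattern can be sketched as follows.  To keep the sketch self-contained I have squeezed the three pieces into one file; in a real project they would live in a header and two source files.  The names shared_counter and read_shared_counter are made up for illustration.

```c
/* Piece 1: what a shared header would contain.
   A declaration only; no storage is allocated here. */
extern int shared_counter;

/* Piece 2: what exactly one source file would contain.
   This is the single definition, and it may carry an initializer. */
int shared_counter = 9;

/* Piece 3: what any other source file would do after including the header. */
int read_shared_counter(void) {
    return shared_counter;
}
```

Because there is only one definition, the linker never has to merge anything: every other object file just carries an undefined reference that is resolved to that definition.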

Code generation by the compiler

When a C source file contains an uninitialized global variable, we would normally expect the compiler to place it in the .bss section of the relocatable object file.  However, by default, GCC puts the symbol in the COMMON section instead; that is, -fcommon is the default behaviour (this holds for the GCC 4.8.5 used here; since GCC 10 the default is -fno-common).

$> cat main.c
int un_a;
int main() {
     return 0;
}
$> gcc -c -o main.o main.c
$> readelf -s main.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000004     4 OBJECT  GLOBAL DEFAULT  COM un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 main

$> gcc -c -o main.o main.c -fno-common
$> readelf -s main.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 main

From the above we confirm that by default, the uninitialized variable un_a is put in the common section.  Note that for a COM symbol the Value column holds the required alignment (4 here) rather than an address.  If, however, we compile the same source file with -fno-common, we see that the section of un_a is now at index 3, which is the .bss section of the main.o relocatable file.
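
The same choice can also be made per variable instead of per compilation.  The sketch below assumes GCC or a GCC-compatible compiler, since it relies on the common and nocommon variable attributes; the variable and function names are made up for illustration.

```c
/* Assumes GCC or a GCC-compatible compiler. */

/* Ask for a COMMON symbol, as -fcommon would do for this variable. */
int tentative_a __attribute__((common));

/* Ask for directly allocated storage, as -fno-common would do. */
int tentative_b __attribute__((nocommon));

/* Whichever placement is used, both variables start out as zero at run time. */
int sum_of_both(void) {
    return tentative_a + tentative_b;
}
```

Compiling this with gcc -c and inspecting the object with readelf -s should show COM in the Ndx column for tentative_a and the index of .bss for tentative_b, regardless of which of -fcommon or -fno-common is in effect.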

Of course, if the variable is initialized to a non-zero value, then it is placed in the .data section of the output file by the compiler (actually the assembler).  This is illustrated below (note that section index 2 corresponds to the .data section here: use the -SW options of readelf to list all sections).

$> cat main.c
int un_a=9;
int main() {
     return 0;
}
$> gcc -c -o main.o main.c
$> readelf -s main.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 main

If we define a global variable, and explicitly initialise it to zero, then it will be put in the .bss section (although it is ‘initialized’ and logically should go into .data, the compiler knows it is optimal to put it in the .bss, as in any case it will become initialized to zero at runtime, and in .bss will not consume file space).

$> cat main.c
int un_a=0;
int main() {
     return 0;
}
$> gcc -c -o main.o main.c
$> readelf -s main.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 main

Let us quickly confirm what happens when we have an extern declaration for a variable.

$> cat main.c
extern int un_a;
int main() {
     int a = un_a;     
     return 0;
}
$> gcc -c -o main.o main.c
$> readelf -s main.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     8: 0000000000000000    20 FUNC    GLOBAL DEFAULT    1 main
     9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND un_a

What does all of the above imply?  Well, it becomes interesting to see how the linker responds when the same symbol name is defined in multiple relocatable object files.  We investigate this with a series of illustrations below.  Recall that common symbols are present only in relocatable objects, and not in executables.  So the linker needs to decide where it will finally place a COMMON symbol in the executable: either in .data or in .bss.

To see how the linker responds to different scenarios, we will use two source code files in the following examples.  Both refer to the global variable un_a.  We will try to list as many scenarios as possible, and then perform tests to see what happens.  Remember that the goal of everything we are doing here is to deepen understanding, so at each conclusion we reach, we must ask ourselves why the implementors of the linker chose to do it that way.

We must remember that the linker sees only the object files, although it is easier for us to talk in terms of source code to explain certain things.

Below we will examine what the linker outputs in the executable, when linking is done in the following cases:

  1.     Two object files defining the same symbol, one of them a COMMON, and the other located in the .bss section
  2.     Two object files defining the same symbol, one of them a COMMON, and the other an undefined symbol
  3.     Two object files defining the same symbol, both of them COMMON
  4.     Two object files defining the same symbol, one in .bss, and the other in .data
  5.     Two object files defining the same symbol, one in .data, and the other an undefined symbol

We will examine at most two object files as input to the linker, as this is easier to explain and understand.  With more objects the concept is the same.

1     Two object files defining the same symbol, one of them a COMMON, and the other located in the .bss section

$> cat main.c
int un_a;
int main() {
     return 0;
}
$> cat swap.c
int un_a=0;
int swap() {
     return 108;
}
$> gcc -c -o main.o main.c
$> readelf -s main.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000004     4 OBJECT  GLOBAL DEFAULT  COM un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 main
$> gcc -c -o swap.o swap.c
$> readelf -s swap.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS swap.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 swap
$> gcc -o prog main.o swap.o
$> readelf -s prog | grep 'un_a'
49: 0000000000601030     4 OBJECT  GLOBAL DEFAULT   25 un_a

In the example above, we see that the linker creates the final variable ‘un_a’ in the .bss section of the executable object file (index 25 is the index of .bss as shown by readelf -SW, the output of which is not shown here to save space).

In the case where the file swap.c had initialised the variable to a non-zero value, the variable would have been in the .data of the relocatable swap.o, and the linker would have then placed it in the .data section of the executable.

2     Two object files defining the same symbol, one of them a COMMON, and the other an undefined symbol

$> cat main.c
int un_a;
int main() {
     return 0;
}
$> cat swap.c
extern int un_a;
int swap() {
     int a = un_a;
     return 108;
}
$> gcc -c -o main.o main.c
$> readelf -s main.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000004     4 OBJECT  GLOBAL DEFAULT  COM un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 main
$> gcc -c -o swap.o swap.c
$> readelf -s swap.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS swap.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     8: 0000000000000000    20 FUNC    GLOBAL DEFAULT    1 swap
     9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND un_a
$> gcc -o prog main.o swap.o
$> readelf -s prog | grep 'un_a'
49: 0000000000601030     4 OBJECT  GLOBAL DEFAULT   25 un_a

As we see, in the final executable, our variable is located in the .bss section.

3     Two object files defining the same symbol, both of them COMMON

$> cat main.c
int un_a;
int main() {
     return 0;
}
$> cat swap.c
int un_a;
int swap() {
     return 108;
}
$> gcc -c -o main.o main.c
$> readelf -s main.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000004     4 OBJECT  GLOBAL DEFAULT  COM un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 main
$> gcc -c -o swap.o swap.c
$> readelf -s swap.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS swap.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000004     4 OBJECT  GLOBAL DEFAULT  COM un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 swap
$> gcc -o prog main.o swap.o
$> readelf -s prog | grep 'un_a'
49: 0000000000601030     4 OBJECT  GLOBAL DEFAULT   25 un_a

Here too, we see that the variable is located in the .bss section of the final executable.

4     Two object files defining the same symbol, one in .bss, and the other in .data

$> cat main.c
int un_a=0;
int main() {
     return 0;
}
$> cat swap.c
int un_a=9;
int swap() {
     return 108;
}
$> gcc -c -o main.o main.c
$> readelf -s main.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 main
$> gcc -c -o swap.o swap.c
$> readelf -s swap.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS swap.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 swap
$> gcc -o prog main.o swap.o
swap.o:(.data+0x0): multiple definition of `un_a'
main.o:(.bss+0x0): first defined here
collect2: error: ld returned 1 exit status

As we see, here we get a linking error.  This makes sense if we think about it.  The code in each file refers to its own data location, initialized differently.  How is the linker to know which initialization to take?  If it takes the value 9 and puts un_a in the .data section, then the code in main.c may behave incorrectly, as it was written assuming the initial value of un_a to be 0.  Likewise, if it puts un_a in .bss, so that at runtime the initial value of un_a is 0, the code in main.c will work correctly, as it indeed expects an initial value of 0 there, but the code in swap.c will likely misbehave.

5     Two object files defining the same symbol, one in .data, and the other an undefined symbol

$> cat main.c
int un_a=10;
int main() {
     return 0;
}
$> cat swap.c
extern int un_a;
int swap() {
     int a = un_a;
     return 108;
}
$> gcc -c -o main.o main.c
$> readelf -s main.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 un_a
     9: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 main
$> gcc -c -o swap.o swap.c
$> readelf -s swap.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS swap.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     8: 0000000000000000    20 FUNC    GLOBAL DEFAULT    1 swap
     9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND un_a
$> gcc -o prog main.o swap.o
$> readelf -s prog | grep 'un_a'
49: 000000000060102c     4 OBJECT  GLOBAL DEFAULT   24 un_a

This example shows a normal and common case.  We see that finally the variable is defined in the .data of the executable (section index 24).
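
Taken together, the five experiments can be condensed into a small decision function.  This is only my own model of the outcomes observed above, not the linker's actual algorithm; the type and function names (sym_kind, sym, link_two) are invented for this sketch, and it ignores details such as size-mismatch warnings.

```c
#include <stddef.h>

/* Hypothetical model of how one relocatable object describes a symbol.
   SYM_ERROR is only ever produced as a result, never passed in. */
enum sym_kind { SYM_UNDEF, SYM_COMMON, SYM_BSS, SYM_DATA, SYM_ERROR };

struct sym {
    enum sym_kind kind;
    size_t size;
};

/* A symbol that already lives in a real section (.bss or .data). */
static int is_real_definition(enum sym_kind k) {
    return k == SYM_BSS || k == SYM_DATA;
}

/* Merge two descriptions of the same name and report where the symbol
   ends up in the executable, or SYM_ERROR for a multiple definition. */
struct sym link_two(struct sym a, struct sym b) {
    struct sym out;

    if (is_real_definition(a.kind) && is_real_definition(b.kind)) {
        out.kind = SYM_ERROR;              /* case 4: multiple definition */
        out.size = 0;
    } else if (is_real_definition(a.kind)) {
        out = a;                           /* cases 1 and 5: the real definition wins */
    } else if (is_real_definition(b.kind)) {
        out = b;
    } else if (a.kind == SYM_COMMON || b.kind == SYM_COMMON) {
        /* cases 2 and 3: only COMMON (and undefined) entries are left,
           so the largest common size is reserved in .bss */
        size_t sa = (a.kind == SYM_COMMON) ? a.size : 0;
        size_t sb = (b.kind == SYM_COMMON) ? b.size : 0;
        out.kind = SYM_BSS;
        out.size = (sa > sb) ? sa : sb;
    } else {
        out.kind = SYM_UNDEF;              /* nobody defines the symbol */
        out.size = 0;
    }
    return out;
}
```

For instance, feeding it a 4-byte COMMON entry and a 4-byte .bss entry gives a .bss result as in case 1, while a .bss entry and a .data entry give SYM_ERROR as in case 4.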
