Relocation - Part 1
Setup
You remember the dynamic section?
Dynamic section at offset 0x2de0 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x1000
0x000000000000000d (FINI) 0x1154
0x0000000000000019 (INIT_ARRAY) 0x3dd0
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x3dd8
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x3b0
0x0000000000000005 (STRTAB) 0x480
0x0000000000000006 (SYMTAB) 0x3d8
0x000000000000000a (STRSZ) 141 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x3fe8
0x0000000000000002 (PLTRELSZ) 24 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x610
0x0000000000000007 (RELA) 0x550
0x0000000000000008 (RELASZ) 192 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffb (FLAGS_1) Flags: PIE
0x000000006ffffffe (VERNEED) 0x520
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x50e
0x000000006ffffff9 (RELACOUNT) 3
0x0000000000000000 (NULL) 0x0
We can classify the entries in the Type
field based on when the interpreter goes to them.
1
NEEDED
Load shared libraries
2
JMPREL, RELA
Relocation Time
-
GNU_HASH, STRTAB, SYMTAB
Lookup data for relocation
-
STRSZ, SYMENT, DEBUG, PLTGOT, PLTRELSZ, PLTREL, RELASZ, RELAENT
Metadata for relocation
-
FLAGS_1, VERNEED, VERNEEDNUM, VERSYM, RELACOUNT
Some more info
3
INIT, INIT_ARRAY, INIT_ARRAYSZ
Setup before main
-
FINI, FINI_ARRAY, FINI_ARRAYSZ
Setup after main
4
NULL
End
So far, we know that the first thing that the interpreter does is to load the shared libraries.
In our case, it is
libc.so.6
.
After all the shared libraries are loaded, the interpreter goes about relocation.
We know that we have two relocation tables in our binary. Also, there are two entries in the dynamic section regarding relocation, these include JMPREL
and RELA
. The question is, which one the interpreter is going to go at first?
To find the answer to this question, we have to understand a simple concept, called binding.
Binding
Binding refers to process of resolving the actual address of a symbol. Although binding can be of various types, we are concerned about two types for now.
Eager Binding: Here, the symbol is resolved immediately.
Lazy Binding: Here, the symbol is resolved only when it is required for the first time.
Every entry in the .rela.dyn
table undergoes eager binding because it includes symbols which are primarily used in the startup code, the code which gets run before the main()
of our source code.
Every entry in the .rela.plt
table undergoes lazy binding because it includes symbols which are primarily used in the actual source code of the binary.
But why does lazy binding exist?
Lets have a look at the following code.
#include<stdio.h>
#include<unistd.h>
int main(void){
int input;
printf("Enter 0 to exit OR 1 to sleep: ");
scanf("%d", &input);
if (input==0){
printf("Exiting.....\n");
}
else{
sleep(10);
printf("Sleeping for 10 seconds.....\n");
printf("Exiting.....\n");
}
}
Now, it entirely depends upon the user input whether sleep function has to be execute or not.
If you resolve it before it is used, it will create a very minute delay as the startup code will execute late. But it is unnoticeable.
Big code bases have thousands of such conditions. I hope the delay would be significant there?
Lets have a look at this another piece of code.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
int input = atoi(argv[1]);
if (input == 0) {
return 0;
}
else {
sleep(10);
printf("Sleeping for 10 seconds.....\n");
printf("Exiting.....\n");
}
return 0;
}
This one is slightly modified. It takes input as an argument.
Here, even
printf()
's usage is dependent on the argument completely.What if the user input 0?
printf()
would never run, right? Then what is the point of resolving its address?
Lazy binding is a solution to this problem.
Note: The address is resolved when the symbol is referenced for the first time. After that, no more resolution is required.
There also exist a very genuine question that big code bases have too many functions, and if the address for every symbol in the source code is resolved like this, isn't this going to create a runtime overhead?
Absolutely right.
I mentioned earlier that there exist multiple kinds of binding, but we are concerned about two for now. Lazy binding is a solution, not the only solution. And as we will advance we will learn about them later as well.
But keep this in mind that this binary only requires the knowledge of these two. So, we are not going to touch them soon.
I guess we have find the answer to our question. The interpreter would go to RELA
entries first because they require eager binding. So, what we are waiting for?
Introducing Relocations
The interpreter uses the RELA
entry to jump to the .rela.dyn
relocation table. RELA
entry in the dynamic section has a value of 0x550
and we can verify that the .rela.dyn
table is also located at the same offset by this line Relocation section '.rela.dyn' at offset 0x550 contains 8 entries:
.
To revise, a relocation entry can be read as: at offset in the section, replace the placeholder address of the symbol with its actual address.
These are the relocation entries in the .rela.dyn
table.
Relocation section '.rela.dyn' at offset 0x550 contains 8 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000003dd0 000000000008 R_X86_64_RELATIVE 1130
000000003dd8 000000000008 R_X86_64_RELATIVE 10f0
000000004010 000000000008 R_X86_64_RELATIVE 4010
000000003fc0 000100000006 R_X86_64_GLOB_DAT 0000000000000000 __libc_start_main@GLIBC_2.34 + 0
000000003fc8 000200000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_deregisterTM[...] + 0
000000003fd0 000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000003fd8 000500000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_registerTMCl[...] + 0
000000003fe0 000600000006 R_X86_64_GLOB_DAT 0000000000000000 __cxa_finalize@GLIBC_2.2.5 + 0
The Info
field is 8 bytes long, although here it is 6 bytes, which is I am also wondering why readelf is not showing the remaining 2 bytes, but leave it.
The upper 8-bytes refers to the symbol index value and the lower 8-bytes refers to the relocation type.
`R_X86_64_RELATIVE` Relocation
Offset Info Type Sym. Value Sym. Name + Addend
000000003dd0 000000000008 R_X86_64_RELATIVE 1130
Its symbol index is 0 and the relocation type is 8, which resolves to R_X86_64_RELATIVE
.
Entries of type 8 doesn't require any symbol lookup.
Offset
is where we have to write the result. And the result is calculated as follows:*(Offset) = Base Address of the binary + Addend
In simple words, take the base address of the binary and add the value in the
Sym. Name + Addend
field to it. Now write the obtained value at the mentioned offset. Relocation is done.relocation_address = base_addr + 0x3dd0; value_to_write = base_addr + 1130; *(relocation_address) = value_to_write
The remaining two entries are relocated in the same manner.
000000003dd8 000000000008 R_X86_64_RELATIVE 10f0
000000004010 000000000008 R_X86_64_RELATIVE 4010
`R_X86_64_GLOB_DAT` Relocation
Offset Info Type Sym. Value Sym. Name + Addend
000000003fc0 000100000006 R_X86_64_GLOB_DAT 0000000000000000 __libc_start_main@GLIBC_2.34 + 0
000000003fc8 000200000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_deregisterTM[...] + 0
000000003fd0 000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000003fd8 000500000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_registerTMCl[...] + 0
000000003fe0 000600000006 R_X86_64_GLOB_DAT 0000000000000000 __cxa_finalize@GLIBC_2.2.5 + 0
Lets take the first entry here.
The symbol index is 1 and relocation type is 6.
R_X86_64_GLOB_DAT
does require symbol lookup.This is a global data relocation, commonly used for symbol pointers in the GOT (Global Offset Table). Its purpose is to fill a pointer with the runtime address of a symbol — typically function pointers or global variables imported from shared libraries.
The symbol is __libc_start_main.
The relocation logic is
*(base_addr + 0x3fc0) = address_of(__libc_start_main)
The symbol is looked up in all the loaded shared libraries using the dynamic symbol table and the symbol hash tables. Once the runtime address is found in memory, the address is written in place of 0x3fc0
in the global offset table. And the relocation is done.
The term global offset table is new here.
We are done with .rela.dyn
relocations.
Now the interpreter jumps to the JUMPREL
entry in the dynamic section and finds .rela.plt
. The real chaos starts here.
PLT entries are about lazy binding, by default.
For lazy binding, we need to understand global offset table (GOT) and procedure linkage table (PLT). Both of which are really complex and confusing.
Since it is fairly long, it deserves its own separate place. Therefore, we are dividing this article into two parts.
Here ends the first part.
Last updated