Magic Verification API

Problem Statement

The first step in parsing an ELF file is to verify that the loaded file actually is one.

How can we do that? By verifying the magic bytes (or numbers) in the file headers.

What are magic numbers?

Magic numbers can take multiple forms in computer programming. In our case, it is a constant sequence of bytes at the very start of a file, used to identify the file format.

While statically analyzing the hello world binary, we started with the file headers. The readelf output began like this:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00

The first 4 hexadecimal numbers represent the magic number.

But readelf labels all the pairs as magic?

Actually, only the first 4 pairs form the magic number. All 16 bytes shown make up the e_ident[] array at the start of the file header: the magic number is its first 4 entries, and the remaining entries describe other properties of the file.

Remember, each pair is two hexadecimal digits, so it is 1 byte in size.

Does this magic number hold any meaning?

Yes. 0x7f = DEL, 0x45 = E, 0x4c = L and 0x46 = F
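To make the mapping concrete, here is a minimal sketch that spells the magic number both ways and checks that the two spellings are byte-for-byte identical. The names ELF_MAGIC_HEX, ELF_MAGIC_CHR and magic_spellings_match are made up for this illustration, and the character literals assume an ASCII-compatible character set.

```c
#include <stddef.h>

/* The four ELF magic bytes, written as hex values. */
static const unsigned char ELF_MAGIC_HEX[4] = {0x7f, 0x45, 0x4c, 0x46};

/* The same four bytes, written with character literals (assumes ASCII). */
static const unsigned char ELF_MAGIC_CHR[4] = {0x7f, 'E', 'L', 'F'};

/* Hypothetical helper: returns 1 if both spellings hold identical bytes. */
int magic_spellings_match(void) {
  for (size_t i = 0; i < 4; i++) {
    if (ELF_MAGIC_HEX[i] != ELF_MAGIC_CHR[i]) {
      return 0;
    }
  }
  return 1;
}
```

On an ASCII system magic_spellings_match() returns 1, which is why the check later in this section can compare against 'E', 'L' and 'F' directly.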

Why `0x7f`?

A random text file may start with the bytes 45 4c 46, but it is very unlikely for a text file to start with 7f 45 4c 46. 0x7f is DEL, a non-printable control character that almost never appears in text, which is why it was chosen as the prefix.

Reading The Magic

To mark a file as a valid ELF, the first 4 bytes must be 0x7F 0x45 0x4C 0x46. We will use fread() to read those bytes.

It is provided by the C standard I/O library (stdio.h). The signature of fread() is as follows.

// General Signature
fread(dest_ptr, size_each_element, n_elements, file_ptr);

// From Manual
size_t fread(void ptr[restrict .size * .nmemb], size_t size, size_t nmemb, FILE *restrict stream);

fread requires 4 arguments.

dest_ptr is where the raw bytes are stored after extraction.
size_each_element is the size of each element, in bytes.
n_elements is the total number of elements we are extracting.
file_ptr is the open stream (FILE*) that gives access to the file's raw bytes. This is how we are going to read the file.

It reads N elements, each of size S bytes, from the file pointer and stores them at the destination pointer.
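As a rough sketch of how the four arguments line up in a real call, the helper below reads the first few bytes of an already-open stream. The name read_leading_bytes is hypothetical and not part of the project; only fread and rewind come from stdio.h.

```c
#include <stdio.h>

/* Hypothetical helper: read the first `count` bytes of an open stream
 * into `dest`. Returns the number of bytes actually read. */
size_t read_leading_bytes(FILE *fp, unsigned char *dest, size_t count) {
  rewind(fp);                   /* start from the beginning of the file */
  return fread(dest,            /* dest_ptr: where the bytes are stored */
               sizeof dest[0],  /* size_each_element: 1 byte            */
               count,           /* n_elements: how many bytes to read   */
               fp);             /* file_ptr: the stream to read from    */
}
```

A caller would pass a buffer of at least count bytes, for example an unsigned char buf[4] together with a count of 4.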

Return Value

If reading was successful, it returns the number of items read. This is what we check to verify whether fread succeeded: a count smaller than the one requested means the file ended early or a read error occurred.
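One possible way to act on that return value is sketched below. The read_status enum and the read_exact function are names invented for this example; feof-style reasoning is replaced here by a single ferror() check, which is a standard stdio.h call.

```c
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
typedef enum { READ_OK, READ_TOO_SHORT, READ_IO_ERROR } read_status;

/* Try to read exactly `count` elements of `size` bytes each. */
read_status read_exact(FILE *fp, void *dest, size_t size, size_t count) {
  size_t got = fread(dest, size, count, fp);

  if (got == count) {
    return READ_OK;        /* every requested element arrived */
  }
  if (ferror(fp)) {
    return READ_IO_ERROR;  /* the stream reported a read error */
  }
  return READ_TOO_SHORT;   /* the file ended before `count` elements */
}
```

A file that is only 2 bytes long would make read_exact(fp, buf, 1, 4) report READ_TOO_SHORT, which for our purposes is enough to say "this is not an ELF".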

If you wonder why fread takes a size and a count instead of just a byte count, remember that C is statically typed: fread lets you ask for whole elements of some data type and tells you how many complete elements it read, so you do not have to do the byte arithmetic yourself. If you want to deal with raw bytes directly, use the UNIX system call API read.
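A small sketch of what "counting in elements" means in practice: the hypothetical helper read_u32_items below asks for 32-bit values, and fread reports how many complete values it managed to read, not how many bytes.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read up to `max` 32-bit values from the current
 * position of `fp`. Returns the number of complete values read. */
size_t read_u32_items(FILE *fp, uint32_t *dest, size_t max) {
  return fread(dest, sizeof(uint32_t), max, fp);
}
```

If the stream holds 10 bytes and we ask for 4 values, the call returns 2, because only two complete 4-byte values are available and the trailing 2 bytes do not form a whole element.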

If you are still unsure about it, and want to dive deep into it, I've written a short detour here.

For more information on fread, visit its man page.

man fread
man7 online

Verifying The Magic Bytes

#include <stdio.h>
#include "verify_elf.h"

// Returns 0 if the stream starts with the ELF magic bytes, -1 otherwise.
// On failure, the stream is closed before returning.
int verify_elf(FILE* file_object){
  unsigned char magic_number[4];

  // Read 4 elements of 1 byte each; fewer than 4 means the read failed.
  if (fread(&magic_number, 1, 4, file_object) != 4) {
    fprintf(stderr, "Error: `fread()`: Unable to read ELF magic bytes.\n");
    fclose(file_object);
    return -1;
  }

  // Compare the bytes against 0x7f 'E' 'L' 'F'.
  if (magic_number[0] != 0x7f || magic_number[1] != 'E' || magic_number[2] != 'L' || magic_number[3] != 'F'){
    fprintf(stderr, "Error: Unexpected magic bytes returned.\n  Expected: `0x7F, E, L, F`\n  Found: %c, %c, %c, %c\n", magic_number[0], magic_number[1], magic_number[2], magic_number[3]);
    fclose(file_object);
    return -1;
  }

  return 0;
}

Why unsigned char?

char is 1-byte, so it is appropriate to store magic bytes. But char can be signed (-128 to 127) or unsigned (0-255).

We make it unsigned because raw bytes are plain values from 0 to 255, and unsigned char holds every one of them without a sign interpretation getting in the way of comparisons.
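The four magic bytes themselves are all 0x7f or smaller, so they would fit in a signed char too. The habit matters for bytes above 0x7f, as this small sketch tries to show. Both function names are made up for the illustration, and the signed result assumes a two's-complement machine, which covers the common platforms.

```c
#include <string.h>

/* Views the raw byte 0xff through a signed char and compares it to 0xff. */
int signed_view_matches(void) {
  unsigned char raw = 0xff;
  signed char byte;
  memcpy(&byte, &raw, 1);  /* copy the bit pattern unchanged */
  return byte == 0xff;     /* byte promotes to int -1, so this yields 0 */
}

/* Same comparison with an unsigned char. */
int unsigned_view_matches(void) {
  unsigned char byte = 0xff;
  return byte == 0xff;     /* byte promotes to int 255, so this yields 1 */
}
```

With unsigned char the comparison against a hex literal behaves the way it reads, which is the property we want for every byte we pull out of the file.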

We pass the address of the magic_number array because fread expects a pointer to the destination buffer.

We are reading 4 elements, each of size 1-byte.
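For comparison, the same check can be sketched with memcmp instead of four separate comparisons. This is only an alternative illustration, not the project's verify_elf: the name starts_with_elf_magic is hypothetical, it returns 1/0 instead of 0/-1, and it leaves the stream open.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variant: returns 1 if the stream starts with the ELF
 * magic bytes, 0 otherwise. The stream is left open. */
int starts_with_elf_magic(FILE *fp) {
  static const unsigned char expected[4] = {0x7f, 'E', 'L', 'F'};
  unsigned char found[4];

  rewind(fp);  /* the magic lives at offset 0 */
  if (fread(found, 1, sizeof found, fp) != sizeof found) {
    return 0;  /* shorter than 4 bytes, or a read error */
  }
  return memcmp(found, expected, sizeof expected) == 0;
}
```

Whether to close the stream inside the check or leave that to the caller is a design choice; this sketch leaves it to the caller so the same FILE* can be reused for later parsing.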

And we are done. But there are two more questions I would like to ask here.

Why are we not using `read`?

If you have taken a C tutorial, you know that in the file I/O section, we use the read() and write() functions to operate on files.

fread belongs to the standard I/O library. It is designed to aid in file operations.

read, on the other hand, belongs to the UNIX standard header, unistd.h. It provides access to the POSIX operating system API.

The signature for read is:

ssize_t read(int fd, void *buf, size_t count);

In simplest terms, fread provides a healthy abstraction: it reads through a buffered stream and counts whole elements for us. If we chose to go with read, we would have to handle byte counts and partial reads ourselves. Plus, fread is part of standard C and therefore portable, while read is specific to POSIX systems.
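To give a feel for the extra work, here is a rough sketch of reading a fixed number of bytes with read. The helper name read_fully is hypothetical; the loop exists because read is allowed to return fewer bytes than requested.

```c
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `count` bytes have
 * arrived, the file ends, or an error occurs.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t count) {
  unsigned char *p = buf;
  size_t total = 0;

  while (total < count) {
    ssize_t n = read(fd, p + total, count - total);
    if (n < 0) {
      return -1;  /* error; errno was set by read() */
    }
    if (n == 0) {
      break;      /* end of file */
    }
    total += (size_t)n;
  }
  return (ssize_t)total;
}
```

With fread, the single comparison against 4 in verify_elf covers all of this.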

Why are we using `fprintf` when we have `printf`?

On Linux, execve is the actual system call, and execl, execvp and the rest of the exec family are library wrappers around it. The same idea applies to the *printf* functions, just remove the syscall part.

If we open the man 3 entry for printf, we can find a big list of *printf* functions.

printf,   fprintf,  dprintf,  sprintf, 
snprintf, vprintf,  vfprintf, vdprintf, 
vsprintf, vsnprintf

Why man 3?

The 3 corresponds to library calls (functions in program libraries) section. If you do man printf, you will find the entry for printf as a user command.

In a typical implementation, the v* variants of printf are the real workers. Everything else is just a wrapper around them.

For example, printf(format, ...) is a shorthand for fprintf(stdout, format, ...).

Clearly, fprintf lets us choose the output stream. It can be stdout, stderr or any other FILE*, such as a file we opened ourselves. We are using stderr because it keeps errors separate from the standard output, so they can be redirected elsewhere.
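The wrapper relationship can be sketched with a tiny function of our own. report_error is a hypothetical name, not something the project defines; it forwards its variable arguments to vfprintf, which is the same pattern a printf-style wrapper generally follows.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: prints "Error: " followed by a formatted message
 * to the chosen stream. Returns what vfprintf returns, i.e. the number
 * of characters written for the formatted part, or a negative value. */
int report_error(FILE *stream, const char *format, ...) {
  va_list args;
  int written;

  va_start(args, format);
  fputs("Error: ", stream);
  written = vfprintf(stream, format, args);  /* the v* variant does the work */
  va_end(args);

  return written;
}
```

Calling report_error(stderr, "Unable to read %d bytes.\n", 4) sends the message to the error stream, while passing stdout or an opened file sends the same text there instead.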

Conclusion

And we have managed to verify if the file passed to our program is an ELF or not.

Next, we will parse the file headers.


Last updated 5 days ago