C -> Assembly

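The source can be compiled into assembly with gcc -S -masm=intel hello.c -o hello_asm.s. The -S flag makes GCC stop once it has generated assembly, without assembling or linking, and -masm=intel selects Intel syntax instead of GCC's default AT&T syntax.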

Previously, we learned that void main(); is the wrong signature for main. Even so, if we compile the code with the void main(); signature, we get almost identical assembly:

C_source_to_assembly.asm

    .file     "hello.c"
    .intel_syntax noprefix
    .text
    .section  .rodata
.LC0:
    .string   "Hello, World!"
    .text
    .globl    main
    .type     main, @function
main:
.LFB0:
    .cfi_startproc
    push rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    mov	rbp, rsp
    .cfi_def_cfa_register 6
    lea	rax, .LC0[rip]
    mov	rdi, rax
    call     puts@PLT
    nop
    pop	rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size	main, .-main
    .ident	"GCC: (Debian 14.2.0-19) 14.2.0"
    .section	.note.GNU-stack,"",@progbits

And this is the assembly generated for the int main(void); signature:

C_source_to_assembly.asm

    .file     "hello.c"
    .intel_syntax noprefix
    .text
    .section  .rodata
.LC0:
    .string   "Hello, World!"
    .text
    .globl    main
    .type     main, @function
main:
.LFB0:
    .cfi_startproc
    push rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    mov	rbp, rsp
    .cfi_def_cfa_register 6
    lea	rax, .LC0[rip]
    mov	rdi, rax
    call     puts@PLT
    mov eax, 0
    pop	rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size	main, .-main
    .ident	"GCC: (Debian 14.2.0-19) 14.2.0"
    .section	.note.GNU-stack,"",@progbits

Only one line differs: the instruction just before pop rbp changes from nop to mov eax, 0.

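nop means "no operation": the CPU does nothing and simply moves on to the next instruction. We don't need to think about it right now.
In the int main(void) version, mov eax, 0 puts 0 into eax, the lower 32 bits of rax (the accumulator). Under the System V AMD64 calling convention used on x86-64 Linux, a function returns its integer result in eax/rax, so this 0 is main's return value, which the C runtime then uses as the process's exit status.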

Let's understand this assembly now.

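.file is an assembler (GAS) directive that records the name of the original source file; it ends up in the object file's symbol table.
.intel_syntax noprefix is a GAS-specific directive telling the assembler to expect Intel-style syntax; noprefix means registers are written without a % prefix (rax, not %rax).
.text switches to the code (text) section.
.section .rodata switches to the read-only data section.
.LC0 is a local label for a literal constant. Labels starting with .L stay local to the assembler and do not appear in the object file's symbol table.
.string defines a null-terminated, C-style string literal. This is where our "Hello, World!" goes. It contains no \n because puts appends the newline itself.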
.globl makes a symbol visible to the linker.
.type main, @function specifies that main is a function symbol.
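.LFB0 (local function begin) and its partner .LFE0 (local function end) are labels GCC uses internally to mark where main starts and ends, for debugging information.
push rbp saves the caller's base pointer on the stack.
mov rbp, rsp points rbp at the current top of the stack, setting up main's own stack frame.
lea rax, .LC0[rip] uses RIP-relative addressing to load the address of the "Hello, World!\0" string.
mov rdi, rax copies that address into rdi. Under the System V AMD64 calling convention, rdi holds a function's first argument, so the string's address becomes the argument to puts.
call puts@PLT calls the puts function in glibc via the procedure linkage table (PLT).
- We'll expand on this later.
mov eax, 0 sets eax to 0, which is main's return value and therefore the exit code. (In the void main(); version, this line is a nop instead.)
pop rbp restores the caller's base pointer that push rbp saved.
ret pops the return address off the stack and jumps back to the caller.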

If you're reading actively, you're probably wondering: what about the .cfi_* directives?

CFI stands for call frame information.
The compiler generates these directives, and debuggers (like gdb) and exception handlers use them to unwind the stack, that is, to work out which function called which.
I've seen glimpses of gdb, and it feels like magic. Part of the basis of that magic comes from here.
Later, when we use gdb, we'll revisit them.

There is something that is missing here. Can you spot it?

There is no exit syscall. There is nothing like
```
mov rax, 60
xor rdi, rdi
syscall
```
Our source code never performs the exit itself; main simply returns.
- When we write raw assembly, we manage the exit ourselves. When we link against the C library, we run inside a complete infrastructure, and it is the job of that infrastructure (the C runtime, or the engine, if you like) to call main and, once main returns, to exit the process with main's return value.
- In the upcoming articles, we'll see how much happens before our code runs and how much happens after it finishes, and we'll understand how tiny our source code really is.

If we paste our source code into Compiler Explorer at https://godbolt.org/, the assembly it shows looks quite different. Something like this:

asm_from_godbolt_org.asm

.LC0:
    .string "Hello, World!"
main:
    push  rbp
    mov   rbp, rsp
    mov   edi, OFFSET FLAT:.LC0
    call  puts
    mov   eax, 0
    pop   rbp
    ret

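In the assembly pane on the right, there is a clickable Libraries link with a green tick above it. Click the tick and you will see that Compiler Explorer passes its own options to the compiler.

This listing is not optimized, though: it keeps the same push rbp / mov rbp, rsp frame setup and the same mov eax, 0. It looks shorter mainly for two reasons. First, Compiler Explorer filters its output by default (see its Filter menu), hiding directives such as .file, .section and the .cfi_* lines. Second, its GCC generates position-dependent code, so the string's address is loaded as an absolute constant (mov edi, OFFSET FLAT:.LC0) and puts is called directly, whereas Debian's GCC builds position-independent executables (PIE) by default, which is where lea rax, .LC0[rip] and puts@PLT come from.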

By default (-O0), GCC does not optimize the code, which is why the output is lengthy but maps closely onto the source. Optimized code is shorter but harder to read, which is not good for us while we are trying to understand things.

We'll stick to unoptimized builds, at least for this binary.

With that, we've taken the first step. This wraps up our look at the assembly.

Now we will move to object code.

