Hello world program: The assembly

First we need to write the hello world C program, which can be seen below:

[cpp]
#include <stdio.h>

int main() {
    printf("Hello World!");
    return 0;
}
[/cpp]

It’s a very simple program that does nothing except print a string; we intentionally kept it this simple, so we will be able to focus on the bigger picture and not tons of code. We then need to compile the program to obtain the assembly code; we don’t want to do anything else right now. To do that we can pass the -S option to the gcc program, which takes the source code of the program and generates the assembly instructions. We also want Intel-syntax assembly and not the default AT&T syntax. We can achieve that by passing -masm=intel to the gcc program. If we’re on a 64-bit operating system, we also want to compile the program as 32-bit, which we can achieve by passing the -m32 argument to the gcc program. The whole gcc command that we’re using can be seen below:

[bash]
# gcc -m32 -masm=intel -S hello.c -o hello.s
[/bash]

This command takes the hello.c program and compiles it as a 32-bit program into assembly instructions that are saved into the hello.s file. The hello.s file now looks as presented below:

[plain]
        .file   "hello.c"
        .intel_syntax noprefix
        .section        .rodata
.LC0:
        .string "Hello World!"
        .text
        .globl  main
        .type   main, @function
main:
        push    ebp
        mov     ebp, esp
        and     esp, -16
        sub     esp, 16
        mov     eax, OFFSET FLAT:.LC0
        mov     DWORD PTR [esp], eax
        call    printf
        mov     eax, 0
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Gentoo 4.5.4 p1.0, pie-0.4.7) 4.5.4"
        .section        .note.GNU-stack,"",@progbits
[/plain]

The .file directive states the original source file name, which is normally used by debuggers. The .intel_syntax line specifies that we’re using Intel syntax assembly and not AT&T syntax. Afterwards we’re defining the .rodata section, which is used for read-only data. In our case the .rodata section contains only the zero-terminated string “Hello World!”, which can be accessed through the .LC0 label.

Then we’re defining the .text section, which is used for the code of the program. First we must define the main function (notice the .type main, @function directive), which is globally visible (notice the .globl main directive). Everything from the main: label to the ret instruction is the actual code of the program. That code first sets up the stack frame by pushing the value of the register EBP to the stack and moving the value of register ESP to EBP. The “and esp, -16” instruction rounds the stack pointer down to a multiple of 16 bytes, because some operations are faster, or only work at all, when the stack is aligned that way. gcc emits that instruction in main even when no optimization flag is given (its default is no optimization), because it keeps the stack aligned to 16 bytes by default. Then we’re subtracting 16 bytes from the ESP stack pointer register to reserve stack space, which here holds the argument passed to printf. Next, the address of .LC0 (our “Hello World!” string) is loaded into the register EAX and stored at the top of the stack, where it becomes the first and only parameter of the printf function that is called right after. The printf function prints that string on the screen and returns to the caller; main then puts its return value 0 into EAX, tears down the stack frame with leave and returns.

The .size directive sets the size of the main function. The expression .-main is the current address minus the address of the main label, which is the exact size of the function main in bytes; it is written to the object file. The .ident directive saves the “GCC: (Gentoo 4.5.4 p1.0, pie-0.4.7) 4.5.4” string to the object file in order to record which compiler was used to compile the program.

Hello world program: The object file

We’ve seen the assembly code that gcc generated directly from the corresponding C source code. But the assembly code still has to be assembled and linked before we can run it. To assemble it into an object file, we must use the -c option with the gcc compiler, which compiles/assembles the source file, but does not link it. To obtain the object file from the assembly code we need to run the commands below:

[bash]
# gcc -m32 -masm=intel -c hello.s -o hello.o
# file hello.o
hello.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
[/bash]

We can see that hello.o is a 32-bit ELF relocatable object file, which is not linked yet. If we try to run it, it will fail as noted below:

[bash]
# chmod +x hello.o
# ./hello.o
bash: ./hello.o: cannot execute binary file
[/bash]

We can read the contents of the object file with the readelf program as follows:

[plain]
# readelf -a hello.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          224 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         11
  Section header string table index: 8

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 00001d 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 000350 000010 08      9   1  4
  [ 3] .data             PROGBITS        00000000 000054 000000 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 000054 000000 00  WA  0   0  4
  [ 5] .rodata           PROGBITS        00000000 000054 00000d 00   A  0   0  1
  [ 6] .comment          PROGBITS        00000000 000061 00002b 01  MS  0   0  1
  [ 7] .note.GNU-stack   PROGBITS        00000000 00008c 000000 00      0   0  1
  [ 8] .shstrtab         STRTAB          00000000 00008c 000051 00      0   0  1
  [ 9] .symtab           SYMTAB          00000000 000298 0000a0 10     10   8  4
  [10] .strtab           STRTAB          00000000 000338 000015 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

There are no program headers in this file.

Relocation section '.rel.text' at offset 0x350 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000000a  00000501 R_386_32          00000000   .rodata
00000012  00000902 R_386_PC32        00000000   printf

There are no unwind sections in this file.

Symbol table '.symtab' contains 10 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000000     0 SECTION LOCAL  DEFAULT    7
     7: 00000000     0 SECTION LOCAL  DEFAULT    6
     8: 00000000    29 FUNC    GLOBAL DEFAULT    1 main
     9: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf

No version information found in this file.
[/plain]

We can see that the file is an ELF object file that has 11 section headers. The first section header is null. The second section header is .text, which contains the executable instructions of the program. The .rel.text section holds the relocation information of the .text section. The relocation entries must be present because the instructions in .text refer to addresses that are not known yet: the address of the string in .rodata and the address of the external printf function. The linker fills them in when it builds the executable. In the output above, we can see that .rel.text holds two relocation entries: one for .rodata and one for printf. The .data section holds the initialized data, while the .bss section holds uninitialized data that the program uses; both are empty here. The .rodata section holds read-only data that can be used by the program; this is where our “Hello World!” string is stored. The .comment section holds the compiler version information (the string from the .ident directive) and the .note.GNU-stack holds some additional data that I won’t describe here. The .shstrtab holds the section names, while the .strtab holds the symbol names and the .symtab holds the symbol table.

We can quickly figure out that the assembly code defined only the .rodata and .text sections explicitly, but when we translated the assembly code into the object file, quite a few sections were added to the file. Those sections are needed to successfully link the executable and properly execute the program.

Hello world program: The executable

The last step is to actually link the object file to make an executable. To do that, we must execute the command below:

[plain]
# gcc -m32 hello.o -o hello
# ./hello
Hello World!
[/plain]

We’ve linked the object file hello.o into the executable ./hello and executed it. Upon execution, the program printed the “Hello World!” string as it should. If we run readelf on the executable, we can see that a lot of other information and sections were added to the file, which can be seen below:

[plain]
$ readelf -a hello
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048330
  Start of program headers:          52 (bytes into file)
  Start of section headers:          4392 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         10
  Size of section headers:           40 (bytes)
  Number of section headers:         30
  Section header string table index: 27

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048174 000174 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048188 000188 000020 00   A  0   0  4
  [ 3] .hash             HASH            080481a8 0001a8 000028 04   A  5   0  4
  [ 4] .gnu.hash         GNU_HASH        080481d0 0001d0 000020 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          080481f0 0001f0 000050 10   A  6   1  4
  [ 6] .dynstr           STRTAB          08048240 000240 00004c 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          0804828c 00028c 00000a 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         08048298 000298 000020 00   A  6   1  4
  [ 9] .rel.dyn          REL             080482b8 0002b8 000008 08   A  5   0  4
  [10] .rel.plt          REL             080482c0 0002c0 000018 08   A  5  12  4
  [11] .init             PROGBITS        080482d8 0002d8 000017 00  AX  0   0  4
  [12] .plt              PROGBITS        080482f0 0002f0 000040 04  AX  0   0 16
  [13] .text             PROGBITS        08048330 000330 00019c 00  AX  0   0 16
  [14] .fini             PROGBITS        080484cc 0004cc 00001c 00  AX  0   0  4
  [15] .rodata           PROGBITS        080484e8 0004e8 000015 00   A  0   0  4
  [16] .eh_frame_hdr     PROGBITS        08048500 000500 000014 00   A  0   0  4
  [17] .eh_frame         PROGBITS        08048514 000514 000040 00   A  0   0  4
  [18] .ctors            PROGBITS        08049f0c 000f0c 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        08049f14 000f14 000008 00  WA  0   0  4
  [20] .jcr              PROGBITS        08049f1c 000f1c 000004 00  WA  0   0  4
  [21] .dynamic          DYNAMIC         08049f20 000f20 0000d0 08  WA  6   0  4
  [22] .got              PROGBITS        08049ff0 000ff0 000004 04  WA  0   0  4
  [23] .got.plt          PROGBITS        08049ff4 000ff4 000018 04  WA  0   0  4
  [24] .data             PROGBITS        0804a00c 00100c 000008 00  WA  0   0  4
  [25] .bss              NOBITS          0804a014 001014 000008 00  WA  0   0  4
  [26] .comment          PROGBITS        00000000 001014 00002a 01  MS  0   0  1
  [27] .shstrtab         STRTAB          00000000 00103e 0000e9 00      0   0  1
  [28] .symtab           SYMTAB          00000000 0015d8 000340 10     29  32  4
  [29] .strtab           STRTAB          00000000 001918 00014d 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00140 0x00140 R E 0x4
  INTERP         0x000174 0x08048174 0x08048174 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x00554 0x00554 R E 0x1000
  LOAD           0x000f0c 0x08049f0c 0x08049f0c 0x00108 0x00110 RW  0x1000
  DYNAMIC        0x000f20 0x08049f20 0x08049f20 0x000d0 0x000d0 RW  0x4
  NOTE           0x000188 0x08048188 0x08048188 0x00020 0x00020 R   0x4
  GNU_EH_FRAME   0x000500 0x08048500 0x08048500 0x00014 0x00014 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
  GNU_RELRO      0x000f0c 0x08049f0c 0x08049f0c 0x000f4 0x000f4 R   0x1
  PAX_FLAGS      0x000000 0x00000000 0x00000000 0x00000 0x00000     0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr
   07
   08     .ctors .dtors .jcr .dynamic .got
   09

Dynamic section at offset 0xf20 contains 21 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x0000000c (INIT)                       0x80482d8
 0x0000000d (FINI)                       0x80484cc
 0x00000004 (HASH)                       0x80481a8
 0x6ffffef5 (GNU_HASH)                   0x80481d0
 0x00000005 (STRTAB)                     0x8048240
 0x00000006 (SYMTAB)                     0x80481f0
 0x0000000a (STRSZ)                      76 (bytes)
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000015 (DEBUG)                      0x0
 0x00000003 (PLTGOT)                     0x8049ff4
 0x00000002 (PLTRELSZ)                   24 (bytes)
 0x00000014 (PLTREL)                     REL
 0x00000017 (JMPREL)                     0x80482c0
 0x00000011 (REL)                        0x80482b8
 0x00000012 (RELSZ)                      8 (bytes)
 0x00000013 (RELENT)                     8 (bytes)
 0x6ffffffe (VERNEED)                    0x8048298
 0x6fffffff (VERNEEDNUM)                 1
 0x6ffffff0 (VERSYM)                     0x804828c
 0x00000000 (NULL)                       0x0

Relocation section '.rel.dyn' at offset 0x2b8 contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
08049ff0  00000206 R_386_GLOB_DAT    00000000   __gmon_start__

Relocation section '.rel.plt' at offset 0x2c0 contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804a000  00000107 R_386_JUMP_SLOT   00000000   printf
0804a004  00000207 R_386_JUMP_SLOT   00000000   __gmon_start__
0804a008  00000307 R_386_JUMP_SLOT   00000000   __libc_start_main

There are no unwind sections in this file.

Symbol table '.dynsym' contains 5 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.0 (2)
     2: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (2)
     4: 080484ec     4 OBJECT  GLOBAL DEFAULT   15 _IO_stdin_used

Symbol table '.symtab' contains 52 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 08048174     0 SECTION LOCAL  DEFAULT    1
     2: 08048188     0 SECTION LOCAL  DEFAULT    2
     3: 080481a8     0 SECTION LOCAL  DEFAULT    3
     4: 080481d0     0 SECTION LOCAL  DEFAULT    4
     5: 080481f0     0 SECTION LOCAL  DEFAULT    5
     6: 08048240     0 SECTION LOCAL  DEFAULT    6
     7: 0804828c     0 SECTION LOCAL  DEFAULT    7
     8: 08048298     0 SECTION LOCAL  DEFAULT    8
     9: 080482b8     0 SECTION LOCAL  DEFAULT    9
    10: 080482c0     0 SECTION LOCAL  DEFAULT   10
    11: 080482d8     0 SECTION LOCAL  DEFAULT   11
    12: 080482f0     0 SECTION LOCAL  DEFAULT   12
    13: 08048330     0 SECTION LOCAL  DEFAULT   13
    14: 080484cc     0 SECTION LOCAL  DEFAULT   14
    15: 080484e8     0 SECTION LOCAL  DEFAULT   15
    16: 08048500     0 SECTION LOCAL  DEFAULT   16
    17: 08048514     0 SECTION LOCAL  DEFAULT   17
    18: 08049f0c     0 SECTION LOCAL  DEFAULT   18
    19: 08049f14     0 SECTION LOCAL  DEFAULT   19
    20: 08049f1c     0 SECTION LOCAL  DEFAULT   20
    21: 08049f20     0 SECTION LOCAL  DEFAULT   21
    22: 08049ff0     0 SECTION LOCAL  DEFAULT   22
    23: 08049ff4     0 SECTION LOCAL  DEFAULT   23
    24: 0804a00c     0 SECTION LOCAL  DEFAULT   24
    25: 0804a014     0 SECTION LOCAL  DEFAULT   25
    26: 00000000     0 SECTION LOCAL  DEFAULT   26
    27: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
    28: 08049f0c     0 NOTYPE  LOCAL  DEFAULT   18 __init_array_end
    29: 08049f20     0 OBJECT  LOCAL  DEFAULT   21 _DYNAMIC
    30: 08049f0c     0 NOTYPE  LOCAL  DEFAULT   18 __init_array_start
    31: 08049ff4     0 OBJECT  LOCAL  DEFAULT   23 _GLOBAL_OFFSET_TABLE_
    32: 08048490     5 FUNC    GLOBAL DEFAULT   13 __libc_csu_fini
    33: 08048495     0 FUNC    GLOBAL HIDDEN    13 __i686.get_pc_thunk.bx
    34: 0804a00c     0 NOTYPE  WEAK   DEFAULT   24 data_start
    35: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf@@GLIBC_2.0
    36: 0804a014     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
    37: 080484cc     0 FUNC    GLOBAL DEFAULT   14 _fini
    38: 08049f18     0 OBJECT  GLOBAL HIDDEN    19 __DTOR_END__
    39: 0804a00c     0 NOTYPE  GLOBAL DEFAULT   24 __data_start
    40: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    41: 0804a010     0 OBJECT  GLOBAL HIDDEN    24 __dso_handle
    42: 080484ec     4 OBJECT  GLOBAL DEFAULT   15 _IO_stdin_used
    43: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
    44: 08048430    90 FUNC    GLOBAL DEFAULT   13 __libc_csu_init
    45: 0804a01c     0 NOTYPE  GLOBAL DEFAULT  ABS _end
    46: 08048330     0 FUNC    GLOBAL DEFAULT   13 _start
    47: 080484e8     4 OBJECT  GLOBAL DEFAULT   15 _fp_hw
    48: 0804a014     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
    49: 08048404    29 FUNC    GLOBAL DEFAULT   13 main
    50: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
    51: 080482d8     0 FUNC    GLOBAL DEFAULT   11 _init

Histogram for bucket list length (total of 3 buckets):
 Length  Number     % of total  Coverage
      0  0          (  0.0%)
      1  2          ( 66.7%)     50.0%
      2  1          ( 33.3%)    100.0%

Histogram for `.gnu.hash' bucket list length (total of 2 buckets):
 Length  Number     % of total  Coverage
      0  1          ( 50.0%)
      1  1          ( 50.0%)    100.0%

Version symbols section '.gnu.version' contains 5 entries:
 Addr: 000000000804828c  Offset: 0x00028c  Link: 5 (.dynsym)
  000:   0 (*local*)       2 (GLIBC_2.0)     0 (*local*)       2 (GLIBC_2.0)
  004:   1 (*global*)

Version needs section '.gnu.version_r' contains 1 entries:
 Addr: 0x0000000008048298  Offset: 0x000298  Link: 6 (.dynstr)
  000000: Version: 1  File: libc.so.6  Cnt: 1
  0x0010:   Name: GLIBC_2.0  Flags: none  Version: 2

Notes at offset 0x00000188 with length 0x00000020:
  Owner                 Data size       Description
  GNU                  0x00000010       NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 2.6.9
[/plain]

Conclusion

We’ve now seen how a simple program written in C is converted into the assembly code, the object file and finally the executable file. While the C code didn’t have any sections, the assembly code explicitly defined two of them: .rodata and .text. When we assembled it into an object file and finally linked it into the executable, the file gained more and more sections (11 section headers in the object file, 30 in the executable) that are needed for the program to be executed successfully.