
ELF: From the Programmer's Perspective

18 points | rramadass | 4 years ago | beefchunk.com

5 comments

bediger4000 | 4 years ago

Why is ELF so complicated? The Linux kernel only uses a very small part of the ELF header to read a file into memory. ld.so only uses a little bit more.

I guess this raises an even bigger question: why are modern executable formats so complicated? macOS's Mach-O is at least as complicated as ELF. I haven't really looked at PE headers, but I did look at COFF headers back in the day, and they kind of made sense.

The famous ELF spec really doesn't do much to clear up sheaders vs pheaders and all the fields of a pheader.

rramadass | 4 years ago

You might find Brian Raiter's write-up of the ELF format useful: https://www.muppetlabs.com/~breadbox/software/ELF.txt (PDF available at http://www.skyfree.org/linux/references/ELF_Format.pdf)

mierle | 4 years ago

The key to understanding ELF is that it serves three purposes:

(1) Execution - A container describing how an OS can load and execute a binary

(2) Linking - A container with relocatable machine code for the linker to assemble

(3) Metadata for debugging and other purposes like stack traces

Program headers describe "segments". Segments are the execution-time view of the ELF file: they describe which parts of the file to load into which memory regions. (The execution entry address itself lives in the ELF header, next to the pointer to the program header table.) The program headers are what the OS reads when you run "./my_executable". Segments have no name field, only a type. Also, it is confusing that "sections" and "segments" mean very different things, but that's just how it is.

Thus, program headers (and the segments they describe) are for #1 - Execution.
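To make the execution view concrete, here is a minimal Python sketch that reads the entry address from the ELF header and then walks the program header table. It assumes a little-endian ELF64 file; the function name `read_program_headers` and the dictionary keys are made up for illustration, and the field layout follows the standard Elf64_Ehdr and Elf64_Phdr structures.

```python
import struct

PT_LOAD = 1  # p_type value for a loadable segment


def read_program_headers(data: bytes):
    """Return (entry_address, segments) for a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")

    # Fields after the 16-byte e_ident: e_type, e_machine, e_version,
    # e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
    # e_shentsize, e_shnum, e_shstrndx.
    fields = struct.unpack_from("<HHIQQQIHHHHHH", data, 16)
    e_entry, e_phoff = fields[3], fields[4]
    e_phentsize, e_phnum = fields[8], fields[9]

    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
        # p_filesz, p_memsz, p_align (56 bytes in total).
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        segments.append({
            "type": p_type,
            "flags": p_flags,    # bit 0 = execute, bit 1 = write, bit 2 = read
            "offset": p_offset,  # where the segment starts in the file
            "vaddr": p_vaddr,    # where it should be mapped in memory
            "filesz": p_filesz,  # bytes to copy from the file
            "memsz": p_memsz,    # bytes to reserve; any excess is zero-filled
        })
    return e_entry, segments
```

Calling it on the bytes of a 64-bit Linux executable and keeping only the entries whose "type" equals PT_LOAD should give roughly the loadable segments that `readelf -l` lists. Note that nothing here touches the section header table.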

Sections are a link-time construct that the linker uses to decide how to allocate functions and data to the execution segments. Intermediate ELF object files contain relocatable code and data, each allocated to a section (you can specify the section manually if you want). Example sections include ".text", which holds executable code; ".bss", which holds zero-initialized static variables; ".data", which holds pre-initialized static variables; and so on. You can see an example of sections being allocated to segments in this thoroughly described linker script [1]. Sections are mostly ignored during program execution.

Thus, section headers (and the sections they describe) are for #2 - Linking.
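The linking view can be sketched the same way. The Python below lists the sections of a little-endian ELF64 file, resolving each name through the section-name string table that e_shstrndx points at. The function name `read_sections` and the dictionary keys are made up for illustration, and the sketch ignores the extended-numbering case in which a file has too many sections for e_shnum to hold.

```python
import struct


def read_sections(data: bytes):
    """Return one dict per section header of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")

    # e_shoff sits at byte 0x28 of the ELF64 header; e_shentsize, e_shnum
    # and e_shstrndx are the last three 16-bit fields, starting at 0x3A.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    if e_shnum == 0:
        return []  # stripped of section headers (or extended numbering)

    def header(i):
        # Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset,
        # sh_size, sh_link, sh_info, sh_addralign, sh_entsize (64 bytes).
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)

    # Section names are NUL-terminated strings inside one designated section.
    _, _, _, _, str_off, str_size, *_ = header(e_shstrndx)
    strtab = data[str_off:str_off + str_size]

    sections = []
    for i in range(e_shnum):
        sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size, *_ = header(i)
        end = strtab.index(b"\0", sh_name)
        sections.append({
            "name": strtab[sh_name:end].decode("ascii", "replace"),
            "type": sh_type,
            "flags": sh_flags,
            "addr": sh_addr,
            "offset": sh_offset,
            "size": sh_size,
        })
    return sections
```

Run against a relocatable object file, this should list names such as ".text", ".data" and ".bss", roughly what `readelf -S` prints. Section names live in a string table rather than in the headers themselves, which is one reason a file can still execute after its section header table has been stripped.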

You aren't alone in not knowing these details about ELF. It wasn't until I got into the embedded space that I dug deep to understand linking and loading, linker scripts, and executable formats. These details are important to understand for microcontrollers, since you may need to carefully allocate code to physical addresses that have faster memory (e.g. core-coupled RAM), or put code in flash. In some cases you execute code directly out of flash (so you must tell the linker that, including the physical addresses); in other cases, you might need to load code from flash into RAM (manually, since there is no OS!) because executing out of flash can be slow. On desktops and servers, it's rare to change (or even know about) linker scripts.

[1] https://blog.thea.codes/the-most-thoroughly-commented-linker...

ugl | 4 years ago

this is from 1995.