
Show HN: A minimal C runtime for Linux i386 and x86_64 in 87 SLOC of C

172 points| oso2k | 11 years ago |github.com | reply

54 comments

[+] lunixbochs|11 years ago|reply
For x86_64 Linux only, here's a single-file crt0 (with arbitrary syscalls working from C-land): https://gist.github.com/lunixbochs/462ee21c3353c56b910f

Build with `gcc -std=c99 -ffreestanding -nostdlib`. After -Os and strip, a.out is 1232 bytes on my system. I got it to 640 bytes with `strip -R .eh_frame -R .eh_frame_hdr -R .comment a.out`.

Starting at ~640 bytes, maybe you could come close to asmutils' httpd in binary size. Failing that, take a look at [1].

You can get pretty far without a real libc, keeping in mind:

- You probably want fprintf, and things can be slower without buffered IO (due to syscall overhead)

- `mmap/munmap` is okay as a stand-in allocator, though it has more overhead for many small allocations.

- You don't get libm for math helper functions.

Of course, you can cherry-pick from diet or musl libc if you need individual functions.
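
As a rough illustration of the `mmap/munmap` stand-in allocator mentioned above, here is a minimal sketch. The names `page_alloc`, `page_free`, and `page_hdr` are made up for this example, and it uses the libc `mmap` wrapper for brevity; in a freestanding build you would substitute your own syscall wrapper. Every allocation costs at least one page and two syscalls, which is the overhead for small allocations noted in the list.

```c
#define _GNU_SOURCE        /* MAP_ANONYMOUS under strict -std=c99 */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical header stored in front of each allocation so that
 * page_free() knows how many bytes to unmap. Two size_t fields keep
 * the user pointer 16-byte aligned on 64-bit targets. */
typedef struct {
    size_t mapped;
    size_t pad;
} page_hdr;

/* Allocate n bytes backed by a fresh anonymous mapping. */
static void *page_alloc(size_t n)
{
    size_t total = n + sizeof(page_hdr);
    if (total < n)                       /* size overflow */
        return NULL;

    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    page_hdr *h = p;
    h->mapped = total;
    return h + 1;                        /* memory just past the header */
}

/* Release a pointer previously returned by page_alloc(). */
static void page_free(void *ptr)
{
    if (!ptr)
        return;
    page_hdr *h = (page_hdr *)ptr - 1;
    munmap(h, h->mapped);
}
```

Usage would look like `char *buf = page_alloc(100); ... page_free(buf);`. A real allocator would carve many small blocks out of each mapping instead of mapping per allocation.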

[1] "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux" http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...

[+] oso2k|11 years ago|reply
Thanks for this Ryan. I'll reference this gist too. I remember starting with something like this early on (w/o your sysdef) but wanted argc, argv, envp and to separate syscall* from the startup code.

Part of the reason I've started this is that libc suffers from bad design (in von Leitner's definition [0]). I find c0's & djb's APIs better designed [1][2]. *printf is a prime example of bad design. (Heck, pretty much everything in stdio.h & string.h has a design flaw.)

[0] http://www.fefe.de/dietlibc/diet.pdf

[1] http://c0.typesafety.net/tutorial/Strings.html

[2] http://www.fefe.de/djb/

[+] acqq|11 years ago|reply
And the good thing about statically linked executables is that the binary can work on every Linux, as the syscall interfaces remain the same. The critical point is not depending on anything platform-specific other than the syscalls, but for some kinds of utilities it's doable.
[+] kentonv|11 years ago|reply
I like this a lot more than I should. :)

Somehow I've always found system calls far more pleasant to use than "section 3" C library interfaces, and it makes me sad that I'm pulling in some 3MB of library code (libc + libm + pthread; not even counting the dynamically-loaded stuff like nsswitch) that I mostly don't want.

Sadly as a C++ programmer I do at least need libgcc to implement exceptions, which in turn likely pulls in glibc anyway. Sigh. (And I haven't completely cut ties with libstdc++ yet, though I'm close...)

(And yeah, on typical system these libraries are already resident anyway since other apps are using them, so wanting to avoid them is mostly silly, but it feels nice!)

[+] justin66|11 years ago|reply
> And yeah, on typical system these libraries are already resident anyway since other apps are using them, so wanting to avoid them is mostly silly, but it feels nice!

In principle, shouldn't avoiding those libraries make code friendlier to the instruction cache, even if the libraries are already in memory? (yes, I'm also trying to justify the inherent niftiness of this...)

[+] cbd1984|11 years ago|reply
> Somehow I've always found system calls far more pleasant to use than "section 3" C library interfaces

I'd rather not re-write fprintf, myself, but I suppose that's up to you. ;)

There are some good functions implemented in the kernel instead of libc, but I personally think fork() and pthread_create() are more intuitive than learning what exactly Linux expects from a clone() system call, even assuming it's fully-documented. (Which, to be fair, it likely is. This isn't Windows; the Linux kernel API is stable and meant for public consumption.)

My main point is that a kernel (as opposed to a VM) is an abstraction anyway, so I might as well pick a convenient abstraction, regardless of where the code to implement that abstraction happens to live.

[+] joelwilliamson|11 years ago|reply
If you use -ffreestanding -nostdlib, libgcc doesn't pull in libc. You'll need to write the syscall(2) function yourself.
[+] userbinator|11 years ago|reply
> Somehow I've always found system calls far more pleasant to use than "section 3" C library interfaces

I agree, and I think it has to do with the fact that parameters are always passed in registers, and error returns are quite straightforward (negative of the error number) as opposed to the C interface convention of returning only -1 and putting the error number in errno.
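
To make the two conventions concrete, here is a small sketch of the translation a libc wrapper performs on a raw Linux syscall result. The helper name `to_libc_result` is hypothetical; the one fact it relies on is that Linux reports failure as a return value in the range -4095..-1.

```c
#include <errno.h>

/* Hypothetical helper: map a raw kernel return value onto the libc
 * convention (return -1 and store the error number in errno). */
static long to_libc_result(long raw)
{
    /* Unsigned compare: true only for raw values in [-4095, -1]. */
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        errno = (int)-raw;
        return -1;
    }
    return raw;   /* success: byte count, fd, address, ... */
}
```

With the raw interface the caller just checks for a small negative value and negates it; there is no separate thread-local variable to consult.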

[+] frozenport|11 years ago|reply
>>I'm pulling in some 3MB of library code that I mostly don't want.

Doesn't LTO fix that?

[+] rian|11 years ago|reply
00_start.c is too hacked on x86_64. it'll work but you're getting a less efficient binary since gcc has to assume _start is called like a normal C function (e.g. it creates a preamble). you should just implement it in assembly.

__init() itself also needs some work. the argument list is weird, linux pushes all of argc, argv, and environ on the stack. why special case argc? also your method of deriving argv and environ from the function argument's address is extremely brittle, and i don't think it actually works on x86_64 (if it does, that's really lucky). you aren't calculating envp using argc, so it's probably wrong. you could get more efficient code from using __attribute__((noreturn)). this would be better:

    /* called from _start */
    void __init(void *initial_stack) __attribute__((noreturn));    
    void __init(void *initial_stack) {
        int argc = *(int *) initial_stack;
        char **argv = ((char **) initial_stack) + 1;
        /* assert(!argv[argc]); */
        char **envp = __environ = argv + argc + 1;

        _exit(main(argc, argv, envp));
    }
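
For readers who want to see the layout that code depends on, here is a portable sketch that parses a simulated initial stack. At process entry on Linux the stack holds argc, then the argv pointers, a NULL, the envp pointers, and another NULL (the auxiliary vector follows). `struct start_info` and `parse_initial_stack` are names invented for this illustration, and the stack is modeled as an array of `char *` so the example runs in an ordinary hosted program.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct start_info {
    long   argc;
    char **argv;
    char **envp;
};

/* sp points at the word holding argc, as the kernel leaves it at entry.
 * Modeled as char ** so every slot is one pointer-sized word. */
static struct start_info parse_initial_stack(char **sp)
{
    struct start_info s;
    s.argc = (long)(intptr_t)sp[0];   /* first word: argument count  */
    s.argv = sp + 1;                  /* next argc words: argv[]     */
    s.envp = s.argv + s.argc + 1;     /* skip the NULL ending argv   */
    return s;
}
```

For a simulated stack `{ 2, "prog", "x", NULL, "A=1", NULL }` this yields argc 2, argv starting at "prog", and envp starting at "A=1".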
[+] oso2k|11 years ago|reply
What do you mean 00_start.c is too hacked on x86_64? As for efficiency of _start, `objdump -d lib/00_start.o` yields

```
0000000000000000 <_start>:
   0:   48 89 e5                mov    %rsp,%rbp
   3:   48 8b 3c 24             mov    (%rsp),%rdi
   7:   48 8b 74 24 08          mov    0x8(%rsp),%rsi
   c:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  10:   e8 00 00 00 00          callq  15 <_start+0x15>
  15:   c3                      retq
```

I only see one byte of inefficiency, the `retq`. Otherwise, it's exactly as I've specified it.

What I found (by using gdb) is that the stack contains, in order, `argc` at `[RSP+0]`, `argv` at `[RSP+8]`, & `envp` at `[RSP+16]`. I verified this using 'frame' in gdb using the source (RSP) and dest addresses. Honestly, I was surprised since it matched exactly what was presented to the ELF image on i386.

Most of the libc's I've surveyed did something like you've specified for __init. However, gcc generated different code for -O3 & -Os, often breaking one or the other optimization args, by modifying what was stored/pointed to for envp and/or *argv. While argc, argv, envp, and envpc are specified the

[+] mmastrac|11 years ago|reply
Nice! As part of some work that I was doing ages ago, I had to build myself a custom libc to statically link executables that would run on Android and WebOS since they are both essentially ARM Linux under the hood.

You can learn a lot by writing yourself a libc. Even building a simple/stupid malloc from scratch is a learning exercise.

[+] hyc_symas|11 years ago|reply
Have a look at the libc we wrote for Atari ST TOS/MiNT. Nice and small, but still mostly POSIX compliant. Nearly 30 years ago now.
[+] oso2k|11 years ago|reply
That is very cool. That source public anywhere?
[+] chappar|11 years ago|reply
I am not familiar with embedded asm. Can someone explain what the following line does?

"register long r10 __asm__( "r10" ) = a3"

[+] tjgq|11 years ago|reply
It declares a variable named r10 and instructs the compiler to store it in the r10 CPU register. It's a GCC extension; the farthest you can get in standards-compliant C is

    register long r10 = a3;
but the register keyword is advisory only (the compiler is free to ignore it) and you cannot specify the exact register you want to be used.

Reference: https://gcc.gnu.org/onlinedocs/gcc/Local-Reg-Vars.html
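
For context, here is a sketch of where such a line typically sits: a six-argument syscall wrapper for x86_64 Linux. The kernel takes the fourth to sixth arguments in r10, r8, and r9, and GCC's inline asm has no single-letter constraint for those registers, hence the explicit register variables. The name `raw_syscall6` is made up for this example; it is not taken from the linked repository, and the code only builds with GCC or Clang on x86_64.

```c
#define _GNU_SOURCE        /* MAP_ANONYMOUS under strict -std=c99 */
#include <sys/mman.h>      /* PROT_*, MAP_* constants only */
#include <sys/syscall.h>   /* SYS_mmap, SYS_munmap numbers */

/* Hypothetical wrapper: issue syscall n with six arguments and return
 * the raw kernel result (a negative error number on failure). */
static long raw_syscall6(long n, long a0, long a1, long a2,
                         long a3, long a4, long a5)
{
    long ret;
    /* Pin these three values to the registers the kernel expects. */
    register long r10 __asm__("r10") = a3;
    register long r8  __asm__("r8")  = a4;
    register long r9  __asm__("r9")  = a5;

    __asm__ volatile ("syscall"
                      : "=a"(ret)                    /* result in rax */
                      : "a"(n), "D"(a0), "S"(a1), "d"(a2),
                        "r"(r10), "r"(r8), "r"(r9)
                      : "rcx", "r11", "memory");     /* clobbered by syscall */
    return ret;
}
```

A call such as `raw_syscall6(SYS_mmap, 0, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)` exercises all six argument registers, with the flags travelling in r10.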

[+] pc2g4d|11 years ago|reply
What does "SLOC" stand for in this context?
[+] NhanH|11 years ago|reply
Source lines of code
[+] pjmlp|11 years ago|reply
Why not just do

gcc -static -Os -fdata-sections -ffunction-sections -Wl,--gc-sections ...

or similar, in the compiler of choice?

[+] brudgers|11 years ago|reply
I'll admit my ignorance down at this level. Can someone explain what this does and how it can be used?
[+] ChuckMcM|11 years ago|reply
libc is C's "standard" library. It has a lot of stuff in it that some programs, especially small single purpose ones, don't need. So when a very simple program is linked into an executable, it has a bunch of extra stuff brought along for the ride. If you write helloworld.c (the canonical 4 line program in C) and link it on Ubuntu 14.04LTS it is 6240 bytes stripped, 8511 bytes unstripped. With this version of libc you can make it 1/5th that size.

Not a huge deal on large systems with giant disks and memory but a lot of ARM linux users are rediscovering the joys of small binaries, especially on limited space eMMC storage for the kernel and all the programs and libraries.

[+] jestinjoy1|11 years ago|reply
For the uninitiated what does this program do and what is its importance?
[+] cnvogel|11 years ago|reply
Attempt at an explanation:

To get a C-program to run, even an empty "int main(){ return 0;}" needs some supplemental code to set things up the way your main() function expects.

This supplemental code is what the github repository provides. You can then write very small, statically linked, programs without pulling in most of the C-library and other conveniences, and you will only be able to use the "raw" interfaces your kernel provides.

E.g. you'll first have to ponder what the "write()" system call actually is, then use write(1,"Hello\n",6); instead of the wrappers and conveniences of the C library such as printf() (which of course additionally gives you formatting, buffered I/O, ...).
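
A small sketch of what using the raw interface involves: write() may write fewer bytes than requested, so without stdio you usually want a loop around it. `write_all` is a name invented here, and the sketch calls the libc `write` wrapper so it runs as an ordinary program; with the minimal runtime you would call your own syscall wrapper instead.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are
 * written. Returns 0 on success, -1 on error (errno set by write). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;                   /* advance past what was written */
        len -= (size_t)n;
    }
    return 0;
}
```

`write_all(1, "Hello\n", 6)` then plays the role of the `write(1,"Hello\n",6);` call above, minus the formatting and buffering that printf() would add.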