Thanks for this, Ryan. I'll reference this gist too. I remember starting with something like this early on (without your sysdef), but I wanted argc, argv, and envp, and wanted to separate syscall* from the startup code.
And the good thing about statically linked executables is that the binary can work on every Linux, as the syscall interfaces remain the same. The critical point is not depending on anything platform-specific other than the syscalls, but for some kinds of utilities it's doable.
> And yeah, on typical system these libraries are already resident anyway since other apps are using them, so wanting to avoid them is mostly silly, but it feels nice!
> Somehow I've always found system calls far more pleasant to use than "section 3" C library interfaces
> Somehow I've always found system calls far more pleasant to use than "section 3" C library interfaces
00_start.c is too hacked on x86_64. it'll work but you're getting a less efficient binary since gcc has to assume _start is called like a normal C function (e.g. it creates a preamble). you should just implement it in assembly.
Nice! As part of some work that I was doing ages ago, I had to build myself a custom libc to statically link executables that would run on Android and WebOS since they are both essentially ARM Linux under the hood.
Somewhat related: a minimal OS that just prints "Hello world" to the screen https://github.com/olalonde/minios (interesting code is in kmain.c and loader.s). Wrote it while going through http://littleosbook.github.io/ (which is great by the way if you are interested in learning a bit about OS development).
It declares a variable named r10 and instructs the compiler to store it in the r10 CPU register. It's a GCC extension; the farthest you can get in standards-compliant C is
register long r10 = a3;
but the register keyword is advisory only (the compiler is free to ignore it) and you cannot specify the exact register you want to be used.
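To make the r10 point concrete, here is a minimal sketch of a four-argument raw syscall wrapper, assuming x86_64 Linux and GCC or Clang. The kernel takes the fourth argument in r10 (the `syscall` instruction itself overwrites rcx), and there is no single-letter inline-asm constraint for r10, which is why the local register variable is needed. `raw_syscall4` and `make_socketpair` are hypothetical names for illustration, not taken from the repository.

```c
#include <sys/socket.h>   /* AF_UNIX, SOCK_STREAM */
#include <sys/syscall.h>  /* SYS_socketpair */

/* x86_64 Linux only: syscall number in rax, arguments in rdi, rsi, rdx, r10. */
static long raw_syscall4(long n, long a0, long a1, long a2, long a3)
{
    long ret;
    /* GCC extension: pin this variable to r10 so the "r" constraint below uses it. */
    register long r10 __asm__("r10") = a3;

    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(n), "D"(a0), "S"(a1), "d"(a2), "r"(r10)
                      : "rcx", "r11", "memory");
    return ret;
}

/* socketpair(2) takes its output array as the fourth argument, so it only
 * works if r10 really carries a3. Returns 0 on success, -errno on failure. */
static long make_socketpair(int fds[2])
{
    return raw_syscall4(SYS_socketpair, AF_UNIX, SOCK_STREAM, 0, (long)fds);
}
```

If the r10 binding were dropped, the compiler could place `a3` in any register and the kernel would read an unrelated value as the fourth argument.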
libc is C's "standard" library. It has a lot of stuff in it that some programs, especially small single-purpose ones, don't need. So when a very simple program is linked into an executable, it has a bunch of extra stuff brought along for the ride. If you write helloworld.c (the canonical 4-line program in C) and link it on Ubuntu 14.04 LTS, it is 6240 bytes stripped, 8511 bytes unstripped. With this version of libc you can make it 1/5th that size.
[+] [-] lunixbochs|11 years ago|reply
Build with `gcc -std=c99 -ffreestanding -nostdlib`. After -Os and strip, a.out is 1232 bytes on my system. I got it to 640 bytes with `strip -R .eh_frame -R .eh_frame_hdr -R .comment a.out`.
Starting at ~640 bytes, maybe you could come close to asmutils' httpd in binary size. Failing that, take a look at [1]
You can get pretty far without a real libc, keeping in mind:
- You probably want fprintf, and things can be slower without buffered IO (due to syscall overhead)
- `mmap/munmap` is okay as a stand-in allocator, though it has more overhead for many small allocations.
- You don't get libm for math helper functions.
Of course, you can cherry-pick from diet or musl libc if you need individual functions.
[1] "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux" http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...
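As a rough illustration of the `mmap/munmap` stand-in allocator mentioned above, here is a minimal sketch. It assumes a Linux system where `<sys/mman.h>` is available (or equivalent hand-written declarations in a libc-free build). `page_alloc`, `page_free`, and `struct block_header` are hypothetical names, not part of any library.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS when building with -std=c99 */
#include <stddef.h>
#include <sys/mman.h>

/* Stored in front of every block so page_free knows how much to unmap. */
struct block_header {
    size_t mapped_len;    /* total bytes requested from mmap, header included */
};

/* Header slot size; 16 keeps the returned pointer 16-byte aligned. */
#define HEADER_SIZE 16

/* Hypothetical malloc stand-in: one anonymous mapping per allocation. */
static void *page_alloc(size_t n)
{
    size_t len = n + HEADER_SIZE;
    if (len < n)
        return NULL;      /* size overflow */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    ((struct block_header *)p)->mapped_len = len;
    return (char *)p + HEADER_SIZE;
}

/* Hypothetical free stand-in: hands the whole mapping back to the kernel. */
static void page_free(void *ptr)
{
    if (ptr == NULL)
        return;
    struct block_header *h =
        (struct block_header *)((char *)ptr - HEADER_SIZE);
    munmap(h, h->mapped_len);
}
```

Every call costs a system call and rounds up to at least one page, which is the overhead for many small allocations noted in the list.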
[+] [-] oso2k|11 years ago|reply
Part of the reason I've started this is that libc suffers from bad design (in von Leitner's definition [0]). I find c0's & djb's APIs better designed [1][2]. *printf is a prime example of bad design. Heck, pretty much everything in stdio.h & string.h has a design flaw.
[0] http://www.fefe.de/dietlibc/diet.pdf
[1] http://c0.typesafety.net/tutorial/Strings.html
[2] http://www.fefe.de/djb/
[+] [-] acqq|11 years ago|reply
[+] [-] kentonv|11 years ago|reply
Somehow I've always found system calls far more pleasant to use than "section 3" C library interfaces, and it makes me sad that I'm pulling in some 3MB of library code (libc + libm + pthread; not even counting the dynamically-loaded stuff like nsswitch) that I mostly don't want.
Sadly as a C++ programmer I do at least need libgcc to implement exceptions, which in turn likely pulls in glibc anyway. Sigh. (And I haven't completely cut ties with libstdc++ yet, though I'm close...)
(And yeah, on a typical system these libraries are already resident anyway since other apps are using them, so wanting to avoid them is mostly silly, but it feels nice!)
[+] [-] justin66|11 years ago|reply
In principle, shouldn't avoiding those libraries make code friendlier to the instruction cache, even if the libraries are already in memory? (Yes, I'm also trying to justify the inherent niftiness of this...)
[+] [-] cbd1984|11 years ago|reply
I'd rather not re-write fprintf, myself, but I suppose that's up to you. ;)
There are some good functions implemented in the kernel instead of libc, but I personally think fork() and pthread_create() are more intuitive than learning what exactly Linux expects from a clone() system call, even assuming it's fully-documented. (Which, to be fair, it likely is. This isn't Windows; the Linux kernel API is stable and meant for public consumption.)
My main point is that a kernel (as opposed to a VM) is an abstraction anyway, so I might as well pick a convenient abstraction, regardless of where the code to implement that abstraction happens to live.
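For comparison, this is roughly what a fork-like call looks like when made through clone() directly. It is a sketch, assuming Linux with glibc's `syscall()` wrapper. `exit_code_via_clone` is a hypothetical helper name. Only the flags argument is non-zero, so the architecture-specific ordering of clone's remaining arguments does not matter here.

```c
#define _GNU_SOURCE       /* for syscall() under strict -std=c99 */
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a child with a raw clone(), have it exit with
 * `code`, and return the exit code observed by the parent (or -1 on error). */
static int exit_code_via_clone(int code)
{
    /* flags = SIGCHLD and no new stack: the child gets a copy-on-write copy of
     * the address space and is reaped with waitpid(), much like after fork(). */
    long pid = syscall(SYS_clone, (long)SIGCHLD, 0L, 0L, 0L, 0L);
    if (pid < 0)
        return -1;

    if (pid == 0)
        _exit(code);      /* child: leave immediately, skipping atexit handlers */

    int status;
    if (waitpid((pid_t)pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

glibc's own fork() does more bookkeeping around the call (atfork handlers, for instance), and anything thread-like needs extra flags plus a caller-supplied stack, which is where pthread_create() earns its keep.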
[+] [-] joelwilliamson|11 years ago|reply
[+] [-] userbinator|11 years ago|reply
I agree, and I think it has to do with the fact that parameters are always passed in registers, and error returns are quite straightforward (negative of the error number) as opposed to the C interface convention of returning only -1 and putting the error number in errno.
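The two error conventions can be compared side by side. This is a sketch assuming x86_64 Linux and GCC or Clang inline assembly. `raw_syscall3`, `open_raw`, and `to_libc_convention` are hypothetical names, and the -4095..-1 error range is the kernel's usual convention for encoding errno values in the return register.

```c
#include <errno.h>
#include <fcntl.h>        /* O_RDONLY */
#include <sys/syscall.h>  /* SYS_open */

/* x86_64 Linux only: number in rax, arguments in rdi, rsi, rdx.
 * The syscall instruction clobbers rcx and r11. */
static long raw_syscall3(long n, long a0, long a1, long a2)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(n), "D"(a0), "S"(a1), "d"(a2)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Kernel convention: a file descriptor on success, -errno on failure. */
static long open_raw(const char *path)
{
    return raw_syscall3(SYS_open, (long)path, O_RDONLY, 0);
}

/* What a libc wrapper typically does with that value: fold the error range
 * into -1 and park the error number in errno. */
static long to_libc_convention(long ret)
{
    if (ret < 0 && ret >= -4095) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

With the raw interface the caller sees the error code in the return value itself, so there is no separate errno to consult.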
[+] [-] frozenport|11 years ago|reply
Doesn't LTO fix that?
[+] [-] rian|11 years ago|reply
__init() itself also needs some work. the argument list is weird, linux pushes all of argc, argv, and environ on the stack. why special case argc? also your method of deriving argv and environ from the function argument's address is extremely brittle, and i don't think it actually works on x86_64 (if it does, that's really lucky). you aren't calculating envp using argc, so it's probably wrong. you could get more efficient code from using __attribute__((noreturn)). this would be better:
[+] [-] oso2k|11 years ago|reply
```
0000000000000000 <_start>:
   0:   48 89 e5                mov    %rsp,%rbp
   3:   48 8b 3c 24             mov    (%rsp),%rdi
   7:   48 8b 74 24 08          mov    0x8(%rsp),%rsi
   c:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  10:   e8 00 00 00 00          callq  15 <_start+0x15>
  15:   c3                      retq
```
I only see one byte of inefficiency, the `retq`. Otherwise, it's exactly as I've specified it.
What I found (using gdb) is that the stack contains, in order, `argc` at `[RSP+0]`, `argv` at `[RSP+8]`, and `envp` at `[RSP+16]`. I verified this using 'frame' in gdb, comparing the source (RSP) and dest addresses. Honestly, I was surprised, since it matched exactly what was presented to the ELF image on i386.
Most of the libcs I've surveyed did something like what you've specified for __init. However, gcc generated different code for -O3 and -Os, often breaking under one or the other optimization setting by modifying what was stored/pointed to for envp and/or *argv. While argc, argv, envp, and envpc are specified the
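The suggestion upthread to calculate envp using argc can be sketched in plain C. This assumes the initial stack layout the System V x86_64 ABI describes for process entry: a word holding argc, then the argv pointers, a NULL, then the environment pointers, then another NULL. `struct startup` and `parse_initial_stack` are hypothetical names, and `sp` stands for the stack pointer value captured at `_start`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for what the startup code hands to main(). */
struct startup {
    long   argc;
    char **argv;
    char **envp;
};

/* Assumed layout at process entry, one machine word per slot:
 *   sp[0]              argc
 *   sp[1 .. argc]      argv[0] .. argv[argc-1]
 *   sp[argc + 1]       NULL
 *   sp[argc + 2 ..]    envp[0] .., terminated by NULL
 */
static struct startup parse_initial_stack(char **sp)
{
    struct startup s;
    s.argc = (long)(intptr_t)sp[0];   /* first word is an integer, not a pointer */
    s.argv = sp + 1;
    s.envp = s.argv + s.argc + 1;     /* skip the arguments and argv's NULL */
    return s;
}
```

Because envp is derived from argc rather than from a fixed offset, the same code works for any number of arguments.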
[+] [-] mmastrac|11 years ago|reply
You can learn a lot by writing yourself a libc. Even building a simple/stupid malloc from scratch is a learning exercise.
[+] [-] jeffreyrogers|11 years ago|reply
[+] [-] hyc_symas|11 years ago|reply
[+] [-] oso2k|11 years ago|reply
[+] [-] oso2k|11 years ago|reply
[0] http://asm.sourceforge.net/asmutils.html
[1] https://github.com/leto/asmutils/blob/master/src/httpd.asm
[+] [-] __gcmurphy|11 years ago|reply
(my crappy code - https://bitbucket.org/gcmurphy/libc/)
[+] [-] olalonde|11 years ago|reply
[+] [-] chappar|11 years ago|reply
"register long r10 __asm__( "r10" ) = a3"
[+] [-] tjgq|11 years ago|reply
Reference: https://gcc.gnu.org/onlinedocs/gcc/Local-Reg-Vars.html
[+] [-] pc2g4d|11 years ago|reply
[+] [-] NhanH|11 years ago|reply
[+] [-] pjmlp|11 years ago|reply
gcc -static -Os -fdata-sections -ffunction-sections -Wl,--gc-sections ...
or similar, in the compiler of choice?
[+] [-] brudgers|11 years ago|reply
[+] [-] ChuckMcM|11 years ago|reply
Not a huge deal on large systems with giant disks and memory, but a lot of ARM Linux users are rediscovering the joys of small binaries, especially on limited-space eMMC storage for the kernel and all the programs and libraries.
[+] [-] 101914|11 years ago|reply
[+] [-] DougMerritt|11 years ago|reply
http://en.wikipedia.org/wiki/COFF#History
(ELF replaced COFF)
[+] [-] jestinjoy1|11 years ago|reply
[+] [-] cnvogel|11 years ago|reply
To get a C-program to run, even an empty "int main(){ return 0;}" needs some supplemental code to set things up the way your main() function expects.
This supplemental code is what the github repository provides. You can then write very small, statically linked, programs without pulling in most of the C-library and other conveniences, and you will only be able to use the "raw" interfaces your kernel provides.
E.g. you'll first have to ponder what the "write()" system call actually is, then use write(1,"Hello\n",6); instead of the wrappers and conveniences of the C library such as printf() (which of course additionally gives you formatting, buffered I/O, ...).
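A minimal sketch of what that looks like in practice, assuming x86_64 Linux (syscall numbers 1 for write and 60 for exit) and GCC or Clang. The `NO_LIBC` macro and the `sys_write`/`sys_exit` names are made up for this example. The repository's own startup code is more complete, since it also passes argc, argv, and envp along to main().

```c
/* x86_64 Linux syscall numbers, hard-coded because no libc headers are used. */
#define NR_write 1
#define NR_exit  60

/* Raw write(2): returns the byte count, or -errno on failure. */
long sys_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)NR_write), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Raw exit(2): does not return. */
void sys_exit(int code)
{
    __asm__ volatile ("syscall"
                      :
                      : "a"((long)NR_exit), "D"((long)code)
                      : "rcx", "r11", "memory");
    for (;;) { }          /* not reached; keeps the compiler from falling through */
}

#ifdef NO_LIBC
/* Entry point for a libc-free build. It stands in for the startup code that
 * would normally set things up and call main(). */
void _start(void)
{
    sys_write(1, "Hello\n", 6);
    sys_exit(0);
}
#endif
```

Building with something like `gcc -std=c99 -ffreestanding -nostdlib -static -DNO_LIBC hello.c` should produce a binary with no libc in it. Without `-DNO_LIBC` the two helpers can be linked into an ordinary program and called from main().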