TL;DR: this is an exercise in implementing a C compiler from scratch. "From scratch" here means "without an existing gcc/clang," so consider civilization destruction scenarios, aliens reading our source code, EMP strike that takes out all smart silicon, corporate policy won't let you download development tools, you only have a javascript console and gumption, etc...
To do this, you must:
1. Implement a small tool that turns hexidecimal into binary (you can do this in any language)
2. Use whatever you have (python, POSIX shell, alien crystal substrate, x86-64 machine code, ...) to implement a small VM that runs simple bytecode. The VM has 16 registers and 16MB of working memory. There are sixteen opcodes to implement for arithmetic, memory manipulation, and control flow. There are also twelve syscalls for fopen/fread/fwrite/unlink(!)/etc.
After these two steps (that you have to repeat yourself post-civilization collapse), everything's self-hosted:
3. Use the VM to write a manual linker that resolves labels
4. Use the linker to write assembler for a custom assembly language
5. Use the assembler to implement a minimal C compiler / preprocessor, that then compiles a more complex C compiler, that can compile a C17 compiler, that then compiles doom
See also: nand2tetris (focus is on teaching, less pragmatism), Cosmopolitan C (x64 as actually portable runtime)
Author here. I think my opinion would be about the same as the authors of the stage0 project [1]. They invested quite a bit of time trying to get Forth to work but ultimately abandoned it. Forth has been suggested often for bootstrapping a C compiler, and I hope someone does it someday, but so far no one has succeeded.
Programming for a stack machine is really hard, whereas programming for a register machine is comparatively easy. I designed the Onramp VM specifically to be easy to program in bytecode, while also being easy to implement in machine code. Onramp bootstraps through the same linker and assembly languages that are used in a traditional C compilation process so there are no detours into any other languages like Forth (or Scheme, which live-bootstrap does with mescc.)
tl;dr I'm not really convinced that Forth would simplify things, but I'd love to be proven wrong!
Forth is not a convenient VM to target for C compilers because of its numerous idiosyncrasies (e.g. the stacks don't neatly map to what a typical naive C implementation would expect).
> Security: Compiler binaries can contain malware and backdoors that insert viruses into programs they compile. Malicious code in a compiler can even recognize its own source code and propagate itself. Recompiling a compiler with itself therefore does not eliminate the threat. The only compiler that can truly be trusted is one that you've bootstrapped from scratch.
It is a laudable goal, but without using from-scratch hardware and either running the bootstrap on bare metal or on a from-scratch OS, I think "truly be trusted" isn't quite reachable with an approach that only handles user-space program execution.
Indeed! An eventual goal of Onramp is to bootstrap in freestanding so we can boot directly into the VM without an OS. This eliminates all binaries except for the firmware of the machine. The stage0/live-bootstrap team has already accomplished this so we know it's possible. Eliminating firmware is platform-dependent and mostly outside the scope of Onramp but it's certainly something I'd like to do as a related bootstrap project.
A modern UEFI is probably a million lines of code so there's a huge firmware trust surface there. One way to eliminate this would be to bootstrap on much simpler hardware. A rosco_m68k [1] is an example, one that has requires no third party firmware at all aside from the non-programmable microcode of the processor. (A Motorola 68010 is thousands of times slower than a modern processor so the bootstrap would take days, but that's fine, I can wait!)
Of course there's still the issue of trusting that the data isn't modified getting into the machine. For example you have to trust the tools you're using to flash EEPROM chips, or if you're using an SD card reader you have to trust its firmware. You also have to trust that your chips are legit, that the Motorola 68010 isn't a modern fake that emulates it while compromising it somehow. If you had the resources you'd probably want to x-ray the whole board at a minimum to make sure the chips are real. As for trusting ROM, I have some crazy ideas on how to get data into the machine in a trustable way, but I'm not quite ready to embarrass myself by saying them out loud yet :)
From the GitHub for on-ramp: it’s “self-bootstrapping and can compile itself from scratch”. What does that mean? How can it compile itself if it doesn’t exist?
smurpy|1 year ago
Adjacent (resilient, low-level, big-vision, auditable) projects include:
http://collapseos.org/ Forth OS, bootstrapable from paper, for z80
https://urbit.org/ standalone, distributed, auditable, provable, minimalist
https://justine.lol/ APE (actually portable executable); cosmopolitan libc
actionfromafar|1 year ago
gcr|1 year ago
To do this, you must:
1. Implement a small tool that turns hexidecimal into binary (you can do this in any language)
2. Use whatever you have (python, POSIX shell, alien crystal substrate, x86-64 machine code, ...) to implement a small VM that runs simple bytecode. The VM has 16 registers and 16MB of working memory. There are sixteen opcodes to implement for arithmetic, memory manipulation, and control flow. There are also twelve syscalls for fopen/fread/fwrite/unlink(!)/etc.
After these two steps (that you have to repeat yourself post-civilization collapse), everything's self-hosted:
3. Use the VM to write a manual linker that resolves labels
4. Use the linker to write assembler for a custom assembly language
5. Use the assembler to implement a minimal C compiler / preprocessor, that then compiles a more complex C compiler, that can compile a C17 compiler, that then compiles doom
See also: nand2tetris (focus is on teaching, less pragmatism), Cosmopolitan C (x64 as actually portable runtime)
basementcat|1 year ago
fuhsnn|1 year ago
ludocode|1 year ago
Programming for a stack machine is really hard, whereas programming for a register machine is comparatively easy. I designed the Onramp VM specifically to be easy to program in bytecode, while also being easy to implement in machine code. Onramp bootstraps through the same linker and assembly languages that are used in a traditional C compilation process so there are no detours into any other languages like Forth (or Scheme, which live-bootstrap does with mescc.)
tl;dr I'm not really convinced that Forth would simplify things, but I'd love to be proven wrong!
[1]: https://github.com/oriansj/stage0?tab=readme-ov-file#forth
int_19h|1 year ago
jjnoakes|1 year ago
It is a laudable goal, but without using from-scratch hardware and either running the bootstrap on bare metal or on a from-scratch OS, I think "truly be trusted" isn't quite reachable with an approach that only handles user-space program execution.
ludocode|1 year ago
A modern UEFI is probably a million lines of code so there's a huge firmware trust surface there. One way to eliminate this would be to bootstrap on much simpler hardware. A rosco_m68k [1] is an example, one that has requires no third party firmware at all aside from the non-programmable microcode of the processor. (A Motorola 68010 is thousands of times slower than a modern processor so the bootstrap would take days, but that's fine, I can wait!)
Of course there's still the issue of trusting that the data isn't modified getting into the machine. For example you have to trust the tools you're using to flash EEPROM chips, or if you're using an SD card reader you have to trust its firmware. You also have to trust that your chips are legit, that the Motorola 68010 isn't a modern fake that emulates it while compromising it somehow. If you had the resources you'd probably want to x-ray the whole board at a minimum to make sure the chips are real. As for trusting ROM, I have some crazy ideas on how to get data into the machine in a trustable way, but I'm not quite ready to embarrass myself by saying them out loud yet :)
[1]: https://rosco-m68k.com/
binarymax|1 year ago
Koshkin|1 year ago
in-pursuit|1 year ago
tekknolagi|1 year ago
purple-leafy|1 year ago
purple-leafy|1 year ago
nenadg|1 year ago