PS: Holy crap! For the first time in my 40+ year career I have clicked-thru from a semi-relevant article about Rust on a micro p̵r̵o̵c̵e̵s̵s̵o̵r̵ controller to a reference about the...[RCA] COSMAC VIP (in the form of this dude's effort to get CHIP-8 running on LLVM-MOS). Do you have any idea how many lawns I had to mow to buy one of those? It was a big disappointment (over my ELF and SuperELF) too! ROFL
Together with https://github.com/nrf-rs/nrf-hal these enable most everything one can do on these controllers from pure Rust (the softdevice is a blob with a C SDK that's wrapped in Rust, though).
That is so cool. I saw some posts about LLVM-MOS a while ago, but at that point I thought it would be just another in a fairly long list of attempts to try and get LLVM to output 6502 instructions.
I never expected it to come together this well! Especially considering that the author of the article mentions there were so many issues with LLVM-AVR, you'd expect them to exist in LLVM-MOS as well. Apparently not! I guess the code quality will only improve from here on out; the loop at the bottom of the article does seem like it is not as optimal as it could be :)
Up until just a few weeks ago, 100% of the codegen work we've put into LLVM-MOS has been to get it feature-complete and rock-solid. It's awesome to see that that work has paid off!
We're just now starting to really optimize the compiler; there's definitely a long road ahead of us, but our preliminary investigations suggest that we'll be able to get the thing to emit really quite good 6502 assembly.
Right now, it emits near-garbage in a large number of common cases, as seen in the article. This is mostly due to technical debt intentionally accrued while getting the thing working, though; we did stuff like use the default LLVM lowering for comparisons, which is ridiculously trash on the 6502. But there are really only a couple of major technical hurdles left to overcome; everything else is just painstakingly teaching LLVM what the best 6502 assembly patterns are for various situations.
I haven't looked at this closely, but 6502 really doesn't lend itself to C compilation. Three registers, only one of which works with the ALU, awkward immovable stack, etc.
The 65816 is a better target (moveable direct page and stack and some wider registers), but also awkward with its register mode switching.
Author of the mentioned post on the 6502.org forum here. In the meantime I worked a bit on implementing a proper Rust target triple for the 6502 (mos-unknown-none); the code is here: https://github.com/mrk-its/rust/tree/mos_target
That's cool! I wanted to avoid having to build Rust and/or LLVM from source myself, hence the somewhat awkward "tell Cargo we're on default target, let Clang sort it out at link time" setup.
I am not sure it is a good idea to compile code written for modern processors to 8-bit CPUs like the 6502. For example:
Languages like C (or Rust) allocate variables on the stack because that is cheap on modern CPUs, but 8-bit CPUs don't have addressing modes to access them easily. (By the way, some modern CPUs like ARM also cannot add a register directly to a variable on the stack.)
The solution is to not use the stack for variables and to use zero-page locations instead. As there are only 256 zero-page bytes, the same locations should be reused for variables in different functions. This does not work with recursive functions, but such code is inefficient anyway, so it is better to avoid recursion altogether and use loops instead.
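A minimal sketch of the zero-page reuse described above, in Rust. It is only an illustration of the general overlay idea and is not how LLVM-MOS or any other compiler actually does it. The `Func` type, the `assign_offsets` and `visit` names, and the call graph are all hypothetical, and the sketch assumes every call is direct and that no interrupt handler shares these locations.

```rust
/// One function in a hypothetical call graph.
struct Func {
    /// Bytes of zero page this function needs for its locals.
    locals: u8,
    /// Indices of the functions it calls directly.
    callees: Vec<usize>,
}

/// Gives each function reachable from `root` a zero-page base offset so that
/// a callee's locals start after the locals of every caller that can be live
/// above it. Functions that are never active at the same time may share bytes.
/// Returns `None` if the call graph is recursive or the locals exceed 256 bytes.
fn assign_offsets(funcs: &[Func], root: usize) -> Option<Vec<u16>> {
    // Depth-first search; the reversed post-order lists callers before callees.
    let mut state = vec![0u8; funcs.len()]; // 0 = unvisited, 1 = on path, 2 = done
    let mut order = Vec::new();
    visit(funcs, root, &mut state, &mut order)?;
    order.reverse();

    let mut offsets = vec![0u16; funcs.len()];
    for &f in &order {
        let end = offsets[f] + u16::from(funcs[f].locals);
        if end > 256 {
            return None; // does not fit in the zero page
        }
        for &c in &funcs[f].callees {
            // A callee must sit above its deepest caller.
            offsets[c] = offsets[c].max(end);
        }
    }
    Some(offsets)
}

fn visit(funcs: &[Func], f: usize, state: &mut [u8], order: &mut Vec<usize>) -> Option<()> {
    match state[f] {
        1 => return None,     // back edge: the function can call itself
        2 => return Some(()), // already handled
        _ => {}
    }
    state[f] = 1;
    for &c in &funcs[f].callees {
        visit(funcs, c, state, order)?;
    }
    state[f] = 2;
    order.push(f);
    Some(())
}
```

For a graph where function 0 (4 bytes) calls functions 1 (3 bytes) and 2 (5 bytes), and function 1 calls function 3 (2 bytes), `assign_offsets(&funcs, 0)` yields `Some(vec![0, 4, 4, 7])`: functions 1 and 2 are never active together, so they start at the same byte, and 9 bytes are used in total instead of 14.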
Another issue is the heap and closures (which allocate variables on the heap). Instead of the heap, code for 8-bit CPUs should use static allocation.
The article contains an example of 6502 code compiled from Rust, and this code is inefficient. It uses too many locations for variables (rc6-rc39), and it wastes time saving and restoring those locations in the prologue/epilogue.
No wonder that programs run slowly. It would be much better to compile CHIP-8 directly to 6502 assembly.
Most of the suboptimality in the article isn't due to the issues you've raised, but rather due to us just starting to optimize LLVM-MOS.
First, I have utterly no idea why there are so many calls to memset; it looks like it's unrolling a loop or something... poorly. It also doesn't seem to be reusing registers when setting up the calls; that's also bad and should be fixed.
Second, if you take a look at the actual structure of the prologue and epilogue, you might notice that it's copying zero page to an absolute memory region called __clear_screen_sstk. This is because LLVM-MOS ran a whole-program analysis on the program and proved that at most one activation of that function could occur at any given time. Thus, its "stack frame" was automatically allocated statically as a global array, not relative to a moving stack pointer.
The reason that the prologue and epilogue spend so much time copying in and out of the zero page is just that we haven't taught LLVM-MOS how to access the stack directly, but there's no technical obstacle to doing so. Once that's done, the whole body of the function would operate on __clear_screen_sstk directly, and the prologue and epilogue would disappear completely.
Of course, from the first point, you shouldn't need any stack locations to do the body of this routine; there's a big ball of yarn here, but pulling on any of a number of threads would unravel it.
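To make the "at most one activation" condition concrete, here is a rough Rust sketch of the kind of call-graph check that would justify a static frame. This is not LLVM-MOS's actual analysis; the function names and the adjacency-list representation are made up for illustration, and it assumes only direct calls, with no function pointers and no interrupt handlers re-entering a function.

```rust
/// For each function, reports whether it can be entered again while it is
/// already active, i.e. whether it can reach itself through the call graph.
/// `calls[f]` lists the functions that `f` calls directly (hypothetical input).
fn may_recurse(calls: &[Vec<usize>]) -> Vec<bool> {
    (0..calls.len()).map(|f| reaches(calls, f, f)).collect()
}

/// True if `target` is reachable from `start` by following one or more calls.
fn reaches(calls: &[Vec<usize>], start: usize, target: usize) -> bool {
    let mut seen = vec![false; calls.len()];
    let mut work = calls[start].clone();
    while let Some(f) = work.pop() {
        if f == target {
            return true;
        }
        if !seen[f] {
            seen[f] = true;
            work.extend(calls[f].iter().copied());
        }
    }
    false
}
```

Any function for which `may_recurse` reports `false` has at most one live activation under these assumptions, so its frame could live at a fixed address (like `__clear_screen_sstk` above) rather than on a moving stack. Functions that report `true` would still need a real stack.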
Strange exercise, because Rust and the original 6502 programming mindset are totally different: the latter was a world of cleverness and the most obscure side effects, all to squeeze out the last clock cycle. But anything with "hack value" I will respect.
I don't think you can get past the fact that the 6502 was meant to be programmed in assembly. Some of the tricks needed to use memory optimally just don't lend themselves to higher-level languages. I started with a lot of BASIC and then moved to assembler because it was the easiest path.
Er... the article doesn't make it clear, but I guess we're talking about cross-compilation here? So it's not "Rust" (or, as he writes later, LLVM) running on the 6502, just the code generated by the Rust compiler.
Don’t most people generally mean the target binary from the compiler and not the compiler itself when someone says “see * running on this architecture”?
I can see there being a distinction between the two for some dynamic languages, but for compiled binaries, generally "Rust on X", it doesn’t seem important whether rustc also runs on X (especially when discussing micro-controllers, since one would rarely run a full compiler on the chip itself).
Did you look at chirp8-engine, or only chirp8-c64?
The value add is not in the parts that interface with the C64 internals; probably using C for that would make for nicer code. But I wanted to push as much into Rust as I could in the short amount of time I spent on this.
vaxman|4 years ago
[ https://youtu.be/fLVN05Jl6wA ]
zwirbl|4 years ago
https://github.com/embassy-rs/embassy https://github.com/embassy-rs/nrf-softdevice
royjacobs|4 years ago
mysterymath|4 years ago
cmrdporcupine|4 years ago
emrk|4 years ago
Then the standard cargo tool can be used to build a 6502 executable directly. Some examples: https://github.com/mrk-its/a800-rust-test or https://github.com/mrk-its/llvm-mos-ferris-demo
gergoerdi|4 years ago
codedokode|4 years ago
mysterymath|4 years ago
antirez|4 years ago
person22|4 years ago
rob74|4 years ago
Still cool though!
bluejekyll|4 years ago
ww520|4 years ago
fallat|4 years ago
gergoerdi|4 years ago
The real advantage of using Rust is in the actual program logic. E.g. the instructions are decoded into an algebraic datatype (in https://github.com/gergoerdi/chirp8-engine/blob/7623353a8bf0...) and then that is consumed in the virtual CPU (https://github.com/gergoerdi/chirp8-engine/blob/7623353a8bf0...). Rust's case-of-case optimization takes care of avoiding the intermediate data representation at runtime.
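As a rough illustration of that decode-then-execute split (not the actual chirp8-engine code, which is only linked above), here is a small Rust sketch. The `Instr`, `Cpu`, `decode`, and `step` names are hypothetical, and only four standard CHIP-8 opcodes are covered.

```rust
/// A tiny, hypothetical subset of CHIP-8 instructions as an algebraic datatype.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Instr {
    ClearScreen,                 // 00E0
    Jump(u16),                   // 1NNN
    LoadImm { vx: u8, imm: u8 }, // 6XKK
    AddImm { vx: u8, imm: u8 },  // 7XKK
    Unknown(u16),
}

/// Decodes a 16-bit opcode into an `Instr`.
fn decode(op: u16) -> Instr {
    let vx = ((op >> 8) & 0x0F) as u8;
    let imm = (op & 0x00FF) as u8;
    match op >> 12 {
        0x0 if op == 0x00E0 => Instr::ClearScreen,
        0x1 => Instr::Jump(op & 0x0FFF),
        0x6 => Instr::LoadImm { vx, imm },
        0x7 => Instr::AddImm { vx, imm },
        _ => Instr::Unknown(op),
    }
}

/// Minimal virtual CPU state for the sketch.
struct Cpu {
    pc: u16,
    regs: [u8; 16],
    screen_cleared: bool,
}

/// Executes one opcode by consuming the decoded instruction.
fn step(cpu: &mut Cpu, op: u16) {
    cpu.pc = cpu.pc.wrapping_add(2);
    match decode(op) {
        Instr::ClearScreen => cpu.screen_cleared = true,
        Instr::Jump(addr) => cpu.pc = addr,
        Instr::LoadImm { vx, imm } => cpu.regs[vx as usize] = imm,
        Instr::AddImm { vx, imm } => {
            // 7XKK wraps on overflow and leaves the flag register alone.
            let r = &mut cpu.regs[vx as usize];
            *r = r.wrapping_add(imm);
        }
        Instr::Unknown(_) => {} // ignored in this sketch
    }
}
```

Because `decode` is a small function whose result is matched immediately in `step`, an optimizer that inlines it has a chance to fuse the two matches so that no `Instr` value needs to be built at runtime. Whether that actually happens for a given target is up to the compiler.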
boomlinde|4 years ago
> It is worth pointing out that the amazing thing about chirp8-c64 is not how well it works, but that it works at all.