Statically Recompiling NES Games into Native Executables with LLVM and Go (2013)

[+] ZenPsycho|10 years ago|reply

I have actually been rather obsessed with this lately. You'll see that there are a few problems he runs into that he deems insurmountable, which sounds like a challenge! Specifically, there's the issue with an instruction that is effectively a computed goto:

A jump instruction that takes an address and then jumps to the address STORED at that address. Since there is no way to know at compile time what addresses are going to be stored at that location, you're forced to dynamically emulate the whole memory space of the actual NES to calculate it accurately, thus defeating the whole point.
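
To make the problem concrete, here is a minimal Go sketch of what the 6502 indirect jump does. The function name and the flat 64 KiB array are assumptions for illustration, not code from the article. The point is that the target depends on whatever the game has written to memory by the time the jump runs.

```go
package main

import "fmt"

// indirectJumpTarget returns the address a 6502 JMP (ptr) would jump to,
// given the current contents of the 64 KiB address space. The target is
// read from memory at run time, so it cannot simply be hard-coded.
func indirectJumpTarget(mem *[65536]byte, ptr uint16) uint16 {
	lo := mem[ptr]
	// Quirk of the original 6502: the high byte is fetched without carrying
	// into the next page, so a pointer at $xxFF wraps around to $xx00.
	hiAddr := (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
	hi := mem[hiAddr]
	return uint16(hi)<<8 | uint16(lo)
}

func main() {
	var mem [65536]byte
	mem[0x0200], mem[0x0201] = 0x34, 0x12 // pointer stored in RAM
	fmt.Printf("$%04X\n", indirectJumpTarget(&mem, 0x0200))
	mem[0x0200], mem[0x0201] = 0x00, 0x80 // the game overwrites the pointer
	fmt.Printf("$%04X\n", indirectJumpTarget(&mem, 0x0200))
}
```

The same instruction lands at two different addresses depending on earlier stores, which is what makes it awkward for a static recompiler.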

Is that the only solution though? a head scratcher!

Further, there are issues with parts of the code that, on some level, seem to take inspiration from genetics: jump to one alignment and the instructions get interpreted one way; jump to a different alignment and the same sequence of bytes is interpreted by the CPU as an entirely different set of instructions. I wonder if that could be resolved by using flow analysis to create a separate source code path for each alignment, effectively uncompressing a space-saving technique.
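
A small Go sketch of the alignment issue, assuming a three-entry opcode table (the opcodes and lengths are real 6502 ones, but a real decoder needs the full table). The byte sequence is an illustrative example of the overlap trick, not taken from any particular game.

```go
package main

import "fmt"

// opLen maps a handful of 6502 opcodes to their instruction lengths.
// This subset is only enough for the example bytes below.
var opLen = map[byte]int{
	0xA9: 2, // LDA #imm
	0x2C: 3, // BIT abs
	0x60: 1, // RTS
}

// decodeFrom returns the offsets of the instruction starts reached when
// decoding linearly from entry until RTS or an unknown opcode.
func decodeFrom(code []byte, entry int) []int {
	var starts []int
	for pc := entry; pc < len(code); {
		n, ok := opLen[code[pc]]
		if !ok {
			break
		}
		starts = append(starts, pc)
		if code[pc] == 0x60 {
			break
		}
		pc += n
	}
	return starts
}

func main() {
	// LDA #$00 ; BIT $01A9 ; RTS   when entered at offset 0
	//            LDA #$01  ; RTS   when entered at offset 3
	code := []byte{0xA9, 0x00, 0x2C, 0xA9, 0x01, 0x60}
	fmt.Println(decodeFrom(code, 0)) // [0 2 5]
	fmt.Println(decodeFrom(code, 3)) // [3 5]
}
```

Each entry point yields its own instruction stream, so a recompiler could emit one translated path per entry point it discovers.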

[+] stormbrew|10 years ago|reply

I suppose it's possible this could be considered "emulating the whole memory space," though I wouldn't consider it that, but you could just generate an offset table for all jumpable memory locations and use them to calculate the correct offset. It's quite likely you could even do that in less than O(n) space with some time trade-offs.

[+] gregpardo|10 years ago|reply

Back in the romhacking scene we had emulators that mapped out more of a game's code and data the longer you played through it. I also had a friend who wrote a disassembler that used this information, along with some tasty algorithms, to get a complete disassembly of SNES games.
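
A minimal Go sketch of that kind of play-through logging. The `coverage` type and its `markExec` hook are assumptions about how an emulator might expose this, not the API of any real emulator.

```go
package main

import "fmt"

// coverage records which bytes of the address space have been executed.
type coverage struct {
	executed [65536]bool
}

// markExec is what a hypothetical emulator hook would call for every
// instruction it runs: pc is the instruction address, n its length in bytes.
func (c *coverage) markExec(pc uint16, n int) {
	for i := 0; i < n; i++ {
		c.executed[pc+uint16(i)] = true
	}
}

// codeBytes counts the bytes in [start, end] that were seen executing.
func (c *coverage) codeBytes(start, end uint16) int {
	count := 0
	for a := int(start); a <= int(end); a++ {
		if c.executed[a] {
			count++
		}
	}
	return count
}

func main() {
	c := &coverage{}
	c.markExec(0x8000, 2) // a two-byte instruction
	c.markExec(0x8002, 3) // followed by a three-byte instruction
	fmt.Println(c.codeBytes(0x8000, 0x800F)) // 5
}
```

Bytes never marked are either data or code the play-through did not reach, which is why longer sessions give better maps.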

[+] DannyBee|10 years ago|reply

" Since there is no way to know at compile time what addresses are going to stored at a place"

Why? This sounds like symbolic evaluation. You certainly can't know in all cases. But at worst, you can come up with the set of possible jump targets.
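
One crude way to get such a set, sketched in Go. This is not real symbolic evaluation: it assumes a prior analysis has already produced a list of constant stores (the hypothetical `store` type), and it over-approximates by pairing every low byte with every high byte written to the pointer.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a constant write discovered statically: val is written to addr.
type store struct {
	addr uint16
	val  byte
}

// possibleTargets returns every address that could sit in the two-byte
// pointer at ptr, given all constant stores to ptr and ptr+1.
func possibleTargets(stores []store, ptr uint16) []uint16 {
	var los, his []byte
	for _, s := range stores {
		switch s.addr {
		case ptr:
			los = append(los, s.val)
		case ptr + 1:
			his = append(his, s.val)
		}
	}
	seen := map[uint16]bool{}
	var out []uint16
	for _, h := range his {
		for _, l := range los {
			t := uint16(h)<<8 | uint16(l)
			if !seen[t] {
				seen[t] = true
				out = append(out, t)
			}
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Hypothetical stores to a zero-page pointer at $10/$11.
	stores := []store{{0x10, 0x00}, {0x10, 0x40}, {0x11, 0x80}, {0x11, 0x90}}
	for _, t := range possibleTargets(stores, 0x10) {
		fmt.Printf("$%04X\n", t)
	}
}
```

The set may contain targets that never occur together at run time, but it is safe in the sense that it only errs on the side of too many.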

[+] gulpahum|10 years ago|reply

One major problem is games that generate code into RAM and then execute it. I can't remember if any NES games did that, but I've seen other 6502-based games do it.

[+] ris|10 years ago|reply

"Since there is no way to know at compile time what addresses are going to stored at a place"

Well, this is exactly what LLVM's (admittedly limited) mem2reg pass is for.

[+] webkike|10 years ago|reply

Well, clearly it is not impossible to recompile the program, perhaps by hand, into a different instruction set. Arguably this may be considered source-to-source translation. Sure, it is hard, and I assume no program will EVER be able to do it automatically. But that does not rule out hand translation, which may prove an effective approach for super-skilled programmers of the future.

[+] onnoonno|10 years ago|reply

Yes, it looks like the real problems occur (as also mentioned in his post) whenever encountering self-modifying code.

I wonder whether it would be possible to detect and then either pattern match or manually resolve those cases of self-modifying code, in case they are few and contained to a small section of code each?
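
Detection at least seems tractable with a traced run. Below is a Go sketch under the assumption that an emulator can call hooks on every store and every executed byte; the `smcDetector` type and hook names are made up for illustration.

```go
package main

import "fmt"

// smcDetector flags addresses that are both written and executed, in
// either order, during a traced run.
type smcDetector struct {
	executed [65536]bool
	written  [65536]bool
	flagged  map[uint16]bool
}

func newSMCDetector() *smcDetector {
	return &smcDetector{flagged: make(map[uint16]bool)}
}

// onWrite would be called by a hypothetical emulator hook on every store.
func (d *smcDetector) onWrite(addr uint16) {
	d.written[addr] = true
	if d.executed[addr] {
		d.flagged[addr] = true
	}
}

// onExec would be called for every byte of each executed instruction.
func (d *smcDetector) onExec(addr uint16) {
	d.executed[addr] = true
	if d.written[addr] {
		d.flagged[addr] = true
	}
}

func main() {
	d := newSMCDetector()
	d.onExec(0x8000)  // ROM code, never written: not flagged
	d.onWrite(0x0300) // the game copies a routine into RAM...
	d.onExec(0x0300)  // ...and then runs it: flagged
	fmt.Println(len(d.flagged), d.flagged[0x0300]) // 1 true
}
```

On the NES, writes to ROM addresses usually hit mapper registers rather than changing code, so a real tool would probably restrict the check to RAM. Like any trace-based approach, it only finds cases the play-through actually exercises.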

[+] orik|10 years ago|reply

here's the previous discussion: https://news.ycombinator.com/item?id=5838326

(i've realized; is this even necessary with the 'past' button?)

[+] Splines|10 years ago|reply

I never knew there was a "past" button...

[+] hias|10 years ago|reply

Never seen that button. Speaks for the UI designer, that a comment about a function is more visible than the function itself ;-) Just kidding, I am at fault for not looking right!

[+] SixSigma|10 years ago|reply

It should really be a bit more visible

[+] madez|10 years ago|reply

I can’t help it. Reading this feels a bit like hidden political propaganda. It’s ridden with subtle and not so subtle negative references to gcc, the fsf and ideals.

Probably it’s me reading too much into it but it makes it hard to enjoy.

[+] tibbon|10 years ago|reply

Does anyone have information on how some of the "multi cart" games worked, like Mario/Duck Hunt/Track Meet? Surely, all 3 games didn't fit in 32k right?... right?

http://nintendo.wikia.com/wiki/3-in-1_Super_Mario_Bros._/_Du...

[+] jimsmart|10 years ago|reply

An educated guess (having coded the NES) is that it's simply a bigger ROM, and the cart contained some kind of MMC chip [0] which allows different sections from the ROM to be paged-in - the menu screen contains the code to page-in the applicable bank from the ROM, and then the game runs per normal, not being aware of any of this.

[0] https://en.wikipedia.org/wiki/Memory_management_controller
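
A Go sketch of the paging idea, using a made-up mapper layout: 16 KiB banks, a switchable window at $8000-$BFFF, and the last bank fixed at $C000-$FFFF. This is an assumption for illustration and not necessarily how that particular cart was wired.

```go
package main

import "fmt"

const bankSize = 0x4000 // 16 KiB

// cart models a hypothetical mapper: the window at $8000-$BFFF shows a
// selectable bank, and $C000-$FFFF always shows the last bank.
type cart struct {
	rom  []byte
	bank int
}

func (c *cart) numBanks() int { return len(c.rom) / bankSize }

// selectBank is what a write to the mapper register would trigger.
func (c *cart) selectBank(n int) { c.bank = n % c.numBanks() }

// read translates a CPU address into an offset in the full ROM image.
func (c *cart) read(addr uint16) byte {
	switch {
	case addr >= 0xC000:
		return c.rom[(c.numBanks()-1)*bankSize+int(addr-0xC000)]
	case addr >= 0x8000:
		return c.rom[c.bank*bankSize+int(addr-0x8000)]
	}
	return 0 // addresses below $8000 are not cartridge ROM in this sketch
}

func main() {
	rom := make([]byte, 4*bankSize) // 64 KiB, twice the 32 KiB CPU window
	for i := range rom {
		rom[i] = byte(i / bankSize) // fill each bank with its own index
	}
	c := &cart{rom: rom}
	c.selectBank(2)
	fmt.Println(c.read(0x8000), c.read(0xC000)) // 2 3
}
```

The game code only ever sees the 32 KiB window, which matches the point that it does not need to be aware of the paging.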

[+] andrewvijay|10 years ago|reply

Too low level for me. But since I've started Go as my first low-level statically typed language, I think I'll just bookmark this article and maybe read it in a few years!

[+] spriggan3|10 years ago|reply

> Too low level for me. But since that I've started go as my first low level statically typed language

I don't think any garbage collected language can be called "low level". If you really want to go low level, learn C and ASM. Manual memory management is the real deal.

[+] gcc_programmer|10 years ago|reply

gcc -Wall -fno-diagnostics-color

If you are going to list architectures supported, also list the ones gcc supports - it's, basically, all of them.

Llvm+clang is nice, but not better, drink the kool aid. Gcc performance is still higher, and gcc is free software.

[+] techdragon|10 years ago|reply

Clang is also free... Just not your preferred definition of Free.
