Show HN: Codex builds a working NES Emulator in one hour

6 points | zi2zi-jit | 4 days ago | github.com

Hi folks! I know NES emulators have been implemented countless times, in practically every language imaginable.

However, having an LLM fully replicate the spec purely from memory—without referencing existing code—is still a significant challenge. It requires the underlying model to have strong anti-hallucination capabilities and solid long-term planning to keep from going astray. Because of this, building an NES emulator makes for an excellent LLM stress test.

Here is how the emulator was built:

Data Gathering: I asked Codex to download the necessary developer manuals and test suites. It was strictly prohibited from searching for reference implementations online.

Development: I instructed Codex to build the emulator until all test suites passed. This process was mostly hands-free; I only chimed in to encourage it to continue when it paused.

First Draft: After just 4-5 prompts, Codex delivered a functional, pure-Python emulator—though it ran at a sluggish 7 FPS.

Optimization: Asking Codex to optimize the app completely on its own didn't work this time. Instead, I had it generate a flamegraph, which identified the PPU update as the bottleneck. I then instructed Codex to rewrite the PPU in Cython without breaking the passing tests.
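
A minimal sketch of the profiling step described above, under stated assumptions: the post does not say which flamegraph tool was used, so this swaps in the standard-library `cProfile` module, which gives the same "where does the time go" answer as a table rather than a picture. The functions `cpu_step`, `ppu_step`, and `run_frames` are hypothetical stand-ins, not code from the actual emulator.

```python
import cProfile
import pstats


def ppu_step(cycles: int) -> int:
    """Hypothetical stand-in for per-pixel PPU work (deliberately heavy)."""
    total = 0
    for i in range(cycles * 50):
        total += i & 0xFF
    return total


def cpu_step(cycles: int) -> int:
    """Hypothetical stand-in for CPU instruction dispatch (deliberately light)."""
    total = 0
    for i in range(cycles):
        total += i & 0xFF
    return total


def run_frames(n: int) -> None:
    """Run n fake frames, each doing a little CPU work and a lot of PPU work."""
    for _ in range(n):
        cpu_step(1000)
        ppu_step(1000)


def hottest_function(func, *args) -> str:
    """Profile func(*args) and return the name of the function with the most own time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (calls, prim_calls, tottime, cumtime, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda kv: kv[1][2])
    return name


if __name__ == "__main__":
    print("hottest function:", hottest_function(run_frames, 5))
```

In this toy setup the PPU stand-in dominates, which mirrors the situation in the post: once a single hot function is identified, it becomes a well-scoped target to hand back to the model (or to rewrite in Cython) while the existing test suites guard against regressions.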

Overall, I'm incredibly impressed by Codex. I already knew it was capable of the task, but the speed was astonishing. It finished the project in under an hour, using merely 2% of my weekly Pro quota.

While the NES might be a relatively easy system to emulate, I think emulation could serve as a fantastic benchmark for testing future LLMs.

4 comments

qsera|4 days ago

Can you try to vibe code an AI shill detector next?

nunobrito|4 days ago

Quite amazing. This opens the door to many other emulators, since it can now replicate the expected output quite nicely.

zi2zi-jit|4 days ago

Totally agree. I'm looking to build something more complex next, such as a PS1 emulator in a different language, as a test. That would require significantly more effort, but given how quickly the models are improving, I'm optimistic.