top | item 47179119


antirez | 4 days ago

You didn't read the full article. The last paragraph talks about this specifically.


tredre3 | 3 days ago

In the last paragraph you handwave that all the Z80 and ZX Spectrum documentation is likely already in the model anyway... Choosing not to provide the documents/websites might then require more prompting to finish the emulator, but the knowledge is there. You can't clean-room with a large LLM. That's delusion!

nathell | 2 days ago

Counterpoint: in December, a Polish MP [0] vibe-coded an interpreter [1] for a 1959 Polish programming language by feeding the model the available documentation. _That,_ at least, is unlikely to have appeared in the model’s training data.

[0]: https://en.wikipedia.org/wiki/Adrian_Zandberg
[1]: https://sako-zam41.netlify.app/

jaen | 3 days ago

I mean, for an article titled "clean room", that would be the first thing to do, not a "maybe follow up in the future"...

(I do think the article could have stood on its own without mentioning anything about "clean room", which is a very high standard.)

For the handwavy point about the x86 assembler, I am quite sure the LLM will remember the entire x86 instruction set without any reference; the problem is more about having a very well-tuned agentic loop with no context pollution to extract it (which you won't get by YOLOing Claude, because LLMs aren't meta-RLed enough yet to correct their own context/prompt-engineering problems).

Or alternatively, to exploit context pollution, take half of an open-source project and let the LLM fill in the rest (try to imagine the synthetic "prompt" it was given when training on this repo) and see how far it is from the actual version.
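One rough way to score that "fill in the rest" experiment: a minimal Python sketch, assuming you split a single source file in half at a line boundary, get the model's completion of the second half separately (the model call itself is not shown and is up to you), and measure line-level overlap with the standard library's `difflib.SequenceMatcher`. The helper names `split_source` and `similarity` are hypothetical, and a match ratio is only a crude proxy for "how far it is from the actual version".

```python
import difflib


def split_source(text: str) -> tuple[str, str]:
    """Hypothetical helper: split a file at the middle line.

    Returns (prefix shown to the model, actual second half kept hidden).
    """
    lines = text.splitlines(keepends=True)
    mid = len(lines) // 2
    return "".join(lines[:mid]), "".join(lines[mid:])


def similarity(completion: str, actual: str) -> float:
    """Hypothetical helper: line-level similarity in [0, 1].

    1.0 means the model reproduced the hidden half line for line;
    0.0 means no line of the completion matches the original.
    """
    matcher = difflib.SequenceMatcher(
        None, completion.splitlines(), actual.splitlines()
    )
    return matcher.ratio()


if __name__ == "__main__":
    original = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
    prefix, hidden = split_source(original)
    # Stand-in for the model's output; replace with a real completion of `prefix`.
    completion = "\ndef sub(a, b):\n    return a - b\n"
    print(f"similarity: {similarity(completion, hidden):.2f}")
```

A high ratio on code that only exists in public repositories would hint at memorization rather than reasoning from the prefix; a per-line diff (e.g. `difflib.unified_diff`) shows where the completion diverges.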