top | item 47178325

(no title)

ralferoo | 2 days ago

The problem is that it will have been trained on multiple open source spectrum emulators. Even "don't access the internet" isn't going to help much if it can parrot someone else's emulator verbatim just from training.

Maybe a more sensible challenge would be to describe a system that hasn't previously been emulated before (or had an emulator source released publicly as far as you can tell from the internet) and then try it.

For fun, try using obscure CPUs giving it the same level of specification as you needed for this, or even try an imagined Z80-like but swapping the order of the bits in the encodings and different orderings for the ALU instructions and see how it manages it.

discuss

throwa356262|2 days ago

I think you are into something here.

I tried creating an emulator for CPU that is very well known but lacks working open source emulators.

Claude, Codex and Gemini were very good at starting something that looked great but all failed to reach a working product. They all ended up in a loop where fixing one issues caused something else to break and could never get out of it.

unknown|2 days ago

[deleted]

stuaxo|2 days ago

When they get stuck, I find adding debug that the model can access helps. + Sometimes you need to add something into the prompt to tell it to avoid some approach at a point.

trollbridge|2 days ago

I’ve been trying to do the same thing as a hobby project to just imagine some “what ifs” with some slight changes to the original 8086 and the 80286.

It just never produces an actually working result without a lot of intervention on my part. (My change was merely changing the paragraph size from 4 bits to 8 bits on the 8086.)

dboreham|2 days ago

Interesting. When I had Claude write a language transpiler it always checked that tests passed before declaring a feature ready for PR. There was never a case where it gave up on achieving that goal.

antirez|2 days ago

Please tell me what CPU it is. I would give it a try. I doubt strongly a very well documented CPU can't be emulated by writing the code with modern AIs.

PontifexMinimus|2 days ago

> try using obscure CPUs

Better still invent a CPU instruction set, and get it to write an emulator for that instruction set in C.

Then invent a C-like HLL and get it to write a compiler from your HLL to your instruction set.

abainbridge|2 days ago

> try using obscure CPUs

I tried asking Gemini and ChatGPT, "What opcode has the value 0x3c on the Intel 8048?"

They were both wrong. The datasheet with the correct encodings is easily found online. And there are several correct open source emulators, eg MAME.

bsoles|2 days ago

Even on a specific STM microcontroller (STM32G031), the LLM tools invent non-existent registers and then apologize when I point it out. And conversely, they write code for an entire algorithm (CRC, for example) when hardware support already exists on the chip.

stuaxo|2 days ago

Think of "What opcode has the value 0x3c on the Intel 8048" as a PNG image but the LLM like a very compressed JPEG. It will only get a very approximate answer. But you can give it explicit tools to look up things.

yomismoaqui|2 days ago

If the LLM doesn't have a websearch tool your test doesn't make any sense.

An LLM by itself is like a lossy image of all text in the internet.

kamranjon|2 days ago

I thought this part of the write-up was interesting:

"This is, I think, in contradiction with the idea that LLMs are memorizing the whole training set and uncompress what they have seen. LLMs can memorize certain over-represented documents and code, but while they can extract such verbatim parts of the code if prompted to do so, they don’t have a copy of everything they saw during the training set, nor they spontaneously emit copies of already seen code, in their normal operation."

Can't things basically get baked into the weights when trained on enough iterations, and isn't this the basis for a lot of plagiarism issues we saw with regards to code and literature? It seems like this is maybe downplaying the unattributed use of open source code when training these models.

unknown|2 days ago

[deleted]

dist-epoch|2 days ago

If you did that, comments would be "it's just a bit shuffle of the encodings, of course it can manage that, but how about we do totally random encodings..."

ralferoo|2 days ago

That's true, but I still think it'd be an interesting experiment to see how much it actually follows the specification vs how much it hallucinates by plagiarising from existing code.

Probably bonus points for telling it that you're emulating the well known ZX Spectrum and then describe something entire different and see whether it just treats that name as an arbitrary label, or whether it significantly influences its code generation.

But you're right of course, instruction decoding is a relatively small portion of a CPU that the differences would be quite limited if all the other details remained the same. That's why a completely hypothetical system is better.