AutoJanitor | 8 days ago
819K parameters. Responses are short and sometimes odd. That's expected at this scale with a small training corpus. The achievement is that it runs at all on this hardware.
Context window is 64 tokens. Prompt + response must fit in 64 bytes.
No memory between conversations. The KV cache resets at the start of each one.
Byte-level vocabulary. The model generates one ASCII character at a time.
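The constraints above can be sketched in a few lines of C. This is not code from nano_gpt.c; `next_byte` and `generate` are hypothetical names, and the sampler here is a placeholder for the real fixed-point attention stack. It only illustrates the shape of the loop: prompt and response share one 64-byte window, and the model emits one ASCII byte per step until the window is full.

```c
#include <string.h>

#define CTX_LEN 64  /* prompt + response share this window */

/* Hypothetical stand-in for the real sampler: the actual model runs
 * fixed-point attention over ctx and samples the next ASCII byte. */
static unsigned char next_byte(const unsigned char *ctx, int len) {
    (void)ctx; (void)len;
    return '.';
}

/* Fill the 64-byte window: copy the prompt, then generate one ASCII
 * byte per step until the context is full. Returns bytes written. */
static int generate(unsigned char *ctx, const char *prompt) {
    int len = (int)strlen(prompt);
    if (len > CTX_LEN) len = CTX_LEN;
    memcpy(ctx, prompt, (size_t)len);
    while (len < CTX_LEN) {
        unsigned char b = next_byte(ctx, len);
        ctx[len] = b;
        len++;
    }
    return len;
}
```

Because the vocabulary is bytes, "64 tokens" and "64 bytes" are the same budget, which is why a long prompt directly eats into the response.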
Future Directions
These are things we're working toward, not current functionality:
RSP microcode acceleration — the N64's RSP has 8-lane SIMD (VMULF/VMADH); offloading matmul would give an estimated 4–8× speedup over scalar VR4300
Larger model — with the Expansion Pak (8MB total), a 6-layer model fits in RAM
Richer training data — more diverse corpus = more coherent responses
Real cartridge deployment — EverDrive compatibility, real hardware video coming
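The RSP offload above is future work, so no microcode exists yet. As a rough illustration of what would be offloaded, here is a scalar C mock-up of the 8-lane multiply-accumulate pattern that VMULF/VMADH-style vector ops would execute: eight output rows advance in lockstep, one column per step. The function name and layout are assumptions for this sketch, not the project's actual kernel.

```c
#include <stdint.h>

/* Scalar mock-up of an 8-lane Q8.7 matrix-vector product. On the RSP,
 * the inner lane loop would collapse into single vector instructions;
 * here it just shows the data layout: w is column-major with 8 rows
 * per column, so all 8 lanes consume the same x[c] each step. */
static void matvec8_q87(const int16_t *w, const int16_t *x,
                        int32_t *acc, int cols) {
    int c, lane;
    for (lane = 0; lane < 8; lane++) acc[lane] = 0;
    for (c = 0; c < cols; c++) {
        for (lane = 0; lane < 8; lane++) {
            /* >>7 rescales the product back to Q8.7 (1.0 == 128). */
            acc[lane] += ((int32_t)w[c * 8 + lane] * x[c]) >> 7;
        }
    }
}
```

The estimated 4–8× speedup comes from replacing this scalar inner loop with one vector instruction per column.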
Why This Is Real
The VR4300 was designed for game physics, not transformer inference. Getting Q8.7 fixed-point attention, FFN, and softmax running stably at 93 MHz required:
Custom fixed-point softmax (bit-shift exponential to avoid overflow)
Q8.7 accumulator arithmetic with saturation guards
Soft-float compilation flag for float16 block scale decode
Alignment-safe weight pointer arithmetic for the ROM DFS filesystem
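To make the first two items above concrete, here is a minimal sketch of what a bit-shift softmax over Q8.7 scores can look like, with a saturating add as the overflow guard. This is an illustration of the technique, not the code in nano_gpt.c: the names and the exact decay rule (one halving per integer unit below the max) are assumptions. The key property is that every intermediate weight stays at or below 1.0 in Q8.7, so nothing can overflow.

```c
#include <stdint.h>

/* Q8.7: 16-bit signed, 7 fractional bits (1.0 == 128). */
#define Q87_ONE 128

/* Saturating add: clamp to the int16 range instead of wrapping. */
static int16_t q87_sadd(int16_t a, int16_t b) {
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Shift-based softmax sketch: subtract the max score, then approximate
 * exp() with a power-of-two decay -- each integer Q8.7 unit below the
 * max halves the weight. No multiply, no overflow: weights never
 * exceed Q87_ONE. */
static void softmax_q87(const int16_t *scores, int16_t *out, int n) {
    int i;
    int16_t max = scores[0];
    int32_t sum = 0;
    for (i = 1; i < n; i++)
        if (scores[i] > max) max = scores[i];
    for (i = 0; i < n; i++) {
        int32_t d = ((int32_t)max - scores[i]) >> 7;  /* units below max */
        out[i] = (d >= 15) ? 0 : (int16_t)(Q87_ONE >> d);
        sum += out[i];
    }
    /* Normalize so the weights sum to ~Q87_ONE; sum >= Q87_ONE because
     * the max-scoring entry always contributes a full 1.0. */
    for (i = 0; i < n; i++)
        out[i] = (int16_t)(((int32_t)out[i] * Q87_ONE) / sum);
}
```

Trading e^x for 2^x (via shifts) changes the softmax temperature but keeps it monotonic, which is usually an acceptable trade at this model scale.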
The inference code is in nano_gpt.c. The training script is train_sophia_v5.py. Build it yourself and verify.