very much inspired by karpathy's microgpt of the same name. it's (by default) a 4000 param GPT/LLM/NN that learns to generate names. this is sorta an educational tool in that you can visualize the activations as they pass through the network, and click on things to get an explanation of them.
Amazing work! It reminded me of LLM Visualization (https://bbycroft.net/llm), except this is a lot easier to wrap my head around and I can actually run the training loop, which makes sense given the simplicity of the original microgpt.
To give a sense of what the loss value means, maybe you can add a small explainer section as a question and add this explanation from Karpathy’s blog:
> Over 1,000 steps the loss decreases from around 3.3 (random guessing among 27 tokens: −log(1/27)≈3.3) down to around 2.37.
to reiterate that the model is being trained to predict the next token out of 27 possible tokens and is now doing better than the baseline of random guessing.
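For anyone who wants to check the arithmetic in that quote, here is a minimal sketch. It assumes the loss is the usual average cross-entropy in nats (natural log), which is what −log(1/27)≈3.3 implies. The function names are mine, not from microgpt.

```python
import math

VOCAB_SIZE = 27  # number of tokens, taken from the quoted passage


def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy (in nats) of guessing uniformly among vocab_size tokens."""
    return -math.log(1.0 / vocab_size)


def perplexity(loss: float) -> float:
    """Effective number of equally likely choices implied by a loss value."""
    return math.exp(loss)


baseline = uniform_loss(VOCAB_SIZE)
trained = 2.37  # loss after training, taken from the quoted passage

print(f"random-guess loss:      {baseline:.2f}")
print(f"random-guess perplexity: {perplexity(baseline):.1f}")
print(f"trained perplexity:      {perplexity(trained):.1f}")
```

Read this way, a loss of 2.37 means the model is roughly as uncertain as if it were picking among about 11 equally likely tokens at each step, instead of all 27.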
There used to be this page that showed the activations/residual stream from GPT-2 visualized as a black-and-white image. I remember it being neat how you could slowly see order forming from seemingly random activations as it progressed through the layers.
Can't find it now though (maybe the link rotted?), anyone happen to know what that was?
the untrained model is literally just generating random characters, whereas your examples are at least pronounceable. you can add more layers to get progressively better results.
Depends on the model size, batch size, input sequence length, ... etc. With a small model like this you'll never get a 'good' output but you can maximise its potential.
I trained for 12,000 steps with 4 layers, and the output is kind of name-like, but it didn't reproduce any actual name from its training data after 20 or so generations.
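If you want to check this more systematically than by eye, something like the following would do it. This is only a sketch: `novelty_report` is a hypothetical helper, and both lists are made-up placeholders, not real training data or real model output.

```python
def novelty_report(samples, training_names):
    """Split generated samples into names seen in training and new ones."""
    seen = {name.lower() for name in training_names}
    memorized = [s for s in samples if s.lower() in seen]
    novel = [s for s in samples if s.lower() not in seen]
    return memorized, novel


# Placeholder data for illustration only (assumption, not from the thread).
training_names = ["emma", "olivia", "ava"]
samples = ["emma", "kalen", "avia"]

memorized, novel = novelty_report(samples, training_names)
print("memorized:", memorized)
print("novel:    ", novel)
```

With only 20 or so samples, an empty `memorized` list doesn't say much either way, so it is worth generating a few hundred before drawing conclusions.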
Minor nit: In familiarity, you gloss over the fact that it's character-based rather than token-based, which might be worth a shout-out:
"Microgpt's larger cousins use building blocks called tokens representing one or more letters. That's hard to reason about, but essential for building sentences and conversations.
"So we'll just deal with spelling names using the English alphabet. That gives us 26 tokens, one for each letter."
hm. the way i see things, characters are the natural/obvious building blocks and tokenization is just an improvement on that. i do mention chatgpt et al. use tokens in the last q&a dropdown, though
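To make the character-as-token idea concrete, here is a rough sketch of a character-level tokenizer. It assumes the 27th token mentioned upthread is a single start/end marker on top of the 26 letters. The `<bos>` name, its position at index 0, and the `encode`/`decode` helpers are my assumptions for illustration, not microgpt's actual code.

```python
import string

BOS = "<bos>"  # assumed name for the start/end marker

# 26 lowercase letters plus the marker gives 27 tokens.
vocab = [BOS] + list(string.ascii_lowercase)
stoi = {tok: i for i, tok in enumerate(vocab)}


def encode(name: str) -> list[int]:
    """Map a name to token ids, wrapped in the start/end marker."""
    return [stoi[BOS]] + [stoi[c] for c in name.lower()] + [stoi[BOS]]


def decode(ids: list[int]) -> str:
    """Map token ids back to text, dropping the marker."""
    return "".join(vocab[i] for i in ids if vocab[i] != BOS)


ids = encode("ada")
print(len(vocab), ids, decode(ids))
```

Each letter is its own token here, so there is nothing to learn about the tokenizer itself. A subword tokenizer would replace `vocab` with a much larger table of multi-letter pieces.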
kengoa|14 days ago
interloxia|14 days ago
https://karpathy.github.io/2026/02/12/microgpt/
It was submitted to hn a few days ago but only received a few comments. https://news.ycombinator.com/item?id=47000263
krackers|14 days ago
RugnirViking|14 days ago
b44|14 days ago
lucrbvi|14 days ago
stevage|14 days ago
alansaber|14 days ago
BloondAndDoom|14 days ago
msla|14 days ago
alansaber|14 days ago
WatchDog|14 days ago
b44|14 days ago
GaggiX|14 days ago
b44|14 days ago
kfsone|14 days ago
mips_avatar|14 days ago
b44|14 days ago
ramon156|14 days ago
keepamovin|14 days ago
prakashdep|13 days ago
armcat|13 days ago
darepublic|14 days ago
youio|14 days ago
nivcmo|14 days ago
umairnadeem123|14 days ago