It says it's tailored for beginners, but I don't know what kind of beginner can parse multiple paragraphs like this:
"How wrong was the prediction? We need a single number that captures "the model thought the correct answer was unlikely." If the model assigns probability 0.9 to the correct next token, the loss is low (0.1). If it assigns probability 0.01, the loss is high (4.6). The formula is −log(p), where p is the probability the model assigned to the correct token. This is called cross-entropy loss."
I see. The problem with me writing these is that even though I'm not an expert, I do have a bit of knowledge about certain things, so I'm prone to saying things that make sense to me but not to beginners. I'll rethink it.
The part that eludes me is how you get from this to the capability to debug arbitrary coding problems. How does statistical inference become reasoning?
For a long time, it seemed the answer was that it doesn't. But now, using Claude Code daily, it seems it does.
IMO your question is the largest unknown in the ML research field (neural net interpretability is a related area), but the most basic explanation is
"if we can always accurately guess the next 'correct' word, then we will always answer questions correctly".
An enormous amount of research+eng work (most of the work of frontier labs) is being poured into making that 'correct' modifier happen, rather than just predicting the next token from 'the internet' (naive original training corpus). This work takes the form of improved training data (e.g. expert annotations), human-feedback finetuning (e.g. RLHF), and most recently reinforcement learning (e.g. RLVR, meaning RL with verifiable rewards), where the model is trained to find the correct answer to a problem without 'token-level guidance'. RL for LLMs is a very hot research area and very tricky to solve correctly.
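To make the RLVR idea concrete, here's a toy sketch (all names hypothetical, and the "model" is just a noisy stand-in, not a real policy or trainer): sample several candidate answers, score each with a verifiable check, and an RL step would then upweight the high-reward samples. No token-level labels are needed, only a pass/fail signal on the final answer.

```python
import random

def verifiable_reward(problem: str, answer: int) -> float:
    # The "verifiable" part: a deterministic check, e.g. "2+3" passes iff answer == 5.
    return 1.0 if answer == eval(problem) else 0.0

def sample_answers(problem: str, k: int = 4) -> list[int]:
    # Stand-in for sampling k answers from a policy model: truth plus noise.
    truth = eval(problem)
    return [truth + random.choice([-1, 0, 0, 1]) for _ in range(k)]

samples = sample_answers("2+3")
rewards = [verifiable_reward("2+3", a) for a in samples]
# A real RL update would now increase the likelihood of the high-reward samples.
print(list(zip(samples, rewards)))
```

The hard research problems the comment alludes to (reward hacking, credit assignment over long generations) live in the step this sketch elides: turning those rewards into a stable gradient update.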
Because it's not statistical inference on words or characters but rather stacked layers of statistical inference on ~arbitrarily complex semantic concepts which is then performed recursively.
DNNs aren't really "statistical" inference in the way most people would understand the term statistics. The underlying maths owes much more to calculus than statistics. The model isn't just encoding statistics about the text it was trained on, it's attempting to optimize a solution to the problem of picking the next token with all the complexity that goes into that.
One problem is that "statistical inference" is overly reductive. Sure, there's a statistical aspect to the computations in a neural network, but there's more to it than that. As there is in the human brain.
I read through this entire article. There was some value in it, but I found it to be very "draw the rest of the owl". It read like introductions to conceptual elements or even proper segues had been edited out. That said, I appreciated the interactive components.
"The MLP (multilayer perceptron) is a two-layer feed-forward network: project up to 64 dimensions, apply ReLU (zero out negatives), project back to 16"
Which starts to feel pretty owly indeed.
I think the whole thing could be expanded to cover some more of it in greater depth.
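To be fair to the article, that quoted sentence does unpack into only a few lines. Here's a minimal sketch with random (untrained) placeholder weights, dimension-for-dimension as quoted:

```python
import numpy as np

rng = np.random.default_rng(0)

# The quoted MLP block: 16 -> 64 -> ReLU -> 16.
# Weights are random placeholders here; in the real model they are learned.
W_up = rng.standard_normal((16, 64))    # project up to 64 dimensions
W_down = rng.standard_normal((64, 16))  # project back to 16

def mlp(x: np.ndarray) -> np.ndarray:
    h = np.maximum(x @ W_up, 0.0)  # ReLU: zero out negatives
    return h @ W_down

x = rng.standard_normal(16)
print(mlp(x).shape)  # (16,)
```

The owl-drawing part is everything the sentence doesn't say: why the up-projection helps, and how these weights get trained at all.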
Is it becoming a thing to misspell and add grammatical mistakes on purpose to show that an LLM didn't write the blog post? I noticed several spelling mistakes in Karpathy's blog post that this article is based on and in this article.
People aren't gonna be happy I spell this out, but, Karpathy's not The Dude.
He's got a big Twitter following, so people assume something's going on or that he's important, but he just isn't.
Biggest thing he did in his career was feed Elon's Full Self Driving delusion for years and years and years.
Note, then, how long he lasted at OpenAI, and how much time he spends on code golf.
If you're angry to read this, please, take a minute and let me know the last time you saw something from him that didn't involve A) code golf B) coining phrases.
I know many comments mentioned that it was too introductory, or too deep. But as someone who doesn't have much experience with how these models work, I found this overview to be pretty great.
There were some concepts I didn't quite understand but I think this is a good starting point to learning more about the topic.
That was one of the most helpful walkthroughs I've read. Thanks for explaining so well with all of the steps.
I wasn't a coder, but with AI I am actually writing code. The more I familiarise myself with everything, the easier it becomes to learn. I find AI fascinating. By making it so simple and clear, it helps when I think about what I need to feed it.
politelemon|1 day ago
Hey, I am able to see kamon, karai, anna, and anton in the dataset, it'd be worth using some other names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/...
ChrisArchitect|22 hours ago
Microgpt
https://news.ycombinator.com/item?id=47202708