
Microgpt explained interactively

298 points | growingswe | 1 day ago | growingswe.com

47 comments


politelemon|1 day ago

> By the end of training, the model produces names like "kamon", "karai", "anna", and "anton". None of them are copies from the dataset.

Hey, I am able to see kamon, karai, anna, and anton in the dataset, it'd be worth using some other names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/...

ayhanfuat|23 hours ago

You are absolutely right. The whole post reads like it's AI generated.

jmkd|22 hours ago

It says it's tailored for beginners, but I don't know what kind of beginner can parse multiple paragraphs like this:

"How wrong was the prediction? We need a single number that captures "the model thought the correct answer was unlikely." If the model assigns probability 0.9 to the correct next token, the loss is low (0.1). If it assigns probability 0.01, the loss is high (4.6). The formula is −log(p), where p is the probability the model assigned to the correct token. This is called cross-entropy loss."
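For what it's worth, the formula buried in that quote is tiny once unpacked. A quick sketch (my own code, using the same 0.9 and 0.01 examples from the quote):

```python
import math

def cross_entropy_loss(p_correct: float) -> float:
    """Loss is -log(p), where p is the probability the model
    assigned to the correct next token."""
    return -math.log(p_correct)

# A confident, correct prediction gives a small loss...
print(round(cross_entropy_loss(0.9), 3))   # ~0.105, the "0.1" in the quote
# ...while assigning tiny probability to the right answer gives a large loss.
print(round(cross_entropy_loss(0.01), 3))  # ~4.605, the "4.6" in the quote
```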

growingswe|18 hours ago

I see. The problem with me writing these is that even though I'm not an expert, I do have a bit of knowledge on certain things, so I'm prone to say things that make sense to me but not to beginners. I'll rethink it.

windowshopping|23 hours ago

The part that eludes me is how you get from this to the capability to debug arbitrary coding problems. How does statistical inference become reasoning?

For a long time, it seemed the answer was it doesn't. But now, using Claude code daily, it seems it does.

ferris-booler|22 hours ago

IMO your question is the largest unknown in the ML research field (neural net interpretability is a related area), but the most basic explanation is "if we can always accurately guess the next 'correct' word, then we will always answer questions correctly".

An enormous amount of research+eng work (most of the work of frontier labs) is being poured into making that 'correct' modifier happen, rather than just predicting the next token from 'the internet' (naive original training corpus). This work takes the form of improved training data (e.g. expert annotations), human-feedback finetuning (e.g. RLHF), and most recently reinforcement learning (e.g. RLVR, meaning RL with verifiable rewards), where the model is trained to find the correct answer to a problem without 'token-level guidance'. RL for LLMs is a very hot research area and very tricky to solve correctly.
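To make the "always guess the next 'correct' word" framing concrete, here's a toy greedy-decoding loop. The vocabulary, probabilities, and "model" are entirely made up for illustration, not any real network:

```python
def toy_model(context: tuple) -> dict:
    # Hard-coded next-token distributions standing in for a trained model.
    table = {
        (): {"the": 0.9, "a": 0.1},
        ("the",): {"cat": 0.8, "mat": 0.2},
        ("the", "cat"): {"sat": 0.95, "ran": 0.05},
        ("the", "cat", "sat"): {"on": 0.9, "<eos>": 0.1},
        ("the", "cat", "sat", "on"): {"the": 0.7, "a": 0.3},
        ("the", "cat", "sat", "on", "the"): {"mat": 0.9, "cat": 0.1},
    }
    return table.get(context, {"<eos>": 1.0})

def greedy_decode(max_len: int = 10) -> list:
    context = ()
    while len(context) < max_len:
        probs = toy_model(context)
        next_tok = max(probs, key=probs.get)  # always pick the most likely token
        if next_tok == "<eos>":
            break
        context = context + (next_tok,)
    return list(context)

print(" ".join(greedy_decode()))  # the cat sat on the mat
```

The whole game the labs are playing is making that argmax land on the *correct* token for contexts far messier than this lookup table.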

fc417fc802|22 hours ago

Because it's not statistical inference on words or characters but rather stacked layers of statistical inference on ~arbitrarily complex semantic concepts which is then performed recursively.

mike_hearn|10 hours ago

DNNs aren't really "statistical" inference in the way most people would understand the term statistics. The underlying maths owes much more to calculus than statistics. The model isn't just encoding statistics about the text it was trained on, it's attempting to optimize a solution to the problem of picking the next token with all the complexity that goes into that.
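A minimal sketch of that calculus-first view, with a toy one-parameter "model" and my own numbers: training just follows the derivative of a loss downhill, which is optimization, not tabulating statistics.

```python
def loss(w: float) -> float:
    return (w - 3.0) ** 2          # toy loss, minimized at w = 3

def grad(w: float) -> float:
    return 2.0 * (w - 3.0)         # d(loss)/dw, by basic calculus

w = 0.0                            # arbitrary starting guess
for _ in range(100):
    w -= 0.1 * grad(w)             # gradient descent: step downhill

print(round(w, 4))  # converges toward 3.0
```

A real network does exactly this, just with billions of parameters and gradients computed by backpropagation.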

antonvs|19 hours ago

One problem is that "statistical inference" is overly reductive. Sure, there's a statistical aspect to the computations in a neural network, but there's more to it than that. As there is in the human brain.

malnourish|22 hours ago

I read through this entire article. There was some value in it, but I found it to be very "draw the rest of the owl". It read like introductions to conceptual elements or even proper segues had been edited out. That said, I appreciated the interactive components.

davidw|22 hours ago

It started off nicely but before long you get

"The MLP (multilayer perceptron) is a two-layer feed-forward network: project up to 64 dimensions, apply ReLU (zero out negatives), project back to 16"

Which starts to feel pretty owly indeed.

I think the whole thing could be expanded to cover some more of it in greater depth.
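For anyone stuck on that sentence: the quoted MLP is only a few lines of code. A rough sketch with random weights standing in for learned ones (bias terms and the surrounding residual connection omitted for brevity):

```python
import random

random.seed(0)
D, H = 16, 64  # the embedding and hidden dims from the quote

W1 = [[random.gauss(0, 0.02) for _ in range(H)] for _ in range(D)]  # 16x64
W2 = [[random.gauss(0, 0.02) for _ in range(D)] for _ in range(H)]  # 64x16

def matvec(W, x):
    # x @ W: multiply a len(W)-long vector by a len(W) x len(W[0]) matrix
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]

def mlp(x):
    h = matvec(W1, x)                     # "project up to 64 dimensions"
    h = [v if v > 0 else 0.0 for v in h]  # "apply ReLU (zero out negatives)"
    return matvec(W2, h)                  # "project back to 16"

out = mlp([1.0] * D)
print(len(out))  # 16: same width out as in
```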

love2read|22 hours ago

Is it becoming a thing to misspell and add grammatical mistakes on purpose to show that an LLM didn't write the blog post? I noticed several spelling mistakes in Karpathy's blog post that this article is based on and in this article.

klysm|22 hours ago

I expect this kind of counter signaling to become more common in the coming years.

efilife|21 hours ago

You just started to notice it

refulgentis|21 hours ago

People aren't gonna be happy I spell this out, but, Karpathy's not The Dude.

He's got a big Twitter following, so people assume something's going on or that he's important, but he just isn't.

Biggest thing he did in his career was feed Elon's Full Self Driving delusion for years and years and years.

Note, then, how long he lasted at OpenAI, and how much time he spends on code golf.

If you're angry to read this, please, take a minute and let me know the last time you saw something from him that didn't involve A) code golf B) coining phrases.

thebiblelover7|17 hours ago

I know many comments mentioned that it was too introductory, or too deep. But as someone who does not have much experience with how these models work, I found this overview to be pretty great.

There were some concepts I didn't quite understand but I think this is a good starting point to learning more about the topic.

kinnth|18 hours ago

That was one of the most helpful walkthroughs I've read. Thanks for explaining it so well with all of the steps.

I wasn't a coder, but with AI I am actually writing code. The more I familiarise myself with everything, the easier it becomes to learn. I find AI fascinating. By making it so simple and clear, it helps when I think about what I need to feed it.

danhergir|19 hours ago

I went through the article, and it makes sense to me that we're getting names as an output, but why do it with names?

growingswe|18 hours ago

Names are just a random problem to demonstrate the model. It could be anything, I believe.