wojciem | 2 years ago

Is it only me, or is it actually harder to grasp how transformers “really work” after reading this article? It has a lot of high-level, vague phrases and anecdotes, and it skips the actual essence of the many smart tricks that make transformers computationally efficient.

I recommend Andrej Karpathy's videos on this topic. They are well delivered, clearly explain the main techniques, and provide a Python implementation.

kgeist|2 years ago

There's also the type of article where the first half is easily understandable by a layman, but then it suddenly drops a lot of jargon and math formulas and you get completely lost.

Lerc|2 years ago

A friend once described these kinds of descriptions by analogy with a recipe that went:

Recipe for buns: First you need flour. This is a white, fine-grained powder produced from ground wheat, which can be acquired by exchanging money (a standardised convention for storing value) at a store that contains many such products. When mixed with the raising agent and other ingredients you should remove the buns from the oven when golden brown.

jeremiahbuckley|2 years ago

For this situation, if it feels worth it, I have been using ChatGPT Q&A on the jargon to bridge the gap. I haven’t read this article through yet, so I can’t recommend it, but in many cases ChatGPT is a super useful contextual jargon clearer.

Lerc|2 years ago

Agreed. I have made my own Shakespeare babbler following Karpathy's videos. I have a decent understanding of the structure and process, but I don't really grasp how they work.

It's obvious how the error reduces, but I feel like there's something semantic going on that isn't directly expressed in the code.

Geisterde|2 years ago

I'm saving the latter half for tomorrow, but so far it's making sense. People have different learning styles, and I think this is lacking in the visual department. Parts like the vectors displayed next to a word like "cat" could have been better annotated to show visually where those numbers come from.

3abiton|2 years ago

Super data science had a nice episode on this recently.