ianand's comments

ianand | 2 months ago | on: The Q, K, V Matrices

I'm not a fan of the database lookup analogy either.

The analogy I prefer when teaching attention is celestial mechanics. Tokens are like planets in (latent) space. The attention mechanism is like a kind of "gravity" in which each token influences the others, pushing and pulling one another around in (latent) space to refine their meanings. But instead of depending on distance and mass, this gravity is proportional to semantic inter-relatedness, and instead of physical space it operates in a latent space.

https://www.youtube.com/watch?v=ZuiJjkbX0Og&t=3569s
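The gravity analogy can be sketched in a few lines. This is a toy illustration of scaled dot-product attention (my own sketch with made-up numbers, not code from the video): each token's vector gets "pulled" toward the others in proportion to their similarity.

```javascript
// Toy single-head attention: each token is pulled toward the others
// in proportion to semantic relatedness (here, the dot product) --
// a "gravity" whose strength is similarity rather than mass.
// Simplification: Q = K = V = the raw vectors (real attention uses
// learned projections).

function softmax(xs) {
  const m = Math.max(...xs);
  const exps = xs.map(x => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

function dot(a, b) {
  return a.reduce((s, ai, i) => s + ai * b[i], 0);
}

function attend(tokens) {
  const scale = Math.sqrt(tokens[0].length);
  return tokens.map(q => {
    // How strongly every other token "pulls" on this one.
    const weights = softmax(tokens.map(k => dot(q, k) / scale));
    // Move toward the weighted blend of all tokens.
    const out = new Array(q.length).fill(0);
    tokens.forEach((v, j) => {
      for (let d = 0; d < v.length; d++) out[d] += weights[j] * v[d];
    });
    return out;
  });
}

// Two similar tokens and one unrelated one: after a step of
// "gravity", the similar pair drift toward each other more than
// toward the outlier.
const refined = attend([[1, 0], [0.9, 0.1], [0, 1]]);
```

Running one step moves each vector toward the tokens it is most related to, which is the whole "refine their meaning" part of the analogy.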

ianand | 11 months ago | on: Tracing the thoughts of a large language model

> LLMs that haven't gone through RL are useless to users. They are very unreliable, and will frequently go off the rails spewing garbage, going into repetition loops, etc...RL learning involves training the models on entire responses, not token-by-token loss (1).

Yes. For those who want a visual explanation, I have a video where I walk through this process including what some of the training examples look like: https://www.youtube.com/watch?v=DE6WpzsSvgU&t=320s
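To make the quoted distinction concrete, here is a minimal sketch (all function names and numbers are invented for illustration, not from the video or any real training stack): pretraining scores each token position independently, while RL-style fine-tuning assigns one scalar judgment to the entire response.

```javascript
// Pretraining-style loss: average negative log-likelihood per token.
// probs[i] is the model's (made-up) probability for the correct
// token at position i.
function perTokenLoss(probs) {
  return probs.reduce((s, p) => s - Math.log(p), 0) / probs.length;
}

// RL-style signal: one scalar for the whole response. This toy
// "reward model" just penalizes repetition loops, one of the failure
// modes base models fall into.
function responseReward(tokens) {
  const unique = new Set(tokens).size;
  return unique / tokens.length; // 1.0 = no repetition, -> 0 as it loops
}

const looping = ["the", "cat", "the", "cat", "the", "cat"];
const fluent = ["the", "cat", "sat", "on", "a", "mat"];
// responseReward(looping) is low; responseReward(fluent) is 1.
```

The point is the shape of the signal: per-position losses versus one number for the full sequence.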

ianand | 1 year ago | on: SmolGPT: A minimal PyTorch implementation for training a small LLM from scratch

Hey, creator of spreadsheets-are-all-you-need.ai here. Thanks for the mention!

I now have a web version of GPT2 implemented in pure JavaScript for web developers at https://spreadsheets-are-all-you-need.ai/gpt2/.

The best part is that you can debug and step through it in the browser dev tools: https://youtube.com/watch?v=cXKJJEzIGy4 (100 second demo). Every single step is in plain vanilla client-side JavaScript (even the matrix multiplications). You don't need Python, etc. Heck, you don't even have to leave your browser.
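For a sense of what "even the matrix multiplications" means, a dependency-free matmul in vanilla JavaScript looks something like this (an illustrative sketch, not the actual code from spreadsheets-are-all-you-need.ai):

```javascript
// Plain-JavaScript matrix multiply: C = A * B, where A is n x k and
// B is k x m, both as arrays of row arrays. No libraries needed.
function matmul(A, B) {
  const n = A.length, k = B.length, m = B[0].length;
  const C = Array.from({ length: n }, () => new Array(m).fill(0));
  for (let i = 0; i < n; i++)
    for (let p = 0; p < k; p++)     // i-p-j loop order for cache locality
      for (let j = 0; j < m; j++)
        C[i][j] += A[i][p] * B[p][j];
  return C;
}

// matmul([[1, 2]], [[3], [4]]) → [[11]]
```

Because it's just a function over nested arrays, you can set a breakpoint inside the loop in dev tools and watch every multiply-accumulate happen.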

I recently did an updated version of my talk with it for JavaScript developers here: https://youtube.com/watch?v=siGKUyTk9M0 (52 min). That should give you a basic grounding in what's happening inside a Transformer.

ianand | 1 year ago | on: 'I grew up with it': readers on the enduring appeal of Microsoft Excel

> Even things like how cells are frozen (if you're on B2, are you freezing the first row, or the first and second row?) just feels wrong.

FWIW, one of my frustrations with Excel is how it freezes rows. I find Google's approach more intuitive. But I agree with the issues on computation (see my other comment on this post).

ianand | 1 year ago | on: 'I grew up with it': readers on the enduring appeal of Microsoft Excel

Google Sheets does struggle past a certain size.

For Spreadsheets-are-all-you-need (GPT-2 Small implemented in a spreadsheet: a ~1.25GB Excel file, 124M parameters), I have only been able to get the complete model working in Excel.

In Google Sheets, I've only been able to get it to run a single layer of the model's 12. There's no way it can handle the embedding matrix, and it randomly quit during tokenization.

That being said, I am a fan of both Excel and Google Sheets. In Google Sheets the formulas are easier to read, and for teaching my class, it's great because students and I can be in the same sheet at the same time during a lesson.

I also tried LibreOffice briefly. While it could open the Excel file, it was unable to save it (it crashed during the save process).

ianand | 1 year ago | on: AI and the Next Computing Platforms with Jensen Huang and Mark Zuckerberg

I don't entirely disagree, but I think "mad" is an unfair characterization and oversimplifies the fascinating competitive dynamics that resulted in the Llama models.

Meta as well as Amazon and MS have been strategically constrained by not owning the mobile platform and they've been pretty consistently clear about that. https://x.com/ianand/status/1753425306181116394

Like it or not, Apple's ATT (App Tracking Transparency) move, i.e. the "Ask App Not to Track" prompt, hit their revenue in real dollars and stock price significantly. Not only that, the ensuing chaos helped pave an opening for the rise of competitor TikTok:

> Meta, formerly known as Facebook, said that one setting alone cost the company an estimated $10 billion. Its stock value has plunged 70% this year. But ATT had another side effect, one that got far less attention than Meta’s troubles. Apple’s iPhone privacy setting gave TikTok a significant leg up in its fight for social media dominance. (from https://gizmodo.com/how-apple-s-ask-app-not-to-track-prompt-...)

Their savior for improving monetization in the face of ATT turned out to be AI:

> Further integration of AI helped drive Meta’s first revenue increase in three quarters, the company said on Wednesday. Reels monetization is up over 30% on Instagram and over 40% on Facebook on a quarterly basis as AI plays a larger role in the platforms. (from https://finance.yahoo.com/news/mark-zuckerberg-says-ai-boost...)

So improving Reels monetization and competing with TikTok is what led Zuck to seemingly clairvoyantly purchase all those GPUs at the right time to be able to build and release Llama:

> We got into this position with Reels where we needed more GPUs to train the models...we were constrained on the infrastructure in catching up to what TikTok was doing as quickly as we wanted to. I basically looked at that and I was like “hey, we have to make sure that we're never in this situation again. So let's order enough GPUs to do what we need to do on Reels and ranking content and feed. But let's also double that.” (from https://www.dwarkeshpatel.com/p/mark-zuckerberg)

ianand | 1 year ago | on: How Bad Are Ultraprocessed Foods, Really?

This has been my mental model as well. I often use two analogies to describe this.

The more processed a food is, the more likely you're eating a "Product", i.e. something that's been designed to be consumed and enjoyed.

I don’t say that as a necessarily bad thing. We’ve been making foods more enjoyable for the entire history of civilization (broccoli was bred by humans from a wild cabbage), but industrialization has given us the ability to create new things faster than our biology can adapt.

The other analogy is to think of UPFs, especially nominally nutritious ones like protein bars, as the nutritional equivalent of power tools. They can be really powerful for dialing in your macros or taste desires, but they must be used carefully and are potentially dangerous in untrained hands.

ianand | 1 year ago | on: Your LLM Is a Capable Regressor When Given In-Context Examples

They did invent their own functions to test whether the results were due to these functions being in the training data. See the section on data contamination in the paper.

Agreed that it both kind of makes sense (regression is the best way to predict the next token in this context) and is kind of ironic (LLMs can do high-school regression but can't do elementary-school multi-digit arithmetic).
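For intuition about how modest the "high-school" computation is, simple one-variable least squares is just two closed-form sums (a generic sketch of ordinary linear regression, not the paper's experimental setup):

```javascript
// Closed-form simple linear regression: fit y ≈ a*x + b from paired
// samples. This is the kind of computation an LLM appears to perform
// implicitly when shown (x, y) example pairs in its context window.
function fitLine(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((s, x) => s + x, 0) / n;   // mean of x
  const my = ys.reduce((s, y) => s + y, 0) / n;   // mean of y
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);  // covariance term
    den += (xs[i] - mx) ** 2;            // variance term
  }
  const a = num / den;                   // slope
  return { a, b: my - a * mx };          // intercept via the means
}

// fitLine([1, 2, 3], [3, 5, 7]) → slope a = 2, intercept b = 1
```

Two passes over the data and a division, which makes it all the more striking that the same models fumble multi-digit addition.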
