ianand's comments
ianand | 2 months ago | on: The Q, K, V Matrices
The analogy I prefer when teaching attention is celestial mechanics. Tokens are like planets in (latent) space, and the attention mechanism is a kind of "gravity" where each token influences the others, pushing and pulling them around in latent space to refine their meaning. But instead of depending on distance and mass, this gravity is proportional to semantic inter-relatedness, and instead of physical space it operates in a latent space.
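To make the "gravity" concrete, here is a minimal single-head scaled dot-product attention sketch in plain JavaScript (a hypothetical toy illustration of the mechanism, not code from the linked post): each token's query is scored against every token's key, and the softmaxed scores decide how strongly each value vector pulls the token toward a new position.

```javascript
// Softmax turns raw scores into attention weights that sum to 1.
function softmax(row) {
  const m = Math.max(...row);
  const exps = row.map(x => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// Q, K, V: arrays of token vectors (one row per token), already projected
// by the learned W_q, W_k, W_v matrices.
function attention(Q, K, V) {
  const d = Q[0].length;
  return Q.map(q => {
    // "Gravitational pull" of every token on this one: dot(q, k) / sqrt(d).
    const scores = K.map(k =>
      q.reduce((s, qi, i) => s + qi * k[i], 0) / Math.sqrt(d)
    );
    const weights = softmax(scores);
    // Weighted sum of value vectors moves the token in latent space.
    return V[0].map((_, j) =>
      weights.reduce((s, w, t) => s + w * V[t][j], 0)
    );
  });
}
```

The sqrt(d) scaling keeps the dot products from blowing up as the vector dimension grows, so the softmax stays well-behaved.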
ianand | 10 months ago | on: Show HN: GPT-2 implemented using graphics shaders
Curious why you chose WebGL over WebGPU? Just to show it can be done?
(Also see my other comment about fetching weights from huggingface)
ianand | 11 months ago | on: Tracing the thoughts of a large language model
Yes. For those who want a visual explanation, I have a video where I walk through this process including what some of the training examples look like: https://www.youtube.com/watch?v=DE6WpzsSvgU&t=320s
ianand | 1 year ago | on: SmolGPT: A minimal PyTorch implementation for training a small LLM from scratch
I now have a web version of GPT2 implemented in pure JavaScript for web developers at https://spreadsheets-are-all-you-need.ai/gpt2/.
The best part is that you can debug and step through it in the browser dev tools: https://youtube.com/watch?v=cXKJJEzIGy4 (100 second demo). Every single step is in plain vanilla client side JavaScript (even the matrix multiplications). You don't need Python, etc. Heck, you don't even have to leave your browser.
I recently did an updated version of my talk with it for JavaScript developers here: https://youtube.com/watch?v=siGKUyTk9M0 (52 min). That should give you a basic grounding on what's happening inside a Transformer.
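The plain-JavaScript matrix multiplications mentioned above might look something like this (a hypothetical sketch of the kind of step you could pause on in dev tools, not the actual code from spreadsheets-are-all-you-need.ai):

```javascript
// Naive dense matrix multiply in vanilla client-side JavaScript.
// A is n x k, B is k x m; returns C = A * B (n x m).
function matmul(A, B) {
  const n = A.length, k = B.length, m = B[0].length;
  const C = Array.from({ length: n }, () => new Array(m).fill(0));
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < m; j++) {
      for (let t = 0; t < k; t++) {
        C[i][j] += A[i][t] * B[t][j];
      }
    }
  }
  return C;
}
```

Because it's just arrays and loops, a breakpoint inside the inner loop lets you watch each individual multiply-accumulate of the model.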
ianand | 1 year ago | on: RWKV Language Model
PS: Eugene, you should brag about that on the homepage of RWKV.
ianand | 1 year ago | on: 'I grew up with it': readers on the enduring appeal of Microsoft Excel
FWIW, one of my frustrations with Excel is how it does freezing of rows. I find Google's approach more intuitive. But I agree with the issues on computation (see my other comment on this post).
ianand | 1 year ago | on: 'I grew up with it': readers on the enduring appeal of Microsoft Excel
For Spreadsheets-are-all-you-need (GPT2 Small implemented in Spreadsheet, ~1.25GB Excel file, 124M parameters), I have only been able to get the complete model working in Excel.
In Google Sheets, I've only been able to get it to implement a single layer out of the 12 in the model. And there's no way it can handle the embedding matrix. And it randomly quit during tokenization.
That being said, I am a fan of both Excel and Google Sheets. In Google Sheets the formulas are easier to read, and for teaching my class, it's great because students and I can be in the same sheet at the same time during a lesson.
I also tried LibreOffice briefly. While it could open the Excel file, it was unable to save it (it crashed during the save process).
ianand | 1 year ago | on: AI and the Next Computing Platforms with Jensen Huang and Mark Zuckerberg
Meta as well as Amazon and MS have been strategically constrained by not owning the mobile platform and they've been pretty consistently clear about that. https://x.com/ianand/status/1753425306181116394
Like it or not, Apple's ATT move (App Tracking Transparency, the "Ask App Not to Track" prompt) hit their revenue in real dollars and stock price significantly. Not only that, the ensuing chaos helped pave an opening for the rise of competitor TikTok:
> Meta, formerly known as Facebook, said that one setting alone cost the company an estimated $10 billion. Its stock value has plunged 70% this year. But ATT had another side effect, one that got far less attention than Meta’s troubles. Apple’s iPhone privacy setting gave TikTok a significant leg up in its fight for social media dominance. (from https://gizmodo.com/how-apple-s-ask-app-not-to-track-prompt-...)
Their savior to improve monetization in the face of ATT turned out to be AI:
> Further integration of AI helped drive Meta’s first revenue increase in three quarters, the company said on Wednesday. Reels monetization is up over 30% on Instagram and over 40% on Facebook on a quarterly basis as AI plays a larger role in the platforms. (from https://finance.yahoo.com/news/mark-zuckerberg-says-ai-boost...)
So improving Reels monetization and competing with TikTok is what led Zuck to seemingly clairvoyantly purchase all those GPUs at the right time to be able to build and release Llama:
> We got into this position with Reels where we needed more GPUs to train the models...we were constrained on the infrastructure in catching up to what TikTok was doing as quickly as we wanted to. I basically looked at that and I was like “hey, we have to make sure that we're never in this situation again. So let's order enough GPUs to do what we need to do on Reels and ranking content and feed. But let's also double that.” (from https://www.dwarkeshpatel.com/p/mark-zuckerberg)
ianand | 1 year ago | on: How Bad Are Ultraprocessed Foods, Really?
The more processed a food is, the more likely you're eating a "Product", i.e. something that's been designed to be consumed and enjoyed.
I don't say that as a necessarily bad thing. We've been making foods more enjoyable for the whole history of civilization (broccoli was bred by humans from a wild cabbage), but industrialization has given us the ability to create new things faster than our biology can adapt.
The other analogy is to think of UPF, especially nominally nutritious ones like protein bars, as the nutritional equivalent of power tools. They can be really powerful for dialing in your macros or taste desires. But they must be used carefully and are potentially dangerous in untrained hands.
ianand | 1 year ago | on: Your LLM Is a Capable Regressor When Given In-Context Examples
Agree it both kind of makes sense (regression is the best way to predict the next token in this context) and kind of ironic (LLMs can do high school regression but can’t do elementary school long digit arithmetic).