As someone with no technical knowledge of Llama or LLMs in general, from conceptual understanding to technical implementation, is there any benefit to sitting down and going through this from start to finish? Or is the effort better spent elsewhere?
Like a roadmap: do A, do B, and finally go through this at the end.
This was posted on HN a while ago and led to some great discussion. Many of us agreed that this type of stateful visualization was _way_ more effective at conceptualizing how an LLM works than reading code or stepping through a debugger.
My opinion: it quickly gets into "the math behind LLMs," which makes no sense to me.
Words I understand but don't really get: weights, feed forward, layers, tensors, embeddings, normalization, transformers, attention, positioning, vector.
There's "programming" in the plumbing sense, where you move data around through files and sockets, and then there's this. For somebody without a math background or education, it's very unlikely you'll understand it; you'd just be skimming Python without understanding the math and library calls it makes.
Google around and find the examples where someone does it in a spreadsheet. It's much more approachable that way. You are going to find it's not that complicated.
Only do it if you want the illusion of LLMs to be shattered. Suddenly you'll see two or three highly upvoted links on HN every day and be unable to keep your eyes from rolling.
If you like this, it's also worth looking at llama2.c [1], an implementation of the Llama 2 architecture in about 1,000 lines of plain, dependency-free C, tokenizer and all. The fact that this 960-line file and a somewhat modern C compiler are all you really need to run a state-of-the-art language model surprises many people.
Of course, this is not all there is to a modern LLM; it would probably take another thousand lines or two to implement training, and many more than that to make it fast on all the major CPU and GPU architectures. If you want a flexible framework that lets a developer define any model they want and still goes as fast as it can, the complexity spirals.
Most programmers have an intuition that duplicating a large software project from scratch, like Linux or Chromium, would require incredible amounts of expertise, manpower, and time. It's not something a small team can achieve in a few months. You're limited by talent, not hardware.
LLMs are very different. The code isn't that complicated: an individual with a programming background who still remembers their calculus and linear algebra could probably implement training and inference for a single model architecture, from scratch, on a single kind of GPU, with reasonable performance, given a year or so of self-study. What makes LLMs difficult is getting access to all the hardware to train them, getting the data, and being able to preprocess that data.
[1] https://github.com/karpathy/llama2.c
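To give a flavor of why roughly a thousand lines is enough: inference is just "run the forward pass, sample a token, append, repeat." A toy Python sketch of that outer loop, with a faked forward() standing in for the transformer pass (per layer: RMSNorm, attention over a KV cache, SwiGLU feed-forward) where llama2.c spends most of its code:

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB = 32000   # Llama 2's vocabulary size

    def forward(tokens):
        """Stand-in for the real transformer forward pass; here we fake
        the output logits so the outer loop stays self-contained."""
        return rng.normal(size=VOCAB)

    def generate(prompt_tokens, steps, temperature=0.9):
        tokens = list(prompt_tokens)
        for _ in range(steps):
            logits = forward(tokens) / temperature
            probs = np.exp(logits - logits.max())   # numerically safe softmax
            probs /= probs.sum()
            tokens.append(int(rng.choice(VOCAB, p=probs)))  # sample next token
        return tokens

    print(generate([1, 306, 4966], steps=5))   # token ids, not text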
I'm kind of shocked. I thought there would be more dynamism by now; I stopped dabbling around 2018.
There is a tick-tock between searching for the dominant NN architecture (tick) and optimizing it for accuracy, compute cost, and inference latency and throughput (tock).
This particular (tock) is still playing out. The next (tick) does not feel imminent and will likely depend on when we discover the limits of transformers when it comes to solving for the long tail of use cases. My $0.02.
The innovation is the amount of resources people are willing to spend right now. From looking at the research code, it's clear that the whole field is basically doing a (somewhat) guided search through the entire space of possible layer permutations.
There seems to be no rhyme or reason, no scientific insight, no analysis. They just try a million different permutations, and whatever scores the highest on the benchmarks gets published.
The only thing that has changed since 2018 is the most popular network structure to play with. The code looks the same as always: Python notebooks where someone manually calculated the size of each hard-coded layer to make it fit.
There's still some room for experimenting if you care about memory/power efficiency, like MoE models, but they're not as well understood yet.
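The MoE idea in a toy numpy sketch (sizes and top-k made up): a learned router picks a couple of experts per token, so each step pays the compute of a small dense layer while the total parameter count stays large:

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts, top_k = 16, 8, 2

    x = rng.normal(size=d)                        # one token's hidden vector
    router = rng.normal(size=(d, n_experts))      # learned routing matrix
    experts = rng.normal(size=(n_experts, d, d))  # one weight matrix per
                                                  # expert (a stand-in for a
                                                  # full feed-forward block)

    # Route: score every expert, keep only the top-k, renormalize.
    scores = x @ router
    top = np.argsort(scores)[-top_k:]
    gate = np.exp(scores[top]) / np.exp(scores[top]).sum()

    # Only 2 of the 8 experts actually run for this token, so you pay the
    # memory of all experts but the compute of a much smaller dense layer.
    y = sum(g * np.maximum(x @ experts[i], 0.0) for g, i in zip(gate, top))
    print(y.shape)   # (16,)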
I've occasionally worked with more dynamic models (tree-structured decoding). They are generally not a good fit for maxing GPU throughput. A lot of the magic of transformers and large language models is about pushing the GPU as hard as we can; a simpler, static model architecture that trains faster can train on much more data.
So until the hardware allows comparable (say, within 2-4x) throughput in samples per second, I expect model architectures to stay mostly static for the most effective models, and dynamic architectures to remain an interesting side area.
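A toy numpy illustration of the throughput point: the static model is one big batched matmul, while data-dependent control flow (a stand-in for tree-structured decoding) forces per-sample work that the hardware can't batch:

    import numpy as np

    rng = np.random.default_rng(0)
    batch, d = 256, 1024
    x = rng.normal(size=(batch, d))
    W = rng.normal(size=(d, d)) / np.sqrt(d)

    # Static architecture: the whole batch is one matmul that the
    # hardware can run at near-peak throughput.
    static_out = np.tanh(x @ W)

    # Dynamic architecture: the amount of work depends on the data, so
    # samples are processed one at a time in tiny, irregular chunks.
    rows = []
    for row in x:
        depth = 1 + int(abs(row[0]) * 2)   # data-dependent depth
        h = row
        for _ in range(depth):
            h = np.tanh(h @ W)
        rows.append(h)
    dynamic_out = np.stack(rows)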
The iterative leaps by which open-source models keep improving are strong evidence that companies competing at the LLM model layer have an ephemeral moat.
Serious question: assuming this is true, if an incumbent-challenger like OpenAI wants to win, how do they effectively compete against current services such as Meta's and Google's product offerings, which can be AI-enhanced in a snap?
A just question.
At least they use punctuation. We recently had a project on HN where the author used only lowercase and no punctuation, because they equated them with being chained by the system.
Seeing Anya (the girl pointing at pictures), I'd guess the author is partial to Japanese culture. As the Japanese writing system has no concept of upper/lower case, the author might just have decided that capitals are superfluous. Or they're simply an eccentric. Though I guess this is one of those things some folks won't care about and others will get mightily hung up on.
I personally don't really mind that bit of capitalization that English does. German is much worse.
2024 is the year most of us are collectively growing out of the early-social-media all-lowercase thing, but not everyone has gotten the memo yet.
Also, it looks more casual and authentic, less LLM-generated.
Such as your comment and my comment!
Aaaaaaaaaa.org is possibly the worst domain name I've ever encountered in all my time using the internet. I support your mission but you need to change that.
I'd like to see this using ONNX and streaming from storage (I have my reasons, but it's mostly about using commodity hardware for "slow" batch processing without a GPU).
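The non-streaming half of that wish is already straightforward with onnxruntime; a minimal CPU-only sketch, with a hypothetical model path and input shape:

    import numpy as np
    import onnxruntime as ort

    # CPU-only batch inference, no GPU required. The "streaming weights
    # from storage" half is the part ONNX doesn't give you for free: it
    # needs external-data tensors plus your own paging strategy.
    sess = ort.InferenceSession("model.onnx",
                                providers=["CPUExecutionProvider"])

    input_name = sess.get_inputs()[0].name
    batch = np.zeros((1, 128), dtype=np.int64)    # e.g. a row of token ids
    outputs = sess.run(None, {input_name: batch})
    print([o.shape for o in outputs])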
I know it's not really related, but I've noticed something that's making me feel out of touch. Lately there seems to be an increasing merge of tech with weeaboo culture. I may not have the term exactly right, but I'm talking about the anime girl in the OP's blog post. It's not everywhere, but I've started to notice it, so it's increasing. Did I miss something? Is this replacing memes in tech talks? (I was never fond of those either, so I guess I'm a curmudgeon, or perhaps my ADHD brain just finds it too distracting.)
The post looks informative. I hope to learn something from it later tonight. Thx.