top | item 36070090

Show HN: Visual intuitive explanations of LLM concepts (LLM University)

303 points | jayalammar | 2 years ago

Hi HN,

We've just published a lot of original, visual, and intuitive explanations of concepts to introduce people to large language models.

It's available for free with no sign-up needed and it includes text articles, some video explanations, and code examples/notebooks as well. And we're available to answer your questions in a dedicated Discord channel.

You can find it here: https://llm.university/

Having written https://jalammar.github.io/illustrated-transformer/, I've been thinking about these topics and how best to communicate them for half a decade. But this project is extra special to me because I got to collaborate on it with two people I consider some of the best ML educators out there: Luis Serrano of https://www.youtube.com/@SerranoAcademy and Meor Amer, author of "A Visual Introduction to Deep Learning" https://kdimensions.gumroad.com/l/visualdl

We're planning to roll out more content (let us know what concepts interest you). As of now, it has the following structure (with links to some highlighted articles for you to audit):

---

Module 1: What are Large Language Models

- Text Embeddings (https://docs.cohere.com/docs/text-embeddings)

- Similarity between words and sentences (https://docs.cohere.com/docs/similarity-between-words-and-sentences)

- The attention mechanism

- Transformer models (https://docs.cohere.com/docs/transformer-models HN Discussion: https://news.ycombinator.com/item?id=35576918)

- Semantic search

---

Module 2: Text representation

- Classification models (https://docs.cohere.com/docs/classification-models)

- Classification evaluation metrics (https://docs.cohere.com/docs/evaluation-metrics)

- Classification / Embedding API endpoints

- Semantic search

- Text clustering

- Topic modeling (goes over clustering Ask HN posts https://docs.cohere.com/docs/clustering-hacker-news-posts)

- Multilingual semantic search

- Multilingual sentiment analysis

---

Module 3: Text generation

- Prompt engineering (https://docs.cohere.com/docs/model-prompting)

- Use case ideation

- Chaining prompts

---

A lot of the content originates from common questions we get from users of the LLMs we serve at Cohere, so the focus is more on applying LLMs than on theory or training them.

Hope you enjoy it, open to all feedback and suggestions!

36 comments

jfarmer|2 years ago

> We've just published a lot of original, visual, and intuitive explanations of concepts to introduce people to large language models.

Kinda frustrating that the main link dumps me onto what reads like a university syllabus, and nothing original, visual, or intuitive.

If I click through the sections in order, there are 5 "preamble" sections describing logistical and other meta-information about the course. All text.

The first pedagogical image I see is this, which tbh doesn't make any sense to me: https://files.readme.io/329efd5-image.png

"Where would you put the word apple?"

The image alone doesn't work without reading the supporting text very closely. I also have to have a pretty sophisticated understanding to get the idea that I can represent words as points in a plane.

Representing the words as icons is fundamentally confusing, too, I think. After all, maybe I say the word "apple" should go in "d" because it has at least two senses: a fruit and a machine.

Oh, sorry, you failed your first quiz!

"You can't fail the quiz, you're not being graded." Then why call it a quiz? Why use classroom metaphors unless you want students to fall back on classroom behaviors?

Of course, you know the #1 student classroom behavior: not reading the syllabus.

But if I have no trouble with that level of abstraction, what's with the cutesy way of describing the problem?

Get rid of all this chocolate-covered broccoli. Just say and show what you mean.

Computers like numbers. Vectors are lists of numbers. Vectors come with concepts like length and distance. We want to transform words into vectors so that words we think of as similar are close together as vectors.
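To make that concrete, here is a minimal Python sketch of words as vectors. The 2-D coordinates are invented for illustration (one axis loosely "food-ness", the other "tech-ness"); real embeddings are learned and have hundreds of dimensions. It shows the two ideas from the paragraph above: a distance between vectors, and using it to find the most similar word.

```python
import math

# Hypothetical 2-D "embeddings": (food-ness, tech-ness). Values are made up.
embeddings = {
    "banana":     (0.9, 0.1),
    "strawberry": (0.95, 0.05),
    "laptop":     (0.1, 0.9),
    "phone":      (0.05, 0.95),
}

def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word):
    """Return the other word whose vector is closest to `word`'s vector."""
    others = [w for w in embeddings if w != word]
    return min(others, key=lambda w: euclidean_distance(embeddings[word], embeddings[w]))

print(nearest("banana"))  # strawberry
print(nearest("laptop"))  # phone
print(cosine_similarity(embeddings["banana"], embeddings["strawberry"]))
print(cosine_similarity(embeddings["banana"], embeddings["laptop"]))
```

Where "apple" should go is then a question about which coordinates to give it, which is exactly the design decision the comment wants learners to think about.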

There are many ways to translate words into vectors. Here are 5-10 examples of how we might do that. What are some pros/cons? What relationship(s) do they make clear or obscure?
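As a small illustration of comparing schemes, the sketch below puts two of them side by side: one-hot vectors and a few hand-picked features. The four-word vocabulary and the feature values are hypothetical, chosen only to make the trade-offs visible.

```python
import math

vocabulary = ["banana", "strawberry", "laptop", "phone"]

def one_hot(word):
    """Scheme 1: one dimension per vocabulary word; a single 1.0 marks the word."""
    return [1.0 if w == word else 0.0 for w in vocabulary]

# Scheme 2: hand-picked features [is_food, is_electronic, is_red] (made-up values).
features = {
    "banana":     [1.0, 0.0, 0.0],
    "strawberry": [1.0, 0.0, 1.0],
    "laptop":     [0.0, 1.0, 0.0],
    "phone":      [0.0, 1.0, 0.0],
}

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

for a, b in [("banana", "strawberry"), ("banana", "laptop"), ("laptop", "phone")]:
    print(f"{a}-{b}: one-hot {distance(one_hot(a), one_hot(b)):.3f}, "
          f"features {distance(features[a], features[b]):.3f}")
```

Every one-hot pair comes out at the same distance (√2 ≈ 1.414), so that scheme encodes identity but no similarity at all. The feature scheme puts the two fruits closer together than a fruit and a laptop, but it collapses "laptop" and "phone" onto the same point because no chosen feature separates them.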

Get them thinking about what it means to embed things and why we'd want to embed words one way vs. another. That'll pay dividends. Having them remember "where the apple icon goes" isn't going to be something they'll benefit from reflecting on in any future experience.

pumanoir|2 years ago

I read and watched almost all the modules, and for me, as it is, it perfectly accomplishes the intention of the course as stated by the OP.

Your suggestion may work for other intents (like having a Schaum's Outline of LLMs), and I would also love to have that additional material (maybe you could provide it yourself, since you seem to have a clear idea).

jayalammar|2 years ago

The landing page is technically the course overview. I'd love to hear what would've made it more engaging for you. We can probably pull some of the visuals up onto it as a preview. Let me see what we can do on that front.

SanderNL|2 years ago

That’s just you. I found the apple thing obvious at first sight, and it cements an intuition that talk of vectors doesn't, or at least cements it differently. Why choose?

toppy|2 years ago

Jay, I liked your tutorial on Transformer models. It helped me a lot when I read it in 2020; it was one of the best resources on the topic at the time. Thanks for your work! Fingers crossed for your new endeavour.

jayalammar|2 years ago

Thank you so much (and others for your kind messages). Glad you found them useful! Writing is the best way for me to learn, I find.

ZeroCool2u|2 years ago

This looks like a pretty great resource and I'm looking forward to checking it out. My only ask is that since it's the type of site I'd probably be looking at for quite a while it'd be nice if it had a dark mode.

beeburrt|2 years ago

You know what would be helpful? A little tag or something at the beginning of each section that says about how long it's going to take.

From what I've seen so far, it looks awesome. I'm excited to dive in. Thanks!

kfarr|2 years ago

This is pretty excellent material. In just 10 minutes I've learned more than from most of the random blog posts I've read in the past few months.

HarHarVeryFunny|2 years ago

I'm not sure how much is actually known to write about, but what I'd like to see explained is how transformer-based LLMs/AI really work: not at the mechanistic level of the architecture, but in terms of what they learn (some type of world model? details, not hand waving!) and how they utilize this when processing various types of input.

What type of representations are being used internally in these models? We've got token embeddings going in, and it seems like some type of semantic embeddings internally perhaps, but exactly what? OTOH, it's outputting words (tokens) with only a linear layer between the last transformer block and the softmax, so what does that say about the representations at that last transformer block?
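For reference, the output step described above can be sketched in a few lines. This is a toy illustration with an invented 4-token vocabulary and made-up weights, not any particular model's code; real models also differ in details (GPT-2, for instance, applies a final layer norm before this linear layer and ties its weights to the input embedding matrix). The point it shows: each token's logit is just a dot product between the final hidden vector and that token's row of the output matrix.

```python
import math

# Hypothetical toy sizes: a 4-token vocabulary and a 3-dimensional hidden state.
vocab = ["Paris", "Rome", "tower", "the"]

# Output ("unembedding") matrix W_U: one row per vocabulary token. Made-up values.
W_U = [
    [2.0, 0.1, 0.0],  # Paris
    [0.1, 2.0, 0.0],  # Rome
    [0.0, 0.0, 1.0],  # tower
    [0.2, 0.2, 0.2],  # the
]

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_distribution(h):
    """Linear layer (no bias in this sketch) followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in W_U]
    return softmax(logits)

h_final = [1.5, 0.2, 0.3]  # stand-in for the last transformer block's output
probs = next_token_distribution(h_final)
for token, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
    print(f"{token:>6}: {p:.3f}")
```

Because nothing nonlinear sits between the last block and the softmax, whatever that block outputs has to point, in a simple dot-product sense, toward the rows of the tokens the model wants to predict.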

jayalammar|2 years ago

This is a field I find fascinating. It's generally the research field of Machine Learning Interpretability. The BlackboxNLP workshop is one of the main places for investigating this and is a very popular academic workshop https://blackboxnlp.github.io/

One of the most interesting presentations in the last session of the workshop is this talk by David Bau titled "Direct Model Editing and Mechanistic Interpretability". David and his team locate exact information in the model and edit it. For example, they edit the location of the Eiffel Tower to be in Rome, so whenever the model generates anything involving its location (e.g., the view from the top of the tower), it actually describes Rome.

Talk: https://www.youtube.com/watch?v=I1ELSZNFeHc

Paper: https://rome.baulab.info/

Follow-up work: https://memit.baulab.info/
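The core linear-algebra idea behind this kind of edit is a rank-one update to a weight matrix, treated as a key-to-value map, so that one chosen key vector maps to a new value vector. The sketch below is a simplified illustration in the spirit of ROME, not the paper's method: it assumes an identity key covariance (which reduces the update to the minimum-norm rank-one change), and the matrix, keys, and values are made up. The real method also locates the layer to edit and derives the key and target value from the model itself.

```python
# Simplified rank-one edit in the spirit of ROME, ASSUMING identity key covariance.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_one_edit(W, k_star, v_star):
    """Return W' = W + (v* - W k*) k*^T / (k*^T k*).

    Afterwards W' k* == v*, and any key orthogonal to k* maps exactly as before.
    """
    residual = [v - wk for v, wk in zip(v_star, matvec(W, k_star))]
    scale = dot(k_star, k_star)
    return [
        [w_ij + r_i * k_j / scale for w_ij, k_j in zip(row, k_star)]
        for row, r_i in zip(W, residual)
    ]

# Hypothetical 2x3 weight: keys are 3-D, values are 2-D.
W = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
]
k_eiffel = [1.0, 0.0, 0.0]  # made-up key standing in for "Eiffel Tower"
v_rome = [0.0, 3.0]         # made-up value standing in for "located in Rome"
k_other = [0.0, 1.0, 0.0]   # a key orthogonal to k_eiffel

W_edited = rank_one_edit(W, k_eiffel, v_rome)
print(matvec(W, k_eiffel), "->", matvec(W_edited, k_eiffel))
print(matvec(W, k_other), "->", matvec(W_edited, k_other))
```

The edited key now produces the new value, while keys orthogonal to it are untouched; in a real model, keys are not orthogonal, which is one reason the actual method weights the update by statistics of other keys.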

There is also work on "Probing" the representation vectors inside the model and investigating what information is encoded at the various layers. One early Transformer Explainability paper (BERT Rediscovers the Classical NLP Pipeline https://arxiv.org/abs/1905.05950) found that "the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way: POS tagging, parsing, NER, semantic roles, then coreference". This means that the representations in the earlier layers encode things like whether a token is a verb or noun, and later layers encode other, higher-level information. I've made an intro to these probing methods here: https://www.youtube.com/watch?v=HJn-OTNLnoE
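The general recipe behind probing is to freeze the model, collect hidden vectors from one layer, and train a small classifier to predict a property (e.g. noun vs. verb) from them; high held-out accuracy suggests the property is decodable from that layer. The sketch below uses synthetic vectors in place of real model activations and a logistic-regression probe, one common choice; it is not the setup of the paper cited above.

```python
import math
import random

random.seed(0)
DIM = 8

# Synthetic stand-ins for hidden states (hypothetical): the label is planted
# along a random direction, plus Gaussian noise. A real probe would use vectors
# extracted from a chosen layer of a frozen model.
direction = [random.gauss(0, 1) for _ in range(DIM)]

def fake_hidden_state(is_noun):
    sign = 1.0 if is_noun else -1.0
    return [sign * d + random.gauss(0, 0.5) for d in direction]

data = [(fake_hidden_state(label), label) for label in [True, False] * 100]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_probe(examples, lr=0.1, epochs=50):
    """Fit a linear (logistic-regression) probe with stochastic gradient descent."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - (1.0 if y else 0.0)  # gradient of log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def accuracy(w, b, examples):
    """Fraction of examples where the probe's sign matches the label."""
    correct = sum((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == y for x, y in examples)
    return correct / len(examples)

train, held_out = data[:150], data[150:]
w, b = train_probe(train)
print(f"probe accuracy on held-out vectors: {accuracy(w, b, held_out):.2f}")
```

A linear probe is deliberately weak: if even a linear classifier can read the property off the vectors, that is taken as evidence the layer encodes it fairly directly rather than the probe learning the task itself.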

A lot of applied work doesn't require interpretability and explainability at the moment, but I suspect the interest will continue to increase.

famouswaffles|2 years ago

No one can tell you much about that. Interpretability is still very poor.

You don't know what they learn beforehand (else deep learning wouldn't be necessary) so you have to try and figure it out afterwards.

But artificial parameters aren't beholden to any sort of "explainability rule". There's no guarantee anything is wired in a way humans can comprehend. And even if it were, you're potentially looking at hundreds of billions of parameters.

uoaei|2 years ago

> not at the mechanistic level of the architecture, but in terms of what they learn (some type of world model ? details, not hand waving!)

https://imgs.xkcd.com/comics/tasks.png

coolandsmartrr|2 years ago

Hi Jay,

I really loved your [explainer on AI Art](https://www.youtube.com/watch?v=MXmacOUJUaw), and I've already added more of your videos and articles to my watch-later and read-later lists! Can't wait to spend more time with them this weekend.

Thank you for creating such wonderful resources!

jwilber|2 years ago

Love these.

I’ve also made some visual explanations of ML for Amazon, available at https://mlu-explain.github.io/

Big fan of your early work, Jay, a big inspiration for me!

jayalammar|2 years ago

That's beautiful! Hope you're getting to do more of these!

axpy906|2 years ago

You, sir, get an upvote simply for being Jay on HN. Thank you for all you do.

abrinz|2 years ago

Nice work!

Minor nitpick: The intercom button obscures the topic expansion button for the final appendix in the nav menu. Maybe move intercom to the bottom right instead?

stclaus|2 years ago

Looks great, thanks! It would be useful to add chapter indicators/links to jump directly to a specific news item in the audio.

sva_|2 years ago

Interesting, just yesterday I was googling something about transformers and had arrived on your page.

senttoschool|2 years ago

Looks great. Thank you.

40fishes|2 years ago

Looks really helpful. Joined the community as well.