top | item 40037545


rayval | 1 year ago

Here's a compelling visualization of the functioning of an LLM when processing a simple request: https://bbycroft.net/llm

This complements the detailed description provided by 3blue1brown


bugthe0ry|1 year ago

When visualised this way, the scale of GPT-3 is insane. I can't imagine what GPT-4 would look like here.

spi|1 year ago

IIRC, GPT-4 would actually be a bit _smaller_ to visualize than GPT-3. Details are not public, but according to the leaks, GPT-4 (at least some by-now old version of it) was a mixture of experts, with each expert having around 110B parameters [1]. So while the total number of parameters is bigger than GPT-3's (1800B vs. 175B), it is "just" 16 copies of a smaller (110B-parameter) model. So if you wanted to visualize it in any meaningful way, the plot wouldn't grow bigger - or it would, if you included all the different experts, but they are just copies of the same architecture with different parameters, which is not all that useful for visualization purposes.
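The back-of-the-envelope arithmetic works out roughly like this (a sketch using the leaked, unconfirmed figures from [1] - the expert count and per-expert size are assumptions, not official numbers):

```python
# Rough parameter arithmetic from the leaked GPT-4 figures (unconfirmed).
NUM_EXPERTS = 16              # leaked expert count (assumption)
PARAMS_PER_EXPERT = 110e9     # ~110B parameters per expert (assumption)
GPT3_PARAMS = 175e9           # GPT-3's published dense parameter count

# Total parameters across all experts: what headlines report.
total_params = NUM_EXPERTS * PARAMS_PER_EXPERT  # ~1.76T, i.e. ~1800B

# But each expert - the thing you'd actually draw - is smaller than GPT-3.
print(f"Total:      {total_params / 1e9:.0f}B")
print(f"One expert: {PARAMS_PER_EXPERT / 1e9:.0f}B vs GPT-3 {GPT3_PARAMS / 1e9:.0f}B")
print(f"Expert / GPT-3 size ratio: {PARAMS_PER_EXPERT / GPT3_PARAMS:.2f}")
```

So the headline 10x parameter count comes almost entirely from replication, while the architecture you would visualize is ~0.63x the size of GPT-3.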

[1] https://medium.com/@daniellefranca96/gpt4-all-details-leaked...