taliesinb | 2 years ago
I've recently finished an unorthodox kind of visualization / explanation of transformers. It's sadly not interactive, but it does have some arguably unique strengths.
First, it gives array axes semantic names, represented in the diagrams as colors (which this post also uses). The sequence axis is red, the key feature axis is green, the multihead axis is orange, etc. This lets you show quite complicated array circuits and get an immediate feeling for what is going on and how the different arrays are being combined with each other. Here's a pic of the full multihead self-attention step, for example:
https://math.tali.link/raster/052n01bav6yvz_1smxhkus2qrik_07...
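To make the named-axis idea concrete in code (this is my own minimal sketch, not the author's formalism — the einsum subscripts just stand in for the colored axes):

```python
import numpy as np

# Axis legend (matching the post's colors, roughly):
#   s, t = sequence axis ("red"), k = key feature axis ("green"),
#   h = multihead axis ("orange"), d = model feature axis, v = value feature axis.

def multihead_self_attention(x, Wq, Wk, Wv, Wo):
    # x: (s, d); Wq, Wk: (h, d, k); Wv: (h, d, v); Wo: (h, v, d)
    q = np.einsum('sd,hdk->hsk', x, Wq)        # queries, per head
    k = np.einsum('sd,hdk->hsk', x, Wk)        # keys, per head
    v = np.einsum('sd,hdv->hsv', x, Wv)        # values, per head
    scores = np.einsum('hsk,htk->hst', q, k) / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)              # softmax over key-side sequence axis t
    mixed = np.einsum('hst,htv->hsv', w, v)    # attention-weighted values
    return np.einsum('hsv,hvd->sd', mixed, Wo) # merge heads back to (s, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                    # 5 tokens, model dim 8
Wq = rng.normal(size=(2, 8, 4)); Wk = rng.normal(size=(2, 8, 4))
Wv = rng.normal(size=(2, 8, 4)); Wo = rng.normal(size=(2, 4, 8))
out = multihead_self_attention(x, Wq, Wk, Wv, Wo)
print(out.shape)  # (5, 8)
```

Even without the diagrams, keeping every axis named in the subscripts makes it obvious which axes are contracted and which survive at each step — the diagrams do the same thing visually with color.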
It also uses a kind of generalized tensor-network diagrammatic notation -- if anyone remembers Penrose's tensor notation, it's like that, but enriched with colors and some other ideas. Underneath, these diagrams are string diagrams in a particular category, though you don't need to know that (and I don't even explain it!).
Here's the main blog post introducing the formalism: https://math.tali.link/rainbow-array-algebra
Here's the section on perceptrons: https://math.tali.link/rainbow-array-algebra/#neural-network...
Here's the section on transformers: https://math.tali.link/rainbow-array-algebra/#transformers
jimmySixDOF | 2 years ago
https://pytorch.org/blog/inside-the-matrix/