mochidusk | 2 years ago
- a refresher on differential equations
- legendre polynomials
- state-space models; you need to grok the essence of
x' = Ax + Bu
y = Cx
- discretization (how the continuous state-space model is turned into the discrete recurrence S4 uses)
- HiPPO matrix
- GPU architecture (SRAM, HBM)
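To make the discretization bullet concrete, here is a minimal numpy sketch of the bilinear (Tustin) rule, which is the discretization used in S4. The matrices below are made-up toy values, not the actual HiPPO/S4 parameterization:

```python
import numpy as np

def discretize(A, B, dt):
    """Bilinear (Tustin) rule:
    continuous x' = A x + B u  ->  discrete x[k] = Ab @ x[k-1] + Bb @ u[k]."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - dt / 2 * A)
    Ab = inv @ (I + dt / 2 * A)
    Bb = inv @ (dt * B)
    return Ab, Bb

# toy 2-state system (made-up numbers, just to see the shapes)
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
Ab, Bb = discretize(A, B, dt=0.1)
```

For a stable continuous system (eigenvalues of A in the left half-plane), the bilinear rule maps the eigenvalues of Ab inside the unit circle, so the discrete recurrence is stable too.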
Basically, transformers are an architecture that uses attention. Mamba keeps the same overall architecture but replaces attention with S4, modified to overcome its shortcomings, so that it can act like a CNN during training and an RNN during inference.
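The CNN-during-training / RNN-during-inference claim can be checked numerically: unrolling the recurrence gives a convolution with the kernel K[k] = C Ab^k Bb, and both views produce the same outputs. Below is a toy numpy sketch with random matrices (illustrative values, not the real S4/HiPPO parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 4, 32
# toy discrete SSM parameters (roughly stable, made up for illustration)
Ab = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
Bb = rng.standard_normal(n)
C = rng.standard_normal(n)
u = rng.standard_normal(L)

def ssm_recurrent(Ab, Bb, C, u):
    """RNN mode: step through time with O(1) state (fast inference)."""
    x = np.zeros(len(Bb))
    ys = []
    for uk in u:
        x = Ab @ x + Bb * uk
        ys.append(C @ x)
    return np.array(ys)

def ssm_convolutional(Ab, Bb, C, u):
    """CNN mode: unroll the recurrence into one long kernel K,
    then apply it as a single (parallelizable) convolution (fast training)."""
    K = np.array([C @ np.linalg.matrix_power(Ab, k) @ Bb for k in range(len(u))])
    return np.convolve(u, K)[: len(u)]

y_rnn = ssm_recurrent(Ab, Bb, C, u)
y_cnn = ssm_convolutional(Ab, Bb, C, u)
```

Both `y_rnn` and `y_cnn` come out equal (up to floating-point error), which is exactly why S4 can train like a CNN and run inference like an RNN.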
I found this video very helpful: https://www.youtube.com/watch?v=8Q_tqwpTpVU
His other videos are really good too.
fabmilo | 2 years ago