top | item 38935601

mochidusk | 2 years ago

I struggled learning about Mamba's architecture but realized it's because I had some gaps in knowledge. In no particular order, they were:

- a refresher on differential equations

- legendre polynomials

- state space models; you need to grok the essence of

x' = Ax + Bu

y = Cx

- discretization of S4

- HiPPO matrix

- GPU architecture (SRAM, HBM)
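To make the discretization item above concrete, here is a toy sketch (my own illustration, not the actual S4 code) of zero-order hold (ZOH) discretization of the continuous model x' = Ax + Bu. I assume a diagonal A, as in the diagonal variants of S4, so the matrix exponential is just an elementwise exp:

```python
import numpy as np

# Continuous model:  x'(t) = A x(t) + B u(t),  y(t) = C x(t)
# Zero-order hold (ZOH) gives the discrete parameters:
#   A_bar = exp(dt * A)
#   B_bar = A^{-1} (A_bar - I) B

def discretize_zoh(a_diag, B, dt):
    """ZOH-discretize a continuous SSM with diagonal A and step size dt."""
    A_bar = np.exp(dt * a_diag)                    # exp of diagonal A, elementwise
    B_bar = ((A_bar - 1.0) / a_diag)[:, None] * B  # A^{-1} (A_bar - I) B, elementwise
    return A_bar, B_bar

# Example: a stable 2-state system (negative real eigenvalues)
A_bar, B_bar = discretize_zoh(np.array([-1.0, -2.0]), np.ones((2, 1)), dt=0.1)
print(A_bar)  # entries lie in (0, 1), so the discrete recurrence is stable
```

The step size dt becomes a learnable parameter in S4/Mamba; here it is just a constant for illustration.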

Basically, a transformer is an architecture built around attention. Mamba keeps the same overall architecture but replaces attention with S4 - a modified S4 that overcomes the original's shortcomings, allowing it to act like a CNN during training and an RNN during inference.
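The CNN/RNN duality can be seen in a few lines. This is a toy illustration (not Mamba's actual kernels): unrolling the recurrence x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k shows that the output is a causal convolution of u with the kernel K_j = C A_bar^j B_bar, so the whole sequence can be computed in parallel during training:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 16                      # state size, sequence length
A_bar = rng.uniform(0.1, 0.9, N)  # diagonal A_bar, stable (entries < 1)
B_bar = rng.standard_normal(N)
C = rng.standard_normal(N)
u = rng.standard_normal(L)

# RNN mode: sequential recurrence (how inference runs)
x = np.zeros(N)
y_rnn = np.empty(L)
for k in range(L):
    x = A_bar * x + B_bar * u[k]  # diagonal A_bar -> elementwise update
    y_rnn[k] = C @ x

# CNN mode: precompute the kernel K and convolve (how training runs)
K = np.array([C @ (A_bar**j * B_bar) for j in range(L)])
y_cnn = np.array([sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(L)])

assert np.allclose(y_rnn, y_cnn)  # both modes give identical outputs
```

Mamba makes A_bar and B_bar input-dependent ("selective"), which breaks the fixed-kernel convolution trick; that is where the hardware-aware parallel scan (and the SRAM/HBM details above) comes in.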

I found this video very helpful: https://www.youtube.com/watch?v=8Q_tqwpTpVU

His other videos are really good too.

fabmilo | 2 years ago

I was about to post that video too. Highly recommended.