The Annotated S4

111 points | profchemai | 2 years ago | srush.github.io

12 comments

srush|2 years ago

Hi! Blog author. This was an attempt a couple years ago to understand and write about this paper in a detailed way. Here is a video going through this topic as well: https://youtu.be/dKJEpOtVgXc?si=PDNO0B0qi6ARHaeb

Section 2 of the blog post is no longer very relevant. A lot of advances (DSS, S4D) simplified that part of the process. Arguably, all of this should also be updated for Mamba (same authors).

jwuphysics|2 years ago

Thanks for your spectacular resources! I see that you began an Annotated Mamba repository -- any chance you could share when that blog page might go live?

radarsat1|2 years ago

This was an excellent write-up, thanks. It'll help me understand the Mamba work a lot more.

I still find it really confusing how a linear model can perform so well.

imjonse|2 years ago

A lot of intimidating math that will make all self-attention tutorials seem like a walk in the park in comparison. Luckily subsequent state space models building on S4 (DSS, S4D and newer ones like Mamba) simplified the primitives and the math used.

marmaduke|2 years ago

The math is not designed to intimidate but rather to approach the question of "how to build a sequence model" in a principled way from state space models, which draw on an arguably longer literature than neural networks.

Some of the concepts are better explained here than anywhere else, and they make it straightforward to make sense of Mamba, which is increasingly popular.
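
For anyone new to the state space view, here is a minimal sketch of the central idea: a discrete linear state space model can be run step by step like an RNN, or equivalently as one causal convolution with a precomputed kernel. This is not code from the blog post. It assumes a real-valued diagonal state matrix (in the spirit of the DSS/S4D simplifications mentioned in this thread), skips discretization and initialization entirely, and uses made-up parameter values; the function names are hypothetical.

```python
# Discrete-time diagonal linear state space model (illustrative values only).
# Recurrence: x_k = a * x_{k-1} + b * u_k (elementwise), y_k = sum(c * x_k)

def run_recurrence(a, b, c, u):
    """Step through the sequence one input at a time (RNN-style)."""
    x = [0.0] * len(a)  # state starts at zero
    ys = []
    for u_k in u:
        x = [a_n * x_n + b_n * u_k for a_n, b_n, x_n in zip(a, b, x)]
        ys.append(sum(c_n * x_n for c_n, x_n in zip(c, x)))
    return ys

def ssm_kernel(a, b, c, length):
    """Kernel K_j = sum_n c_n * a_n**j * b_n for j = 0 .. length-1."""
    return [
        sum(c_n * a_n**j * b_n for a_n, b_n, c_n in zip(a, b, c))
        for j in range(length)
    ]

def run_convolution(kernel, u):
    """Causal convolution: y_k = sum_{j=0..k} K_j * u_{k-j}."""
    return [
        sum(kernel[j] * u[k - j] for j in range(k + 1))
        for k in range(len(u))
    ]

# Assumed toy parameters: three state dimensions, five input samples.
a = [0.9, 0.5, -0.3]
b = [1.0, 0.5, 2.0]
c = [0.2, -1.0, 0.7]
u = [1.0, 0.0, -2.0, 3.0, 0.5]

y_rec = run_recurrence(a, b, c, u)
y_conv = run_convolution(ssm_kernel(a, b, c, len(u)), u)
```

Both `y_rec` and `y_conv` should agree up to floating-point error, since unrolling the recurrence from a zero initial state gives exactly the convolution sum. The naive convolution here is quadratic in sequence length; it is only meant to show the equivalence, not how to compute it efficiently.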

ssivark|2 years ago

Well, but this stuff is also much more principled and much better understood (by construction) than why/how a transformer works. The price of actual understanding, and being able to make precise statements, is that the statements will be precise and detailed (i.e., likely involve math).

ptojr|2 years ago

Can someone point me to DSS and S4D papers?

medv|2 years ago

What do I need to learn to start understanding those articles? Are there any good courses on the topic for beginners?

adamnemecek|2 years ago

All machine learning is convolution.