denial | 2 years ago

Something minor I always wonder about when I read about Mamba is the discretization.

All of the sources I see cited as derivations of it use a discretization whose first line has the form

h_t = A h_{t-1} + B x_{t-1}, instead of the form given in Mamba, h_t = A h_{t-1} + B x_t.

Does anyone know why this is?

pama|2 years ago

Not sure how much detail you need, but generally there are implicit and explicit integrators for numerically solving (integrating) ODEs. The implicit ones, like the one used here, tend to be more stable. The ideas behind SSMs come from control theory, which used integrators with stability guarantees, so that the rest of the neural network can focus on other aspects of the problem.
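To make the explicit/implicit distinction concrete, here is a minimal sketch on a scalar ODE h'(t) = a h(t) + b x(t). It uses plain forward and backward Euler, which is an assumption made for illustration and not the zero-order-hold rule described in the Mamba paper. The function names and the values of a, b and dt are made up. The point is only that the explicit step pairs h_{t-1} with x_{t-1}, the implicit step pairs it with x_t, and the implicit one stays bounded at a step size where the explicit one blows up.

```python
# Scalar test problem: h'(t) = a*h(t) + b*x(t).
# Everything here (names, values) is illustrative, not taken from Mamba.

def forward_euler_step(h_prev, x_prev, a, b, dt):
    """Explicit Euler: slope taken at the old time, so x_{t-1} appears."""
    return (1.0 + dt * a) * h_prev + dt * b * x_prev

def backward_euler_step(h_prev, x_next, a, b, dt):
    """Implicit Euler: slope taken at the new time; solving
    h_t = h_{t-1} + dt*(a*h_t + b*x_t) for h_t brings in x_t."""
    return (h_prev + dt * b * x_next) / (1.0 - dt * a)

def run(step, inputs, a, b, dt, h0):
    """Apply one step function repeatedly over a sequence of inputs."""
    h = h0
    for x in inputs:
        h = step(h, x, a, b, dt)
    return h

# A stable system (a < 0) with a deliberately large step size.
a, b, dt = -10.0, 1.0, 0.5
zeros = [0.0] * 20  # zero input, so only the decay of h0 is observed

h_explicit = run(forward_euler_step, zeros, a, b, dt, h0=1.0)
h_implicit = run(backward_euler_step, zeros, a, b, dt, h0=1.0)

# Per-step factors: explicit is 1 + dt*a = -4, implicit is 1/(1 - dt*a) = 1/6.
print("explicit:", h_explicit)
print("implicit:", h_implicit)
```

With these values the explicit recurrence multiplies the state by -4 every step and diverges, even though the true solution decays, while the implicit recurrence multiplies it by 1/6 and decays as expected.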

denial|2 years ago

That's a helpful pointer. Thank you.