swimwiththebeat | 2 years ago

Does anyone know if this is using the Mamba architecture[1] instead of transformers? It looks like it uses a state space model (SSM) layer.

[1]: https://arxiv.org/abs/2312.00752

sal9000 | 2 years ago

It predates Mamba. It uses Hyena hierarchy blocks, which are usually grouped with SSMs but are not the same as Mamba.
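
To make the distinction a bit more concrete, here is a rough sketch of the kind of gated long-convolution recurrence the Hyena paper describes: the sequence is repeatedly convolved with a long causal filter and multiplied elementwise by a gate. This is my own simplification, not code from the model being discussed. It handles a single channel, takes the gates and filters as plain arrays (in Hyena the gates are projections of the input and the filters are produced by a small network), and the function names are made up for illustration. Mamba, by contrast, uses an input-dependent (selective) state space recurrence rather than fixed long convolutions.

```python
import numpy as np


def causal_fft_conv(h, z):
    """Causal linear convolution of filter h with signal z via FFT.

    Both inputs have length L. Zero-padding to 2L keeps the circular
    convolution computed by the FFT from wrapping around.
    """
    L = len(z)
    n = 2 * L
    y = np.fft.irfft(np.fft.rfft(h, n=n) * np.fft.rfft(z, n=n), n=n)
    return y[:L]


def hyena_like_operator(v, gates, filters):
    """Hypothetical single-channel sketch of a Hyena-style recurrence.

    v       : value sequence, shape (L,)
    gates   : list of gate sequences, each shape (L,)
    filters : list of long filters, each shape (L,)

    Each step convolves the running sequence with a filter, then
    applies an elementwise gate.
    """
    z = v
    for x, h in zip(gates, filters):
        z = x * causal_fft_conv(h, z)
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = 16
    v = rng.standard_normal(L)
    gates = [rng.standard_normal(L) for _ in range(2)]
    # Assumed filter shape: random values under an exponential decay window.
    decay = np.exp(-0.3 * np.arange(L))
    filters = [rng.standard_normal(L) * decay for _ in range(2)]
    print(hyena_like_operator(v, gates, filters).shape)
```

The FFT is what makes the long convolution cheap: O(L log L) per filter instead of O(L^2) for a direct convolution over the full sequence length.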