blainm | 2 years ago
I would be curious to know if anyone has tried a hybrid approach, where a Mamba-like architecture handles longer-term recall and is combined with a transformer for short-term memory.

logicchains | 2 years ago
Yep, https://arxiv.org/abs/2402.04248 tried a Mambaformer, which seemed to perform well.

enonimal | 2 years ago
Maybe a fun Karpathy video here...
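A minimal, framework-free sketch of the hybrid blainm describes. It assumes a fixed-decay diagonal linear recurrence as a stand-in for Mamba's selective scan (real Mamba makes the decay input-dependent) and unparameterized causal sliding-window attention (queries, keys, and values are all the input, with no learned projections) for short-term context. The names `linear_recurrence`, `sliding_window_attention`, `hybrid_block`, `decay`, and `window` are hypothetical. This is not the Mambaformer from the linked paper. It only illustrates stacking a recurrent long-range path under a local attention path.

```python
import math
from typing import List

Vec = List[float]


def linear_recurrence(xs: List[Vec], decay: float = 0.9) -> List[Vec]:
    """Long-range path: h_t = decay * h_{t-1} + (1 - decay) * x_t.

    Hypothetical simplification of a Mamba-like layer: the decay is a fixed
    scalar, not computed from the input as in Mamba's selective scan.
    """
    h = [0.0] * len(xs[0])
    out = []
    for x in xs:
        h = [decay * hi + (1.0 - decay) * xi for hi, xi in zip(h, x)]
        out.append(list(h))
    return out


def sliding_window_attention(xs: List[Vec], window: int = 4) -> List[Vec]:
    """Short-range path: causal softmax attention over the last `window` tokens.

    Queries, keys, and values are the raw inputs (no learned projections).
    """
    d = len(xs[0])
    out = []
    for t, q in enumerate(xs):
        ctx = xs[max(0, t - window + 1): t + 1]  # only past and current tokens
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ctx]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * k[i] for w, k in zip(weights, ctx)) / z for i in range(d)])
    return out


def hybrid_block(xs: List[Vec], decay: float = 0.9, window: int = 4) -> List[Vec]:
    """Recurrent layer then local attention layer, each with a residual connection."""
    h = [[a + b for a, b in zip(x, r)] for x, r in zip(xs, linear_recurrence(xs, decay))]
    attn = sliding_window_attention(h, window)
    return [[a + b for a, b in zip(x, y)] for x, y in zip(h, attn)]


if __name__ == "__main__":
    seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 0.0]]
    for row in hybrid_block(seq, decay=0.9, window=2):
        print([round(v, 3) for v in row])
```

The design choice here is that the recurrent state can, in principle, carry information from arbitrarily far back at constant cost per token, while attention is restricted to a small window, so its cost stays bounded as the sequence grows. Both paths are causal, so a token's output never depends on later tokens.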