Isn't this exactly the point of this model? No need to memorize everything (which makes transfomers expensive), just keep the relevant info. SSM are essentially recurrent models.
You can't always know what will be "relevant info" in the future. Even humans can't do this but whenever that's an issue, we just go back and re-read, re-watch etc.
None of these modern recurrent architecture have a way to do this.
How often do you go back an rewatch earlier parts of a movie? I hardly ever do this. In the cinema, theater, or when listening to the radio it’s simply impossible and it still works.
famouswaffles|10 months ago
None of these modern recurrent architecture have a way to do this.
tmalsburg2|10 months ago