top | item 43843139

(no title)

tmalsburg2 | 10 months ago

Isn't this exactly the point of this model? No need to memorize everything (which makes transfomers expensive), just keep the relevant info. SSM are essentially recurrent models.

discuss

famouswaffles|10 months ago

You can't always know what will be "relevant info" in the future. Even humans can't do this but whenever that's an issue, we just go back and re-read, re-watch etc.

None of these modern recurrent architecture have a way to do this.

tmalsburg2|10 months ago

How often do you go back an rewatch earlier parts of a movie? I hardly ever do this. In the cinema, theater, or when listening to the radio it’s simply impossible and it still works.