sjkoelle | 2 years ago
1) transformers build an attention matrix that is sequence-length x sequence-length, so its cost grows quadratically with the input. state space models instead compress the history into a fixed-size recurrent state that gets updated at each step.
2) quoting the Mamba paper: "The main difference is simply making several parameters [in the state space model] functions of the input"
3) i suspect they might also be more sample efficient (i.e. need less training data), though that part is speculative on my part.
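to make points 1 and 2 concrete, here is a toy numpy sketch (my own illustration, not Mamba's actual implementation): a diagonal state space model whose step size delta is a function of the input, which is the "selective" idea the quote refers to. the parameterization (`W_delta`, the softplus) is my own illustrative choice. note the running state `h` has fixed size no matter how long the sequence is, unlike an L x L attention matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, L = 4, 8, 100                    # channels, state size, sequence length

A = -np.exp(rng.normal(size=(d, n)))   # stable (negative) diagonal dynamics
B = rng.normal(size=(d, n))
C = rng.normal(size=(d, n))
W_delta = rng.normal(size=(d,)) * 0.1  # input -> step size (illustrative)

x = rng.normal(size=(L, d))            # input sequence
h = np.zeros((d, n))                   # fixed-size state, independent of L
ys = []
for t in range(L):
    # "selective": the discretization step depends on the current input
    delta = np.log1p(np.exp(W_delta * x[t]))   # softplus keeps it positive
    A_bar = np.exp(delta[:, None] * A)         # per-step discretization
    B_bar = delta[:, None] * B
    h = A_bar * h + B_bar * x[t][:, None]      # O(d*n) recurrent update
    ys.append((C * h).sum(axis=1))             # readout per channel
y = np.stack(ys)

print(y.shape)   # (100, 4): one output per timestep, state stayed (4, 8)
```

the memory for the recurrence is O(d*n) per step regardless of L, which is the compression point 1 is gesturing at; full attention would need an L x L matrix.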