sjkoelle | 2 years ago
1) transformers build an attention matrix that is sequence-length x sequence-length, so its cost grows quadratically with the input. state space models instead compress the history into a fixed-size recurrent state that gets updated at each step.
2) quoting the Mamba paper: "The main difference is simply making several parameters [in the state space model] functions of the input"
3) i suspect they might also be more sample efficient (i.e. need less training data), though that part is speculative on my part.
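to make points 1 and 2 concrete, here is a toy numpy sketch (my own illustration, not Mamba's actual implementation): a diagonal state space model whose step size delta is a function of the input, which is the "selective" idea the quote refers to. the parameterization (`W_delta`, the softplus) is my own illustrative choice. note the running state `h` has fixed size no matter how long the sequence is, unlike an L x L attention matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, L = 4, 8, 100                    # channels, state size, sequence length

A = -np.exp(rng.normal(size=(d, n)))   # stable (negative) diagonal dynamics
B = rng.normal(size=(d, n))
C = rng.normal(size=(d, n))
W_delta = rng.normal(size=(d,)) * 0.1  # input -> step size (illustrative)

x = rng.normal(size=(L, d))            # input sequence
h = np.zeros((d, n))                   # fixed-size state, independent of L
ys = []
for t in range(L):
    # "selective": the discretization step depends on the current input
    delta = np.log1p(np.exp(W_delta * x[t]))   # softplus keeps it positive
    A_bar = np.exp(delta[:, None] * A)         # per-step discretization
    B_bar = delta[:, None] * B
    h = A_bar * h + B_bar * x[t][:, None]      # O(d*n) recurrent update
    ys.append((C * h).sum(axis=1))             # readout per channel
y = np.stack(ys)

print(y.shape)   # (100, 4): one output per timestep, state stayed (4, 8)
```

the memory for the recurrence is O(d*n) per step regardless of L, which is the compression point 1 is gesturing at; full attention would need an L x L matrix.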