cgel | 1 year ago
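Yes, that is mostly the idea. But calling the state of a linear transformer a KV cache is not quite right. A KV cache grows with the sequence length, whereas the linear transformer state just stores V @ K.T (the sum of the outer products of each value with its key over all tokens seen so far), an object whose size is fixed no matter how long the sequence gets.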
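To make the size difference concrete, here is a rough sketch in plain Python. It assumes the simplest possible variant: unnormalized linear attention with an identity feature map, random keys/values/queries, and made-up dimensions `d_k`, `d_v`, `T`. The helper names (`outer`, `add_inplace`, `matvec`) are just illustrative, not from any library.

```python
import random

def outer(v, k):
    """Outer product v k^T as a nested list with shape (len(v), len(k))."""
    return [[vi * kj for kj in k] for vi in v]

def add_inplace(S, M):
    """S += M, elementwise, for nested lists of the same shape."""
    for i, row in enumerate(M):
        for j, x in enumerate(row):
            S[i][j] += x

def matvec(S, q):
    """Matrix-vector product S q."""
    return [sum(s * x for s, x in zip(row, q)) for row in S]

d_k, d_v, T = 4, 3, 10           # hypothetical sizes, chosen for illustration
rng = random.Random(0)

state = [[0.0] * d_k for _ in range(d_v)]  # V @ K.T: always d_v x d_k
kv_cache = []                              # one (k, v) pair appended per token

max_diff = 0.0
for t in range(T):
    k = [rng.gauss(0, 1) for _ in range(d_k)]
    v = [rng.gauss(0, 1) for _ in range(d_v)]
    q = [rng.gauss(0, 1) for _ in range(d_k)]

    add_inplace(state, outer(v, k))  # fixed-size update: state += v k^T
    kv_cache.append((k, v))          # cache grows with every token

    # Output read from the fixed-size state: (V @ K.T) q
    out_state = matvec(state, q)

    # Same output computed from the growing cache: sum_i v_i * (k_i . q)
    out_cache = [0.0] * d_v
    for k_i, v_i in kv_cache:
        w = sum(a * b for a, b in zip(k_i, q))
        for n in range(d_v):
            out_cache[n] += w * v_i[n]

    max_diff = max(max_diff, max(abs(a - b) for a, b in zip(out_state, out_cache)))

print("cache entries:", len(kv_cache))              # equals T
print("state shape:", (len(state), len(state[0])))  # (d_v, d_k), independent of T
```

Both paths give the same output up to floating-point rounding, but only the cache's memory scales with `T`. With softmax attention there is no such fixed-size summary, which is why a real KV cache has to keep every key and value around.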