top | item 41280381


cgel | 1 year ago

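Yes, that is mostly the idea. But calling the state of a linear transformer a KV cache is not quite right. A KV cache grows with the sequence length, since it keeps every key and value seen so far. The linear transformer state just stores V @ K.T, the running sum of the outer products of each value with its key, which is an object of fixed size no matter how long the sequence gets.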
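
A rough sketch of the difference, in plain Python. The names (`run`, `read`, `d_k`, `d_v`) and the random inputs are made up for illustration, and it leaves out the feature map and normalization that an actual linear transformer would likely apply:

```python
import random

def outer(v, k):
    """Outer product v k^T as a nested list of shape (len(v), len(k))."""
    return [[vi * kj for kj in k] for vi in v]

def add_inplace(S, M):
    """Add matrix M into S elementwise, modifying S."""
    for i, row in enumerate(M):
        for j, x in enumerate(row):
            S[i][j] += x

def run(seq_len, d_k=4, d_v=3, seed=0):
    """Feed seq_len random (k, v) pairs into both kinds of memory."""
    rng = random.Random(seed)
    kv_cache = []                               # grows: one (k, v) pair per token
    state = [[0.0] * d_k for _ in range(d_v)]   # fixed: always d_v x d_k
    for _ in range(seq_len):
        k = [rng.gauss(0, 1) for _ in range(d_k)]
        v = [rng.gauss(0, 1) for _ in range(d_v)]
        kv_cache.append((k, v))
        add_inplace(state, outer(v, k))         # state += v k^T, i.e. V @ K.T so far
    return kv_cache, state

def read(state, q):
    """Unnormalized linear-attention readout: state @ q."""
    return [sum(s * qj for s, qj in zip(row, q)) for row in state]

for n in (10, 1000):
    cache, state = run(n)
    print(n, "tokens -> cache entries:", len(cache),
          "| state shape:", (len(state), len(state[0])))
```

The cache has as many entries as there are tokens, while the state stays at `d_v x d_k` throughout. Reading from the state with a query `q` gives the same result as summing `v_t * (k_t . q)` over the whole cache, which is why the fixed-size object is enough in the linear case.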
