cpldcpu | 5 months ago

The dimensions should actually be closer to 12000 * (number of tokens * number of layers / x)

(where x is a number that depends on architectural features like MLA, GQA...)

There is this thing called the KV cache, which holds an enormous latent state.
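
To put a rough number on that estimate, here is a minimal sketch of the arithmetic. The function name `effective_state_dim` and the token count, layer count, and x used in the example are made up for illustration; only the 12000 and the shape of the formula come from the comment above.

```python
def effective_state_dim(d_model: int, n_tokens: int, n_layers: int, x: float) -> float:
    """Rough latent-state size: d_model * (n_tokens * n_layers / x).

    x stands in for the architecture-dependent reduction factor
    mentioned in the comment; its real value is model specific.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    return d_model * (n_tokens * n_layers / x)


# Illustrative values only (assumed, not taken from any real model).
dims = effective_state_dim(d_model=12_000, n_tokens=1_000, n_layers=96, x=8)
print(f"{dims:,.0f}")  # prints 144,000,000
```

Even with these modest assumed numbers the state is on the order of 10^8 values, and it grows linearly with the number of tokens held in the cache.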
