
(no title)

ramanvarma | 4 months ago

skimmed the paper - how well does this plug into real serving stacks (paged KV cache, vLLM, speculative decoding, caching)? layer-wise top-k chunk voting sounds compatible, but does it conflict with RoPE scaling or with sliding-window KV eviction policies?
