item 45708029 | ramanvarma | 4 months ago

Skimmed the paper. How well does this plug into real serving stacks (paged KV, vLLM, speculative decoding, caching)? Layer-wise top-k chunk voting sounds compatible, but does it fight with RoPE scaling or sliding-window KV eviction policies?
No comments yet.