item 36411268

george_123 | 2 years ago
This approach to managing KV cache can work with 4-bit. Imagine the speedup of PagedAttention with quantization.

zhisbug | 2 years ago
Yep, it is agonistic to 4-bit. You can deploy a 4-bit model and still use vLLM + PagedAttention to double or even triple your serving throughput.

ynniv | 2 years ago
If this were submitted as a new comment it would be at the top of the page.

baobabKoodaa | 2 years ago
You mean like, theoretically, in the future? Or you mean today?

ipsum2 | 2 years ago
Probably mean agnostic, agonistic implies the opposite.