mmoskal | 9 months ago

The sglang and vllm numbers are with CUDA graphs enabled. Having said that, a 1B model is an extreme example, hence the 1.5x speedup. For regular models and batch sizes this would probably buy you a few percent.