item 41339984

iAkashPaul | 1 year ago

Pretty sure this was never questioned for batched requests; sglang/lmdeploy/TensorRT-LLM will report nearly twice these speeds with INT8 (fp16 on an A100 benchmarked here: https://github.com/sgl-project/sglang?tab=readme-ov-file#ben...).
No comments yet.
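The roughly 2x claim for INT8 follows from memory bandwidth: LLM decoding is largely bandwidth-bound, and int8 weights take half the bytes of fp16, so a bandwidth-bound step can move weights about twice as fast. A minimal sketch of symmetric per-tensor weight quantization (illustrative only, with made-up shapes; not the actual kernel used by sglang, lmdeploy, or TensorRT-LLM):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical fp16 weight matrix (illustrative shape, not from a real model)
w_fp16 = rng.standard_normal((4096, 4096)).astype(np.float16)

# Symmetric per-tensor INT8 quantization: map max |w| to 127
w = w_fp16.astype(np.float32)
scale = float(np.abs(w).max()) / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# INT8 storage is half the bytes of fp16 -- this is where the
# near-2x throughput comes from when decode is bandwidth-bound
print(w_fp16.nbytes // w_int8.nbytes)  # 2

# Dequantization error is bounded by one quantization step
w_deq = w_int8.astype(np.float32) * scale
print(float(np.abs(w_deq - w).max()) <= scale)  # True
```

In practice the measured speedup depends on batch size and kernel quality: at large batches the matmuls become compute-bound and weight-only INT8 gains shrink, which is consistent with the benchmark being about batched serving throughput.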