top | item 41339984

(no title)

iAkashPaul | 1 year ago

Pretty sure this was never in question for batched requests. SGLang, LMDeploy, and TensorRT-LLM should reach nearly twice the reported speeds with INT8 (FP16 benchmarks on an A100 are here: https://github.com/sgl-project/sglang?tab=readme-ov-file#ben...)


