artembugara | 6 months ago

Oh, I totally understand that I'd need multiple GPUs. I'd just like to know which GPU specifically, and how many.

Tostino | 6 months ago

I don't think you can get 1k tokens/sec on a single stream with any consumer-grade GPU on a 20B model. Maybe you could with an H100 or better, but I somewhat doubt it.
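A rough back-of-the-envelope check on why a single stream is hard to push that fast, as a sketch only. It assumes a dense model where every weight is read from GPU memory once per generated token, so single-stream decode speed is capped at roughly memory bandwidth divided by model size. The bandwidth figures are approximate spec-sheet values I'm assuming (RTX 3090 ~936 GB/s, H100 SXM ~3350 GB/s), and the function name is made up for illustration. It ignores KV cache reads, speculative decoding, and MoE models that only touch a fraction of their weights per token, all of which move the real number.

```python
def max_decode_tokens_per_sec(params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper bound on single-stream decode speed for a dense model.

    Assumes decode is memory-bandwidth-bound and that all weights are
    read exactly once per generated token (an idealized assumption).
    """
    weight_gb = params_billion * bytes_per_param  # model size in GB
    return bandwidth_gb_s / weight_gb


# Assumed, approximate spec-sheet memory bandwidths in GB/s.
GPUS = {"RTX 3090": 936, "H100 SXM": 3350}

# Bytes per parameter for two common weight formats.
PRECISIONS = {"fp16": 2.0, "4-bit": 0.5}

for gpu, bandwidth in GPUS.items():
    for name, bytes_per_param in PRECISIONS.items():
        bound = max_decode_tokens_per_sec(20, bytes_per_param, bandwidth)
        print(f"{gpu}, 20B dense, {name}: at most ~{bound:.0f} tokens/sec")
```

Under those assumptions every combination lands well below 1k tokens/sec for one stream, even with 4-bit weights on the H100.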

My 2x 3090 setup gets me ~6-10 concurrent streams at ~20-40 tokens/sec each for generation, and ~700-1000 tokens/sec for input (prompt processing), with a 32B dense model.
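To put those numbers next to the 1k tokens/sec target, here is the arithmetic for total generation throughput across all streams. It's a minimal sketch that assumes the low ends and high ends of the two ranges line up (fewest streams at the slowest rate, most streams at the fastest rate), which is a simplification; the function name is just for illustration.

```python
def aggregate_generation_range(streams, per_stream_tps):
    """Combine a (min, max) stream count with a (min, max) per-stream rate.

    Returns the (lowest, highest) total generated tokens/sec, assuming
    the extremes of both ranges coincide.
    """
    lowest = streams[0] * per_stream_tps[0]
    highest = streams[1] * per_stream_tps[1]
    return lowest, highest


# Figures from the comment: ~6-10 streams at ~20-40 tokens/sec each.
low, high = aggregate_generation_range((6, 10), (20, 40))
print(f"aggregate generation: ~{low}-{high} tokens/sec")  # ~120-400 tokens/sec
```

So even summed over every stream, generation on this setup tops out around 400 tokens/sec, and any single stream sees only 20-40.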