borzunov | 2 years ago
So, I believe 1 sec/token with Petals is the best you can get for models of this size, unless you have enough GPUs to fit the entire model into GPU memory (you'd need 3x A100 or 8x 3090 for the 8-bit quantized model).
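A rough way to sanity-check the GPU counts above, as a back-of-the-envelope sketch. The comment does not state the model size or GPU memory, so the numbers here are assumptions: roughly 176B parameters (the size of BLOOM-176B), 1 byte per parameter for 8-bit weights, 80 GB per A100, and 24 GB per 3090. The helper `min_gpus` is a hypothetical name, and it counts weights only, ignoring activations, the attention cache, and framework overhead, so real deployments may need more headroom.

```python
import math


def min_gpus(params_billion: float, bytes_per_param: float, gpu_mem_gb: float) -> int:
    """Lower bound on GPUs needed to hold the model weights alone.

    Assumes 1 GB = 1e9 bytes, so (billions of params) * (bytes per param)
    gives the weight size in GB directly.
    """
    weights_gb = params_billion * bytes_per_param
    return math.ceil(weights_gb / gpu_mem_gb)


# Assumed values (not stated in the comment): 176B params, 8-bit weights.
PARAMS_BILLION = 176
BYTES_PER_PARAM_INT8 = 1

print(min_gpus(PARAMS_BILLION, BYTES_PER_PARAM_INT8, 80))  # A100 80 GB -> 3
print(min_gpus(PARAMS_BILLION, BYTES_PER_PARAM_INT8, 24))  # RTX 3090 24 GB -> 8
```

Under these assumptions the weights take about 176 GB, which rounds up to 3 A100s or 8 3090s, matching the figures in the comment.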