joaogante | 2 years ago

A 3090 (or any GPU with >=20GB VRAM) can run StarCoder with int8 quantization at about 12 tokens per second, or 33 with assisted generation, which should become available for StarCoder in the coming days.

Once 4-bit quantization support is released, I would expect a GPU with 12GB VRAM to be able to run it as well.
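
As a rough sanity check on those VRAM figures, here is a back-of-the-envelope sketch of the weights-only footprint at different precisions. It assumes StarCoder's published size of about 15.5B parameters (a figure not stated in this comment), and `weight_gb` is a hypothetical helper written for this illustration. It ignores activations, the KV cache, and framework overhead, which is why the practical VRAM requirement is higher than the raw weight size.

```python
# Weights-only memory estimate. Does NOT include activations, KV cache,
# or framework overhead, so real VRAM usage will be higher.
PARAMS = 15.5e9  # assumption: StarCoder's published parameter count


def weight_gb(n_params: float, bits_per_param: int) -> float:
    """Return the size of the model weights in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gb(PARAMS, bits):.2f} GB")
```

Under that assumption, int8 weights come to roughly 15.5 GB and 4-bit weights to roughly 7.75 GB, which is in line with the >=20GB and 12GB cards mentioned above once some headroom is left for everything else.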

Disclaimer: I work at Hugging Face
