LLaMA 7B in full precision (fp32) requires about 28GB of GPU RAM. I know little about AI. Can I estimate that a 770M-parameter model takes ~2.8GB? That would mean a vast range of devices could run the model locally.
thewataccount|2 years ago
Something like that, plus some overhead, but likely < 3GB.
That said, half precision would almost certainly work without a huge performance hit (so ~2GB), and 8-bit/4-bit quantization is possible but usually doesn't work super well on small-parameter-count models.
Also, you can still batch it through, but it'll be slower. In theory ~2GB including the overhead, though...
It's worth noting that its current performance is only for a specific category of tests, and it isn't the most useful model either, so it's not ChatGPT-on-your-phone just yet.
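The back-of-the-envelope math above can be sketched as a small helper. This is a rough estimate of weight memory only (parameters × bytes per parameter, plus a fudge factor); the 20% overhead figure and the specific parameter counts are illustrative assumptions, not measurements.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float, overhead: float = 0.2) -> float:
    """Rough GB needed for model weights alone, plus an assumed overhead factor."""
    return n_params * bytes_per_param * (1 + overhead) / 1e9

# Compare precisions for a 7B and a 770M model.
for name, params in [("7B", 7e9), ("770M", 770e6)]:
    for precision, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        print(f"{name} @ {precision}: ~{weight_memory_gb(params, nbytes):.1f} GB")
```

With zero overhead this reproduces the figures in the thread: 7B at fp32 is 28GB, 770M at fp32 is ~3.1GB, and 770M at fp16 lands around the 2GB mentioned above. Actual usage will also include activations and framework overhead, which this ignores.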
opyate|2 years ago
> Also you can still batch it through but it'll be slower.
Thanks!