(no title)
joshhart | 1 year ago
Yes you can. The community creates quantized variants of these that can run on consumer hardware. A 4-bit quantization of Llama 70B works pretty well on MacBook Pros; Apple silicon's unified memory is quite solid for these. GPUs are a bit tougher because consumer GPU VRAM is still kinda small.
You can also fine-tune them. There are a lot of frameworks like unsloth that make this easier: https://github.com/unslothai/unsloth . Fine-tuning can be pretty tricky to get right — you need to be aware of things like learning rates — but there are good resources on the internet where a lot of hobbyists have gotten things working. You do not need a PhD in ML to accomplish this. You will, however, need data that you can represent textually.
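To illustrate the learning-rate point: a common recipe in fine-tuning runs is linear warmup followed by cosine decay. This is a hedged, framework-agnostic sketch — the function name and the default values (peak LR, warmup steps) are illustrative, not recommendations from unsloth or any specific library:

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_steps=100, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr.

    Illustrative sketch only: peak_lr and warmup_steps are hypothetical
    defaults, not values prescribed by any particular framework.
    """
    if step < warmup_steps:
        # Ramp linearly from ~0 up to peak_lr over the warmup phase.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Most training frameworks ship a scheduler like this built in; the point is just that the peak value and warmup length are knobs you have to pick deliberately, because a too-high LR will quickly wreck a fine-tune.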
Source: Director of Engineering for model serving at Databricks.
jwitthuhn|1 year ago
A 70B model at 8 bits per parameter would mean 70GB, at 4 bits 35GB, etc. But that is just the raw weights; you also need some RAM for the data passing through the model, and the OS eats up some, so add a 10-15% buffer on top of that to make sure you're good.
Also, quality falls off pretty quickly once you start quantizing below 4 bits, so be careful with that — but at 3 bits a 70B model should run fine on 32GB of RAM.
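The arithmetic above is easy to sketch as a rule-of-thumb calculator. This is a rough estimate only — the 15% overhead figure is the comment's rule of thumb, not an exact number, and real usage depends on context length and KV cache:

```python
def approx_ram_gb(params_billion, bits_per_param, overhead=0.15):
    """Rough RAM estimate for running a quantized model.

    weights = params * bits / 8, plus a ~10-15% buffer for activations
    and OS overhead (rule of thumb, not an exact figure).
    """
    weights_gb = params_billion * bits_per_param / 8  # 1B params at 8 bits ~ 1 GB
    return weights_gb * (1 + overhead)
```

For example, a 70B model at 4 bits is 35GB of weights, or roughly 40GB with the buffer; at 3 bits it's about 26GB of weights, roughly 30GB total, which is why it squeezes into 32GB of RAM.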
nickpsecurity|1 year ago
Since I follow Christ, I can’t break the law or use what might be produced directly from infringement. I might be able to do more experiments if a free, legal model were available. Also, we can legally copy datasets like PG19 since they’re public domain, whereas most others contain works I might need a license to distribute.
Please forward the request to the model trainers. Even a 7B model would let us do a lot of research on optimization algorithms, fine-tuning, etc.