(no title)
opk
|
1 year ago
Has anyone actually got this llama stuff to be usable on even moderate hardware? I find it just crashes because it doesn't find enough RAM. I've got 2G of VRAM on an AMD graphics card and 16G of system RAM and that doesn't seem to be enough. The impression I got from reading up was that it worked for most Apple stuff because the memory is unified and other than that, you need very expensive Nvidia GPUs with lots of VRAM. Are there any affordable options?
horsawlarway|1 year ago
I can run 2b-14b models just fine on the CPU on my laptop (framework 13 with 32gb ram). They aren't super fast, and the 14b models have limited context length unless I run a quantized version, but they run.
If you just want generation and it doesn't need to be fast... drop the $200 for 128gb of system ram, and you can run the vast majority of the available models (up to ~70b quantized). Note - it won't be quick (expect 1-2 tokens/second, sometimes less).
If you want something faster in the "low end" range still - look at picking up a pair of Nvidia p40s (~$400) which will give you 16gb of ram and be faster for 2b to 7b models.
If you want to hit my level for "moderate", I use 2x3090 (I bought refurbed for ~$1600 a couple years ago) and they do quite a bit of work. Ex - I get ~15t/s generation for 70b 4 quant models, and 50-100t/s for 7b models. That's plenty usable for basically everything I want to run at home. They're faster than the m2 pro I was issued for work, and a good chunk cheaper (the m2 was in the 3k range).
That said - the m1/m2 macs are generally pretty zippy here, I was quite surprised at how well they perform.
Some folks claim to have success with the k80s, but I haven't tried and while 24g vram for under $100 seems nice (even if it's slow), the linux compatibility issues make me inclined to just go for the p40s right now.
I run some tasks on much older hardware (ex - willow inference runs on an old 4gb gtx 970 just fine)
So again - I'm not really sure we'd agree on moderate (I generally spend ~$1000 every 4-6 years to build a machine to play games, and the machine you're describing would match the specs for a machine I would have built 12+ years ago)
But you just need literal memory. bumping to 32gb of system ram would unlock a lot of stuff for you (at low speeds) and costs $50. Bumping to 124gb only costs a couple hundred, and lets you run basically all of them (again - slowly).
zamadatix|1 year ago
If "moderate hardware" is your average office PC then it's unlikely to be very usable. Anyone with a gaming GPU from the last several years should be workable though.
horsawlarway|1 year ago
It'd been a minute since I checked refurb prices and $250 for the rtx 3060 12gb is a good price.
Easier on the rest of the system than a 2x card setup, and is probably a drop in replacement.
bhelkey|1 year ago
[1] https://news.ycombinator.com/item?id=42069453
basilgohar|1 year ago
Larger models work but slow down. I do have 64GB of RAM but I think 32 could work. 16GB is pushing is, but should be possible if you don't have anything else open.
Memory requirements depend on numerous factors. 2GB VRAM is not enough for most GenAI stuff today.
whimsicalism|1 year ago