In addition to the tools other people responded with, a good rule of thumb is that most local models work best* at q4 quants, meaning the model's memory footprint in GB is a little over half its parameter count in billions, e.g. a 14b model may be around 8gb. Add some more for context and maybe you want 10gb of VRAM for a 14b model. That will at least put you in the right ballpark for which models to consider for your hardware.
svachalek|7 months ago
(*best performance/size ratio; generally, if the model easily fits at q4, you're better off going to a higher parameter count than to a larger quant, and vice versa)
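To make that arithmetic concrete, here's a minimal sketch of the estimate. The 4.5 bits per parameter for q4 and the flat 2gb allowance for context are my own ballpark assumptions, not exact figures; real usage varies with quant format and context length:

    # Back-of-the-envelope VRAM estimate for a local model at a given quant.
    # Assumptions: q4 ~= 4.5 bits/param, plus a rough flat 2 GB for context/KV cache.

    def estimate_vram_gb(params_b: float, bits_per_param: float = 4.5,
                         context_overhead_gb: float = 2.0) -> float:
        """Approximate VRAM needed: quantized weights plus a context allowance."""
        weights_gb = params_b * bits_per_param / 8  # billions of params * bytes/param
        return weights_gb + context_overhead_gb

    # e.g. a 14b model at q4: ~14 * 4.5 / 8 = 7.9 GB of weights,
    # so ~10 GB of VRAM once you add some room for context.
    for params in (7, 14, 32, 70):
        print(f"{params}b @ q4 ~ {estimate_vram_gb(params):.1f} GB VRAM")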
nottorp|7 months ago
... or if you have Apple hardware with their unified memory, whatever the assholes soldered in is your limit.
CharlesW|7 months ago
LM Studio (not exclusively, I'm sure) makes it a no-brainer to pick models that'll work on your hardware.
GaggiX|7 months ago
This one is very good in my opinion.
qingcharles|7 months ago
https://www.reddit.com/r/LocalLLaMA/