top | item 44295111

robotbikes | 8 months ago

This looks like a good resource. There are some pretty powerful models that will run on an Nvidia 4090 w/ 24 GB of VRAM, e.g. Devstral and Qwen 3. Ollama makes it simple to run them on your own hardware, but the GPU is a significant up-front investment. Still, if you are paying $250 a month for a proprietary tool, it would pay for itself pretty quickly.
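Back-of-the-envelope for "pay for itself", as a rough sketch: the $250/month figure is from above, but the GPU price below is only a placeholder assumption (street prices vary a lot), and this ignores electricity and the rest of the machine.

```python
import math


def payback_months(hardware_cost: float, monthly_subscription: float) -> int:
    """Whole months of subscription fees needed to cover the hardware cost."""
    if monthly_subscription <= 0:
        raise ValueError("monthly_subscription must be positive")
    return math.ceil(hardware_cost / monthly_subscription)


# ASSUMPTION: placeholder GPU price, not a quoted figure; plug in your own.
ASSUMED_GPU_PRICE_USD = 2000.0
# From the comment above: cost of the proprietary tool per month.
SUBSCRIPTION_USD_PER_MONTH = 250.0

if __name__ == "__main__":
    months = payback_months(ASSUMED_GPU_PRICE_USD, SUBSCRIPTION_USD_PER_MONTH)
    print(f"Break-even after about {months} months")
```

With those placeholder numbers the break-even is 8 months; a pricier card just stretches that out proportionally.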

NitpickLawyer|8 months ago

> There are some pretty powerful models that will run on an Nvidia 4090 w/ 24 GB of VRAM, e.g. Devstral and Qwen 3.

I'd caution against using Devstral on a 24 GB VRAM budget. Heavy quantisation (the only way to make it fit into 24 GB) will affect it a lot. There are lots of reports on r/LocalLLaMA about subpar results, especially from KV cache quantisation.

We've had good experiences running it at FP8 with a full-precision cache, but going lower than that will impact the quality a lot.
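To put rough numbers on why it's tight, here's a quick sketch of weight memory alone at different precisions. It assumes a 24B-parameter model (my assumption for Devstral's size; check the model card) and counts only the weights, so KV cache, activations and runtime overhead all come on top.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory needed for the model weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# ASSUMPTION: 24e9 parameters; verify against the model card.
ASSUMED_PARAMS = 24e9
VRAM_GIB = 24.0  # the 24 GB card discussed above, treated as 24 GiB for simplicity

if __name__ == "__main__":
    for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4)]:
        weights = weight_memory_gib(ASSUMED_PARAMS, bits)
        headroom = VRAM_GIB - weights
        print(f"{label:>5}: weights ~{weights:5.1f} GiB, headroom {headroom:6.1f} GiB")
```

Under that assumption FP8 weights alone come to roughly 22 GiB, which leaves almost nothing for the cache on a 24 GB card, so in practice you end up at around 4-bit weights and often a quantised cache too.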

seanmcdirmid|8 months ago

An M3 Max with 64 GB works well for a wider range of models, although it fares worse on Stable Diffusion jobs. Plus you can get it as a laptop.