The Radeon R9700 with 32 GB VRAM is relatively affordable for the amount of memory, and with llama.cpp it runs fast enough for most things. These are workstation cards with blower fans and they are LOUD. Otherwise, if you have the money to burn, get a 5090 for speed and relatively low noise, especially if you limit power usage.
I have a pair of Radeon AI PRO R9700s with 32 GB each, and so far they have been a pleasure to use. Drivers work out of the box, and they are completely quiet when idle. They are capped at 300 W, so even at 100% utilization they are not too loud.
I was thinking about adding after-market liquid cooling for them, but they're fine without it.
It depends. How long are you willing to wait for an answer? And how far are you willing to push quantization, given the risk of degraded answers at more extreme quantization levels?
It's less than you'd think. I'm using the 35B-A3B model on an A5000, which is something like a slightly faster 3080 with 24 GB of VRAM. I'm able to fit the entire Q4 model in memory with 128K context (and I think I could probably do 256K, since I still have around 4 GB of VRAM free). Prompt processing runs at something like 1K tokens/second, and generation is around 100 tokens/second. Plenty fast for agentic use via Opencode.
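For anyone trying to budget VRAM for a setup like this, here's a back-of-the-envelope sketch. All the constants are assumptions, not measurements: ~4.8 bits/weight for a Q4_K_M-style quant, fp16 KV cache entries, and illustrative layer/head counts that are not taken from any specific model's config.

```python
# Rough VRAM estimator for quantized weights plus KV cache.
# Assumptions: ~4.8 bits/weight (Q4_K_M-ish) and fp16 (2-byte) KV entries.

def model_vram_gb(params_b: float, bits_per_weight: float = 4.8) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """KV cache = 2 (K and V) * layers * kv_heads * head_dim * context * bytes."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

# Illustrative 30B-class model with GQA (hypothetical config values):
weights = model_vram_gb(30)              # 18.0 GB at 4.8 bits/weight
cache = kv_cache_gb(48, 4, 128, 131072)  # ~12.9 GB at 128K context, fp16
```

Note how much the KV cache dominates at long context; this is why people often quantize the cache itself (e.g. to 8-bit) to squeeze 128K+ into a 24 GB card.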
I've had an AMD card for the last 5 years, so I kind of tuned out of local LLM releases because AMD seemed to abandon ROCm support for my card (6900 XT). Is AMD capable of anything these days?
I think the 27B dense model at full precision and 122B MoE at 4- or 6-bit quantization are legitimate killer apps for the 96 GB RTX 6000 Pro Blackwell, if the budget supports it.
I imagine any 24 GB card can run the lower quants at a reasonable rate, though, and those are still very good models.
Big fan of Qwen 3.5. It actually delivers on some of the hype that the previous wave of open models never lived up to.
suprjami|1 day ago
If you want to spend twice as much for more speed, get a 3090/4090/5090.
If you want long context, get two of them.
If you have enough spare cash to buy a car, get an RTX 6000 Pro with 96 GB VRAM.
andsoitis|1 day ago
Check out the HP Omen 45L Max: https://www.hp.com/us-en/shop/pdp/omen-max-45l-gaming-dt-gt2...
rahimnathwani|1 day ago
I'm curious which one you're using.