bigyabai|7 months ago
Nvidia hardware is cheap as chips right now. If you got 2x 3060 12GB cards (or a 24GB 4090), you'd have 24GB of CUDA-accelerated VRAM to play with for inference and finetuning. That should be plenty to fit quantized builds of the smaller SOTA models like GLM-4.5 Air, Qwen3 30B A3B, and Llama 4 Scout, and definitely enough to start offloading layers of the giant 100B+ parameter options.
That's what I'd get, at least.
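A back-of-envelope sketch of the VRAM math behind that claim. The 1.2x overhead factor for KV cache and activations is an assumption, not a measured number, and real quantized GGUF/AWQ builds vary by format:

```python
def vram_gb(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight memory plus ~20% for KV cache and activations.

    1B parameters at 8 bits/weight is 1 GB of weights.
    """
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * overhead

# Qwen3 30B A3B at 4-bit quantization: ~18 GB, fits in a 24 GB budget.
print(round(vram_gb(30, 4), 1))   # 18.0

# GLM-4.5 Air (~106B total params) at 4-bit: ~64 GB, so it needs
# layer offloading to system RAM rather than fitting entirely in VRAM.
print(round(vram_gb(106, 4), 1))  # 63.6
```

So a 24GB rig covers ~30B-class models fully on-GPU at 4-bit, while the ~100B+ options are where the layer-offloading the comment mentions kicks in.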
vmt-man|7 months ago
Are they good enough compared to Sonnet 4?
I’ve also used Gemini 2.5 Pro and Flash, and they’re worse. But they’re much bigger, not just 30B.