IMO, they're worth trying. If the model is large enough, it apparently doesn't become completely braindead at Q2 or Q3. (I've had surprisingly decent experiences with Q2 quants of large-enough models. Is it as good as a Q4? No. But hey, if you've got the bandwidth, download one and try it!)
Also, don't forget that Mixture of Experts (MoE) models perform better than you'd expect, because only a small part of the model is actually "active" for any given token. So e.g. a Qwen3-whatever-80B-A3B would be 80 billion parameters total but 3 billion active. Worth trying if you've got enough system RAM for the 80 billion and enough VRAM for the 3.
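To put rough numbers on that, here's a back-of-the-envelope sketch of how big the weights are at different quantization levels, using the 80B total / 3B active split from the example above. Assumptions: `weights_gib` is just a helper name made up for this sketch, the bit widths are nominal (real quant formats store scales and other metadata, so the effective bits per weight are typically somewhat higher), and KV cache, activations, and runtime overhead are ignored entirely.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (hypothetical helper)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30


TOTAL_B = 80   # total parameters in billions, from the 80B-A3B example
ACTIVE_B = 3   # parameters active per token, in billions

# Nominal bit widths only; actual quant files will come out a bit larger.
for bits in (2, 3, 4, 8, 16):
    total = weights_gib(TOTAL_B, bits)
    active = weights_gib(ACTIVE_B, bits)
    print(f"{bits:>2} bits/weight: total ~{total:6.1f} GiB, active ~{active:5.1f} GiB")
```

Treat the output as a lower bound for "will it fit", not a guarantee. Which experts are active also changes from token to token, so how the total and active parts actually get split between system RAM and VRAM depends on your runtime.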
It's simply and utterly impossible to tell in any objective way without your own calibration data, and if you have that, you might as well make your own post-training quantized checkpoints anyway. That said, millions of people out there make technical decisions on vibes all the time, and has anything bad happened to them? I suppose if it feels good to run smaller quantizations, do it, haha.
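If you do want something less vibes-based, one common approach is to compare perplexity on held-out text from your own workload. A minimal sketch, assuming your runtime can give you per-token natural-log probabilities for the same text under each quant. The `perplexity` helper and the numbers below are made up purely for illustration, they are not measurements of any real model.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Placeholder values, NOT real measurements. Substitute the log-probs your
# own runtime reports for the same held-out text under each quantization.
q4_logprobs = [-1.2, -0.4, -2.1, -0.9]
q2_logprobs = [-1.5, -0.6, -2.6, -1.1]

ppl_q4 = perplexity(q4_logprobs)
ppl_q2 = perplexity(q2_logprobs)

# Lower is better; the ratio shows how much the smaller quant gives up.
print(f"Q4 perplexity: {ppl_q4:.3f}")
print(f"Q2 perplexity: {ppl_q2:.3f}")
print(f"Q2 / Q4 ratio: {ppl_q2 / ppl_q4:.3f}")
```

Perplexity is only a proxy. It won't tell you whether the smaller quant still handles your actual task, so it's worth spot-checking real prompts as well.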
plagiarist|14 days ago
jncraton|14 days ago
AbstractGeo|14 days ago
doctorpangloss|14 days ago