For coding you want more precision, so the higher the quant, the better.
But there is ongoing debate about whether a smaller model at a higher quant beats a larger one at a lower quant. You'll need to test it yourself on your own use cases, I'm afraid.
Edit: They did announce that smaller variants will be released.
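On the smaller-model-at-higher-quant question, a rough way to see which pairs even compete for the same memory budget is to estimate weight size as parameters times bits per weight. This is only a back-of-the-envelope sketch: the 32B and 70B sizes and the 3.5 bits-per-weight figure are made-up illustrations, not measurements of any real quant, and KV cache and runtime overhead are ignored.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (no KV cache, no overhead)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical pair that lands in a similar memory budget.
small_q8 = weight_gib(32, 8.0)   # assumed 32B model at 8 bits per weight
large_q3 = weight_gib(70, 3.5)   # assumed 70B model at ~3.5 bits per weight

print(f"32B @ 8.0 bpw: {small_q8:.1f} GiB")
print(f"70B @ 3.5 bpw: {large_q3:.1f} GiB")
```

Both fit in roughly the same space, which is why the question only gets settled by testing on your own tasks.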
I can say that this works really well. I'm a heavy user of the Unsloth dynamic quants. I run DeepSeek V3/R1 in Q3, and ernie-300b and Kimi K2 in Q3 too. Amazing performance. I run Qwen3-235b in both Q4 and Q8 and can barely tell the difference, so much so that I just keep Q4, since it's twice as fast.
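The "twice as fast" part is about what you'd expect if token generation is limited by memory bandwidth, since halving the bits per weight halves the bytes read per token. A minimal sketch of that upper bound, assuming 22B active parameters per token and a 400 GB/s memory system (both numbers are assumptions for illustration, and real quants are not exactly 4 or 8 bits per weight):

```python
def decode_tokens_per_s(active_params_billion: float,
                        bits_per_weight: float,
                        bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumed values: 22B active parameters, 400 GB/s of memory bandwidth.
q4 = decode_tokens_per_s(22, 4, 400)
q8 = decode_tokens_per_s(22, 8, 400)

print(f"Q4 bound: {q4:.1f} tok/s, Q8 bound: {q8:.1f} tok/s, ratio {q4 / q8:.1f}x")
```

Actual throughput will be lower than this bound, but the ratio between the two quants tends to hold when bandwidth is the bottleneck.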
danielhanchen|7 months ago
segmondy|7 months ago