(no title)
mchiang | 4 months ago
To enable future optimizations for specific quantizations, we have been trying to limit the quantizations to the ones that fit the majority of users.
In the case of mistral-small3.1, Ollama supports ~4-bit (q4_k_m), ~8-bit (q8_0), and fp16.
https://ollama.com/library/mistral-small3.1/tags
I'm hopeful that in the future, more and more model providers will help optimize for specific quantizations - 4-bit (e.g. NVFP4, MXFP4), 8-bit, and a 'full' model.
thot_experiment | 4 months ago
I truly don't understand the reasoning behind removing support for all the other quants. It's baffling to me, considering how much more useful running a 70B model at q3 is than not being able to run a 70B model at all. Not to mention forcing me to download hundreds of gigabytes of fp16 weights because compatibility with other quants is apparently broken, and forcing me to quantize models myself.
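The size gap here is easy to quantify with back-of-the-envelope arithmetic (a sketch only: the ~3.5 effective bits per weight for a q3-style quant is an assumption, and KV-cache/activation overhead is ignored):

```python
# Rough weight-size estimate for a quantized model (illustrative only;
# real GGUF quants mix block formats, so effective bits/weight varies).
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: params * bits / 8 bytes per param."""
    return params_billions * bits_per_weight / 8

# A 70B model: q3 (assumed ~3.5 effective bits) vs fp16 (16 bits).
q3_gb = weight_gb(70, 3.5)   # ~31 GB: plausible on high-end consumer setups
fp16_gb = weight_gb(70, 16)  # 140 GB: far beyond most consumer hardware
print(f"70B @ q3 ~= {q3_gb:.1f} GB, @ fp16 = {fp16_gb:.0f} GB")
```

The same arithmetic is why q3 matters: it is roughly the difference between a model that loads and one that does not.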