mchiang | 4 months ago
May I ask what system you are using where memory estimation goes wrong? This is an area Ollama has been working on and has improved quite a bit.
The latest version of Ollama is 0.12.5, with a 0.12.6 pre-release available. 0.7.1 is 28 versions behind.
thot_experiment | 4 months ago
As for 4chan, they've hated Ollama for a long time because it was built on top of llama.cpp without contributing upstream or crediting the original project.
mchiang | 4 months ago
To help future optimizations for given quantizations, we have been trying to limit the quantizations we offer to ones that fit the majority of users.
In the case of mistral-small3.1, Ollama supports ~4-bit (q4_k_m), ~8-bit (q8_0), and fp16.
https://ollama.com/library/mistral-small3.1/tags
I'm hopeful that in the future, more and more model providers will help optimize for given model quantizations - 4-bit (e.g. NVFP4, MXFP4), 8-bit, and a 'full' model.
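The memory impact of these quantization choices can be sketched with a back-of-envelope calculation: weight memory is roughly parameters × bits-per-weight ÷ 8. The snippet below is an illustration only, assuming ~24B parameters for mistral-small3.1 and approximate bits-per-weight figures for each quantization; it ignores KV cache, activations, and runtime overhead, which is exactly why real memory estimation is harder than this.

```python
# Back-of-envelope weight-memory estimate per quantization.
# Illustrative only: ignores KV cache, activations, and runtime overhead.

def weight_memory_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GiB: params * bits / 8 bytes."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / (1024 ** 3)

# Assumed parameter count and approximate effective bits-per-weight.
for name, bpw in [("q4_k_m (~4.8 bpw)", 4.8),
                  ("q8_0 (~8.5 bpw)", 8.5),
                  ("fp16 (16 bpw)", 16.0)]:
    print(f"{name}: {weight_memory_gib(24, bpw):.1f} GiB")
```

This rough math shows why a ~4-bit quant of a 24B model fits on a 16 GiB GPU while fp16 does not, even before accounting for context-length overhead.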