I really like Gemma 3. A quantized version of the 27B will be good enough for a lot of things. You can also take an abliterated version[0] with zero (like zero zero) guardrails and make it write you a very interesting crime story without having to deal with the infamous "sorry but I'm a friendly and safe model and cannot do that and also think about the children" response.
Qwen3 and some of the smaller Gemmas are pretty good and fast. I have a gist with my benchmark numbers here, run on my M4 Pro Max (with a whole ton of RAM, but most small models will fit on a well-specced dev Mac).
patates|10 months ago
[0]: https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated
estsauver|10 months ago
https://gist.github.com/estsauver/a70c929398479f3166f3d69bce...