I've read several people say that Kimi K2 has a better "emotional intelligence" than other models. I'll be interested to see whether K2.5 continues or even improves on that.
Yup, I experience the same. I don't know what they do to achieve this but it gives them this edge, really curious to learn more about what makes it so good at it.
A lot of people point to the Muon optimizer that Moonshot (the creators of Kimi) pioneered. Compared to the standard optimizer AdamW, Muon amplifies low-magnitude gradient directions which makes the model learn faster (and maybe gives Kimi its unique qualities).
flexagoon|1 month ago
Alifatisk|1 month ago
in-silico|1 month ago
Muon paper: https://arxiv.org/abs/2502.16982
storystarling|1 month ago
mohsen1|1 month ago