I've played with this a bit and it's OK. I'd place it somewhere around Sonnet 4.5 level, probably below. But with this aggressive pricing you can just run 3 copies on the same task, choose the one that succeeded, and still come out way ahead on cost. It's not as good at following instructions as the Claude models and can get lost, but it's still "good enough".
I'm very happy using it to just "do things". When in-depth debugging or a massive plan is needed, I'd go with something better, but for just going through the motions afterwards? It works.
Would it kill them to use the words "AI coding agent" somewhere prominent?
"MiniMax M2.1: Significantly Enhanced Multi-Language Programming, Built for Real-World Complex Tasks" could be an IDE, a UI framework, a performance library, or, or...
When its main Chinese competitor GLM has made something like 50 cents USD per user over the past 6 months from its 40 million "developer users", calling your flagship model an "AI coding agent" is like telling investors "we are doing this for fun, not for money".
I think people should stop comparing to Sonnet and compare to Opus instead, since it's so far ahead at producing code I would actually want to use (Gemini 3 Pro tends to lack generalization and wants things to use its own style rather than adapting).
Whatever benchmark Opus is ahead in should be treated as a very important metric of proper generalization in models.
I generally prefer Sonnet as comparison too. Opus, as good as it is, is just too expensive. The "best" model is the one I can use, not the one I can't afford.
These days, by default I just use Sonnet/Haiku. In most cases it's more than good enough for me. It's plenty with the $20 plan.
With MiniMax, or GLM-4.7, some people like me are just looking for Sonnet-level capability at a much cheaper price.
> MiniMax has been continuously transforming itself in a more AI-native way. The core driving forces of this process are models, Agent scaffolding, and organization. Throughout the exploration process, we have gained increasingly deeper understanding of these three aspects. Today we are releasing updates to the model component, namely MiniMax M2.1, hoping to help more enterprises and individuals find more AI-native ways of working (and living) sooner.
This compresses to: “We are updating our model, MiniMax, to 2.1. Agent harnesses exist and Agents are getting more capable.”
A good model and agent harness, pointed at the task of writing this post, might suggest less verbosity and complexity; it comes off as fake and hype-chasing to me, even if your model is actually good. I disengage there.
I saw y'all give a lightning talk recently and it was similarly hype-y. Perhaps this is a translation or cultural thing.
So when MiniMax released a pretty capable model, you chose to ignore the model itself, focus on a single sentence they wrote in the release notes, and start bad-mouthing it.
Is it a cultural thing?
Very anecdotal, but for me this model has very weak prompt adherence. I compared it a tiny bit to Gemini Flash 3.0, and simple things like "don't use markdown tables in output" were very hard to get with M2.1.
It took me like 5 prompt iterations until it finally listened.
But it's very good: better than Flash 3.0 in terms of code output and reasoning, while being cheaper.
Has anyone used this in earnest with something like OpenCode? Over the past few months I've tested a dozen models that were claimed to be nearly as good as Claude Code or Codex, but the overall experience when using them with OpenCode was close to abysmal. Not even a single one was able to do a decent code-editing job on a real-world codebase.
With M2, yes - I've used it in Claude Code (native tool calling), Roo/Cline (custom tool parsing), etc. It's quite good, and was for some time the best model to self-host. At 4-bit it can fit on 2x RTX 6000 Pro (about 200GB VRAM) with about 400k context at fp8 KV cache. It's very fast due to low active params, stable at long context, and quite capable in any agent harness (its training specialty). M2.1 should be a good bump beyond M2, which was undertrained relative to even much smaller models.
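For reference, a minimal vLLM sketch of that kind of setup. This is a sketch under stated assumptions, not a recipe: it assumes vLLM supports the architecture and that a 4-bit quantized checkpoint is available (the model id below is just the HF repo name):

    from vllm import LLM, SamplingParams

    # Sketch: 2-way tensor parallel across both cards, fp8 KV cache,
    # long context as described above. All values are illustrative.
    llm = LLM(
        model="MiniMaxAI/MiniMax-M2.1",  # or a pre-quantized 4-bit variant
        tensor_parallel_size=2,
        kv_cache_dtype="fp8",
        max_model_len=400_000,
        trust_remote_code=True,
    )
    out = llm.generate(
        ["Write a function that parses RFC 3339 timestamps."],
        SamplingParams(max_tokens=256),
    )
    print(out[0].outputs[0].text)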
How is everyone monitoring the skill/utility of all these different models? I am overwhelmed by how many there are, and by the challenge of tracking their capability across so many different modalities.
https://swe-rebench.com
https://livebench.ai/#/
https://eqbench.com/#
https://contextarena.ai/?needles=8
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
https://artificialanalysis.ai/leaderboards/models
https://gorilla.cs.berkeley.edu/leaderboard.html
https://github.com/lechmazur/confabulations
https://dubesor.de/benchtable
https://help.kagi.com/kagi/ai/llm-benchmark.html
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
It's nice and simple in the overview mode though. Breaks it down into an intelligence ranking, a coding ranking, and an agentic ranking.
https://artificialanalysis.ai/
> It exhibits consistent and stable results in tools such as Claude Code, Droid (Factory AI), Cline, Kilo Code, Roo Code, and BlackBox, while providing reliable support for Context Management mechanisms including Skill.md, Claude.md/agent.md/cursorrule, and Slash Commands.
One of the demos shows them using Claude Code, which is interesting. And the next sections are titled 'Digital Employee' and 'End-to-End Office Automation'. Their ambitions obviously go beyond coding. A sign of things to come...
Claude Code doesn't officially support using other, non-Anthropic models, right? So did they patch the code, fake the Claude API, or use some other hack to get around that?
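As far as I know, no patching is needed: Claude Code reads ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN from the environment, so any provider that exposes an Anthropic-compatible Messages API can slot in underneath it. A rough sketch of the same idea with the official Python SDK; the base URL and model id here are assumptions, so check the provider's docs for the real values:

    import anthropic  # pip install anthropic

    # Point the official SDK at an Anthropic-compatible endpoint.
    client = anthropic.Anthropic(
        base_url="https://api.minimax.io/anthropic",  # hypothetical endpoint
        api_key="YOUR_MINIMAX_KEY",
    )
    msg = client.messages.create(
        model="MiniMax-M2.1",  # hypothetical model id
        max_tokens=256,
        messages=[{"role": "user", "content": "Say hi"}],
    )
    print(msg.content[0].text)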
I used gemini-3-pro-preview on Deepwalker [0]. It was good; then I switched to gemini-3-flash, which is OK. It gets the job done. Looking for some alternatives such as GLM and MiniMax. Very curious about their agentic performance, like long-running tasks with reasoning.
[0]: https://deepwalker.xyz
I’ve spent a little bit of time testing Minimax M2. It’s quite good given the small size but it did make some odd mistakes and struggle with precise instructions.
Can you please fix the login? When I try to log in, it says:
Unable to process request due to missing initial state. This may happen if browser sessionStorage is inaccessible or accidentally cleared. Some specific scenarios are - 1) Using IDP-Initiated SAML SSO. 2) Using signInWithRedirect in a storage-partitioned browser environment.
That they are still training models against Objective-C is all the proof you need that it will outlive Swift.
When is someone going to vibe code Objective-C 3.0? Borrowing all of the actual good things that have happened since 2.0 is closer than you'd think thanks to LLVM and friends.
Why would they not? Existing objective-c apps will still need updates and various work. Models are still trained on assembler for architectures that don't meaningfully exist today as well.
“We're excited for powerful open-source models like M2.1 […]”
Yet as far as I can tell, this model isn't open at all. Not even open weights, never mind open source.
https://huggingface.co/MiniMaxAI/MiniMax-M2.1