In all of these posts there is someone claiming Claude is the best, then somebody else claiming they have tried a bunch of times and for them Gemini is the best, while others find GPT-5 is supreme. Obviously, all of these are subjective, narrow experiences. My conclusion is that all frontier models are both good and bad, with no clear winner, and that making good evals is really hard.
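To put a rough number on "narrow experiences", here is a minimal sketch (my own illustration, not a real eval) of how little a handful of side-by-side tries can tell you. It assumes each try is an independent win/loss for one model over another and uses the standard Wilson score interval; the function name `wilson_interval` and the example counts are made up for illustration.

```python
import math


def wilson_interval(wins, trials, z=1.96):
    """Wilson score interval for a win rate (z=1.96 is roughly 95% confidence).

    Assumes every trial is an independent win/loss, which real prompts
    almost certainly are not, so treat the result as optimistic.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = wins / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical counts: "model A beat model B in 7 of my 10 tries"
# versus the same 70% win rate over 100 tries.
for wins, trials in [(7, 10), (70, 100)]:
    lo, hi = wilson_interval(wins, trials)
    print(f"{wins}/{trials} wins: 95% interval {lo:.2f} to {hi:.2f}")
```

With 7 wins out of 10 the interval still contains 0.5, so "I tried it a bunch of times" is consistent with the two models being equally good. It takes something closer to 100 comparisons at the same win rate before a coin flip is ruled out, and that is before accounting for prompt choice or grading bias.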
SkyPuncher|5 months ago
* Gemini has the highest ceiling out of all of the models, but has consistently struggled with token-level accuracy. In other words, its conceptual thinking is well beyond other models, but it sometimes makes stupid errors when talking. This makes it hard to reliably use for tool calling or structured output. Gemini is also very hard to steer, so when it's wrong, it's really hard to correct.
* Claude is extremely consistent and reliable. It's very, very good at the details - but will start to forget things if things get too complex. The good news is Claude is very steerable and will remember those details if you remind it.
* GPT-5 seems to be completely random for me. It's so inconsistent that it's extremely hard to use.
I tend to use Claude because I'm the most familiar with it and I'm confident that I can get good results out of it.
artdigital|5 months ago
It’s honestly crazy how good it is, coming from Claude. I never thought I could already pass a model a design doc and have it one-shot the entire thing with such a level of accuracy. Even with Opus, I always need to either steer it, or fix the stuff it forgot by hand / have another phase afterwards to get it from 90% to 100%.
Yes, the Codex TUI sucks, but the model with high reasoning is an absolute beast, and it convinced me to switch from Claude Max to ChatGPT Pro.
Workaccount2|5 months ago
It's really the only model that can do large(r) codebase work.
bcrosby95|5 months ago
Alex-Programs|5 months ago
Keyframe|5 months ago
I run claude CLI as a primary and just ask it nicely to consult gemini cli (but not let it do any coding). It works surprisingly well. OpenAI just fell out of my view. I even cancelled my ChatGPT subscription. Gemini is leaping forward, and it _feels like_ ChatGPT-5 is a regression... I can't put my finger on it tbh.
qaq|5 months ago
Robdel12|5 months ago
jiggawatts|5 months ago
binary132|5 months ago
smoe|5 months ago
One advantage Gemini had (or still has, I’m not sure about the other providers) was its large context window combined with the ability to use PDF documents. It probably saved me weeks of work on an integration with a government system: I could upload hundreds of pages of documentation and immediately start asking questions, generating rules, and troubleshooting payloads that were leading to generic, computer-says-no errors.
No need to go through RAG shenanigans, and all of it within the free token allowance.
mlsu|5 months ago
It's like the personality of a person. Employee A is better at talking to customers than Employee B, but Employee B is better at writing code than Employee A. Is one better than the other? Is one smarter than the other? Nope. Different training data.