Anyone else find that despite Gemini performing best on benches, it's actually still far worse than ChatGPT and Claude? It seems to hallucinate nonsense far more frequently than any of the others. Feels like Google just bench maxes all day every day. As for Mistral, hopefully OSS can eat all of their lunch soon enough.
apexalpha|2 months ago
Granted, this is a subject that is well represented in the training data, but still.
cmrdporcupine|2 months ago
I feel we're only a year or two away from hitting a plateau, with the frontier closed models showing diminishing returns vs. what's "open".
re-thc|2 months ago
Do things ever work that way? What if Google did open-source Gemini? Would you say the same? You never know; there's no fixed "supposed to" or "purpose" like that.
pants2|2 months ago
Unfortunately, that doesn't pay the electricity bill.
cmrdporcupine|2 months ago
Frankly, I don't actually care about or want "general intelligence" -- I want it to write good code, follow instructions, and find bugs. Gemini wasn't bad at the last bit, but wasn't great at the others.
They're all trying to make general purpose AI, but I just want really smart augmentation / tools.
llm_nerd|2 months ago
In prior posts you oddly attack "Palantir-partnered Anthropic" as well.
Are things that grim at OpenAI that this sort of FUD is necessary? I mean, I know they're doing the whole code red thing, but I guarantee that posting nonsense like this on HN isn't the way.
cmrdporcupine|2 months ago
It's also slower than both Opus 4.5 and Sonnet.
moffkalast|2 months ago
Trust no one; testing your use case yourself is pretty much the only approach, because people either don't run benchmarks correctly or have an incentive not to.
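For anyone who wants to act on this, a do-it-yourself eval can be very small. Here's a minimal sketch; `run_eval`, `echo_model`, and the cases are all hypothetical names, and in practice you'd replace the toy model with a call to whatever model/API you're evaluating:

```python
def run_eval(model_fn, cases):
    """Run each (prompt, check) case through model_fn; return the pass rate."""
    passed = 0
    for prompt, check in cases:
        try:
            if check(model_fn(prompt)):
                passed += 1
        except Exception:
            pass  # treat errors as failures
    return passed / len(cases)

# Toy stand-in model for demonstration only; swap in your real model call.
def echo_model(prompt):
    return prompt.upper()

cases = [
    ("hello", lambda out: out == "HELLO"),
    ("world", lambda out: "WORLD" in out),
]

print(run_eval(echo_model, cases))  # 1.0 for the toy model
```

The point isn't the harness itself but that the checks encode *your* use case, which public leaderboards can't.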