(no title)
Obertr|2 months ago
The image model they released is much worse than Nano Banana Pro; the Ghibli moment did not happen.
Their GPT 5.2 is obviously overfit on benchmarks; that's the consensus among many developers and friends I know. So Opus 4.5 is staying on top when it comes to coding.
The weight of the ads money from Google, plus general direction and the founder sense of Brin, brought the massive Google giant back to life. None of my company's workflows run on OpenAI's GPT right now. Even though we love their Agents SDK, after the Claude Agent SDK it feels like peanuts.
avazhi|2 months ago
This has been true for at least 4 months and yeah, based on how these things scale and also Google's capital + in-house hardware advantages, it's probably insurmountable.
drawnwren|2 months ago
mmaunder|2 months ago
Edit: And just to add an example: OpenAI's Codex CLI billing is easy for me. I just sign up for the base package and add extra credits, which are used automatically once I'm through my weekly allowance. With Gemini CLI I'm using my OAuth account, and then having to rotate API keys once I've used that up.
Also, Gemini CLI loves spewing out its own chain of thought when it gets into a weird state.
Also, Gemini CLI has an insane bias to action that is almost insurmountable. DO NOT START THE NEXT STAGE still has it starting the next stage.
Also, Gemini CLI has been terrible at visibility into what it's actually doing at each step, although that seems a bit improved with this new model today.
GenerWork|2 months ago
gpt5|2 months ago
It's when it becomes difficult, like in the coding case that you mentioned, that we can see OpenAI still has the lead. The same is true for the image model: prompt adherence is significantly better than Nano Banana's, especially on more complex queries.
int32_64|2 months ago
crazygringo|2 months ago
But for anyone using LLMs to help speed up academic literature reviews where every detail matters, or coding where every detail matters, or anything technical where every detail matters -- the differences very much matter. And benchmarks serve just to confirm your personal experience anyway, as the differences between models become extremely apparent when you're working in a niche sub-subfield and one model shows glaring informational or logical errors while another mostly gets it right.
And then there's a strong possibility that as experts start to say "I always trust <LLM name> more", that halo effect spreads to ordinary consumers who can't tell the difference themselves but want to make sure they use "the best" -- at least for their homework. (For their AI boyfriends and girlfriends, other metrics are probably at play...)
xbmcuser|2 months ago
rfw300|2 months ago
holler|2 months ago
fullstick|2 months ago
jay_kyburz|2 months ago
dieortin|2 months ago
novok|2 months ago
Founders are special, because they are not beholden to this social support network to stay in power, and founders have a mythos that socially supports their actions beyond their pure power position. The only others they are beholden to are their co-founders, and in some cases major investor groups. This gives them the ability to disregard this social balance because they are not dependent on it to stay in power. Their power source is external to the organization, while everyone else's is internal to it.
This gives them a very special "do something" ability that nobody else has. It can lead to failures (Zuck & Oculus, Snapchat Spectacles) or successes (Steve Jobs, Gemini AI), but either way, it allows them to actually "do something".
HarHarVeryFunny|2 months ago
The merger happened in April 2023.
Gemini 1.0 was released in Dec 2023, and the progress since then has been rapid and impressive.
ryoshu|2 months ago
raincole|2 months ago
The Ghibli moment was only about half a year ago. At that moment, OpenAI was far ahead in terms of image editing. Now it's been behind for a few months and "it can't be reversed"?
Obertr|2 months ago
BoredPositron|2 months ago
baq|2 months ago
yieldcrv|2 months ago
so they get lapped a few times and then drop a fantastic new model out of nowhere
the same is going to happen to Google again, Anthropic again, OpenAI again, Meta again, etc
they're all shuffling the same talent around, it's California, that's how it goes, the companies have the same institutional knowledge - at least regarding their consumer-facing options
JumpCrisscross|2 months ago
Kara Swisher recently compared OpenAI to Netscape.
Andrex|2 months ago
Maybe we'll get some awesome FOSS tech out of its ashes?
louiereederson|2 months ago
the reason this matters is that slowing velocity raises the risk of featurization, which undermines LLMs as a consumer category. cost efficiency of the flash models reinforces this, as google can embed LLM functionality into search (noting that search-like queries are probably 50% of chatgpt usage per their july user study). i think model capability was saturated for the average consumer use case months ago, if not longer, so distribution is really what matters, and search dwarfs LLMs in this respect.
https://techcrunch.com/2025/12/05/chatgpts-user-growth-has-s...
aswegs8|2 months ago
random9749832|2 months ago
CuriouslyC|2 months ago
NitpickLawyer|2 months ago
Out of all the big 4 labs, Google is the last I'd suspect of benchmaxxing. Their models have generally underbenched and overdelivered in real-world tasks, for me, ever since 2.5 Pro came out.
encroach|2 months ago
https://lmarena.ai/leaderboard/text-to-image
https://lmarena.ai/leaderboard/image-edit
Obertr|2 months ago
nightski|2 months ago
novok|2 months ago