Obertr|2 months ago

At this point in time I start to believe OAI is very much behind on the models race and it can't be reversed

The image model they released is much worse than Nano Banana Pro; a Ghibli moment did not happen

Their GPT 5.2 is obviously overfit on benchmarks, per the consensus of the many developers and friends I know. So Opus 4.5 is staying on top when it comes to coding

The weight of Google's ad money and its general direction, plus Brin's founder sense, brought the massive Google giant back to life. None of my companies' workflows run on OAI's GPT right now. Even though we love their Agents SDK, after the Claude Agent SDK it feels like peanuts.

avazhi|2 months ago

"At this point in time I start to believe OAI is very much behind on the models race and it can't be reversed"

This has been true for at least 4 months and yeah, based on how these things scale and also Google's capital + in-house hardware advantages, it's probably insurmountable.

drawnwren|2 months ago

OAI also got talent-mined. Their top intellectual leaders left after falling out with sama, then Meta took a bunch of their mid-senior talent, while Google had the opposite: they brought Noam and Sergey back.

mmaunder|2 months ago

Yeah, the only thing standing in Google's way is Google. And it's the easy stuff: sensible billing models, easy-to-use docs, consoles that make sense and don't require 20 hours to learn and navigate, and then just the slew of bugs in Gemini CLI that are basic usability and model API interaction things. The only differentiator OpenAI still has is polish.

Edit: And just to add an example: OpenAI's Codex CLI billing is easy for me. I just sign up for the base package, then add extra credits that get used automatically once I'm through my weekly allowance. With Gemini CLI I'm using my OAuth account and then having to rotate API keys once I've used that up.

Also, Gemini CLI loves spewing out its own chain of thought when it gets into a weird state.

Also Gemini CLI has an insane bias to action that is almost insurmountable. DO NOT START THE NEXT STAGE still has it starting the next stage.

Also Gemini CLI has been terrible at visibility on what it's actually doing at each step - although that seems a bit improved with this new model today.

GenerWork|2 months ago

I'm actually liking 5.2 in Codex. It's able to take my instructions, do a good job at planning out the implementation, and will ask me relevant questions around interactions and functionality. It also gives me more tokens than Claude for the same price. Now, I'm trying to white label something that I made in Figma so my use case is a lot different from the average person on this site, but so far it's my go to and I don't see any reason at this time to switch.

gpt5|2 months ago

I've noticed when it comes to evaluating AI models, most people simply don't ask difficult enough questions. So everything is good enough, and the preference comes down to speed and style.

It's when it becomes difficult, like in the coding case you mentioned, that we can see OpenAI still has the lead. The same is true for the image model: prompt adherence is significantly better than Nano Banana's, especially on more complex queries.

int32_64|2 months ago

Is there a "good enough" endgame for LLMs and AI where benchmarks stop mattering because end users don't notice or care? In such a scenario brand would matter more than the best tech, and OpenAI is way out in front in brand recognition.

crazygringo|2 months ago

For average consumers, I think very much yes, and this is where OpenAI's brand recognition shines.

But for anyone using LLMs to help speed up academic literature reviews where every detail matters, or coding where every detail matters, or anything technical where every detail matters -- the differences very much matter. And benchmarks serve just to confirm your personal experience anyway, as the differences between models become extremely apparent when you're working in a niche sub-subfield and one model shows glaring informational or logical errors while another mostly gets it right.

And then there's a strong possibility that as experts start to say "I always trust <LLM name> more", that halo effect spreads to ordinary consumers who can't tell the difference themselves but want to make sure they use "the best" -- at least for their homework. (For their AI boyfriends and girlfriends, other metrics are probably at play...)

xbmcuser|2 months ago

Google's biggest advantage over time will be cost. They have their own hardware, which they can and will optimize for their LLMs. And Google has experience winning market share over time by offering better results, performance, or space: e.g. Gmail vs Hotmail/Yahoo, Chrome vs IE/Firefox. So don't discount them; if the quality is better, they will get ahead over time.

rfw300|2 months ago

That might be true for a narrow definition of chatbots, but they aren't going to survive on name recognition if their models are inferior in the medium term. Right now, "agents" are only really useful for coding, but when they start to be adopted for more mainstream tasks, people will migrate to the tools that actually work first.

holler|2 months ago

this. I don't know any non-tech people who use anything other than chatgpt. On a similar note, I've wondered why Amazon doesn't make a chatgpt-like app with their latest Alexa+ makeover, seems like a missed opportunity. The Alexa app has a feature to talk to the LLM in chat mode, but the overall app is geared towards managing devices.

fullstick|2 months ago

I doubt anyone I know who is using llms outside of work knows that there are benchmark tests for these models.

jay_kyburz|2 months ago

This is why both google and microsoft are pushing Gemini and Copilot in everyone's face.

dieortin|2 months ago

Is there anything pointing to Brin having anything to do with Google's turnaround in AI? I hear a lot of people saying this, but no one explaining why.

novok|2 months ago

In organizations, everyone's existence and position is politically supported by their internal peers at around their level. Even Google's and Microsoft's current CEOs are supported by their group of co-executives and other key players. The fact that both have agreeable personalities is not a mistake! They both need to keep that balance to stay in power, and that means not destroying or disrupting your peers' current positions. Everything is effectively decided by informal committee.

Founders are special because they are not beholden to this social support network to stay in power, and founders have a mythos that socially supports their actions beyond their pure power position. The only others they are beholden to are their co-founders and, in some cases, major investor groups. This gives them the ability to disregard this social balance because they are not dependent on it to stay in power. Their power source is external to the organization, while everyone else's is internal to it.

This gives them a very special "do something" ability that nobody else has. It can lead to failures (Zuck & Oculus, Snapchat Spectacles) or successes (Steve Jobs, Gemini AI), but either way, it allows them to actually "do something".

HarHarVeryFunny|2 months ago

I would say it goes back more to the Google Brain + DeepMind merger, creating Google DeepMind headed by Demis Hassabis.

The merger happened in April 2023.

Gemini 1.0 was released in Dec 2023, and the progress since then has been rapid and impressive.

ryoshu|2 months ago

If he's having an impact it's because he can break through the bureaucracy. He's not trying to protect a fiefdom.

raincole|2 months ago

That's quite a sensationalized view.

The Ghibli moment was only about half a year ago. At that moment, OpenAI was far ahead in image editing. Now it's been behind for a few months and "it can't be reversed"?

Obertr|2 months ago

Check the size and budget of Google's initiatives. It's unlimited.

BoredPositron|2 months ago

The Ghibli moment was an influencer fad, not a real advancement.

baq|2 months ago

GPT 5.2 is actually getting me better outputs than Opus 4.5 on very complex reviews (on high, I never use less) - but the speed makes Opus the default for 95% of use cases.

yieldcrv|2 months ago

the trend I've seen is that none of these companies are behind in concept or theory; they're just spending longer intervals baking a superior foundation model

so they get lapped a few times and then drop a fantastic new model out of nowhere

the same is going to happen to Google again, Anthropic again, OpenAI again, Meta again, etc

they're all shuffling the same talent around; it's California, that's how it goes. the companies have the same institutional knowledge - at least regarding their consumer-facing options

JumpCrisscross|2 months ago

> I start to believe OAI is very much behind

Kara Swisher recently compared OpenAI to Netscape.

Andrex|2 months ago

Ouch.

Maybe we'll get some awesome FOSS tech out of its ashes?

louiereederson|2 months ago

i think the most important part of google vs openai is the slowing usage of consumer LLMs. people focus on gemini's growth, but overall LLM MAUs and time spent are stabilizing. in aggregate it looks like a complete s-curve. you can kind of see it in the table in the link below, but it's more obvious when you have the sensortower data for both MAUs and time spent.

the reason this matters is that slowing velocity raises the risk of featurization, which undermines LLMs as a category in consumer. the cost efficiency of the flash models reinforces this, as google can embed LLM functionality into search (noting search-like queries are probably 50% of chatgpt usage per their july user study). i think model capability was saturated for the average consumer use case months ago, if not longer, so distribution is really what matters, and search dwarfs LLMs in this respect.

https://techcrunch.com/2025/12/05/chatgpts-user-growth-has-s...

aswegs8|2 months ago

Not sure why they don't just replicate the workflow that Nano Banana Pro uses. It lets the thinking model generate a detailed description and then renders that image. When I use a ChatGPT thinking model and then render an image I also get pretty good results. It's not as creative or flexible as Nano Banana Pro, but it produces really useful results.
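For what it's worth, the two-stage workflow described above (have a reasoning model expand a terse idea into a detailed description, then hand that description to an image model) can be sketched roughly like this. The model names and client wiring are assumptions for illustration, not anything Google or OpenAI has documented about their internals:

```python
import base64

# Stage-1 system prompt: turn a short idea into a rich scene description.
EXPAND_INSTRUCTIONS = (
    "Rewrite the user's idea as a single detailed image description: "
    "subject, composition, lighting, style, and background."
)

def build_expansion_messages(idea: str) -> list[dict]:
    """Chat messages for stage 1 (description expansion)."""
    return [
        {"role": "system", "content": EXPAND_INSTRUCTIONS},
        {"role": "user", "content": idea},
    ]

def generate_image(client, idea: str,
                   chat_model: str = "gpt-5.2",        # assumed model name
                   image_model: str = "gpt-image-1") -> bytes:
    """Stage 1 + stage 2 glued together, given an OpenAI-style client."""
    # Stage 1: expand the idea with a text/reasoning model.
    chat = client.chat.completions.create(
        model=chat_model,
        messages=build_expansion_messages(idea),
    )
    detailed = chat.choices[0].message.content
    # Stage 2: render the detailed description with the image model.
    img = client.images.generate(model=image_model, prompt=detailed)
    return base64.b64decode(img.data[0].b64_json)
```

The interesting design choice is that all the "creativity" lives in stage 1, so the image model only has to follow a very explicit prompt, which is plausibly why prompt adherence improves with this setup.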

random9749832|2 months ago

This is obviously trained on Pro 3 outputs for benchmaxxing.

CuriouslyC|2 months ago

Not trained on pro, distilled from it.

NitpickLawyer|2 months ago

> for benchmaxxing.

Out of all the big 4 labs, Google is the last I'd suspect of benchmaxxing. Their models have generally underbenched and overdelivered on real-world tasks, for me, ever since 2.5 Pro came out.

encroach|2 months ago

OAI's latest image model outperforms Google's on LMArena in both image generation and image editing. So even though some people may prefer Nano Banana Pro in their own anecdotal tests, the average person prefers GPT Image 1.5 in blind evaluations.

https://lmarena.ai/leaderboard/text-to-image

https://lmarena.ai/leaderboard/image-edit

Obertr|2 months ago

Add to this Gemini's distribution, advertised by Google across all of their products, and the average Joe will pick the sneakers on the shelf near the checkout rather than the healthier option in the back.

nightski|2 months ago

Google has incredible tech. The problem is, and always has been, their products. Not only are they generally designed to be anti-consumer, but they go out of their way to make them as hard as possible to use. The debacle with Antigravity exfiltrating data is just one of countless examples.

novok|2 months ago

The Antigravity case feels like a pure bug plus rushing to market; they had a bunch of other bugs showing that. That's not anti-consumer, and it's not deliberately making things difficult.