sinenomine | 4 days ago

People underestimate the lead OAI has with their post-5.2 models. The author does not strike me as someone who closely follows the progress frontier labs are making in the US and around the world.

energy123|4 days ago

It's a joint ignorance of how these frontier models get baked and what consumers want.

Many pundits think it's just a matter of scraping the internet and having a few ML scientists run ablation experiments to tune hyperparameters. That hasn't been true for over a year. The current requirements are more org-scale, more payoff from scale, more moat. The main legitimate competitive threat is adversarial distillation.

Many pundits also think that consumers don't want to pay a premium for small differences on the margin. That is very wrong-headed. I pay $200/month to a frontier lab because, even though it's only a few % higher in benchmark scores, it is 5x more useful on the margin.

svnt|4 days ago

It is the benchmark error rate, not the benchmark success %, that we actually trip up on.

Going from 85% to 90% cuts the error rate from 15% to 10%, which is 1/3 fewer errors, or possibly even more, depending on the distribution of work you’re doing.
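
To make that arithmetic concrete, here is a minimal sketch. The function name is made up for illustration, and it assumes errors are spread evenly across tasks, so it does not capture the "depending on the distribution of work" part:

```python
def relative_error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Fraction of errors eliminated when accuracy rises from old to new.

    Accuracies are given as fractions in [0, 1), e.g. 0.85 for 85%.
    """
    old_error = 1.0 - old_accuracy
    new_error = 1.0 - new_accuracy
    if old_error <= 0.0:
        raise ValueError("old_accuracy must be below 1.0 (no errors left to remove)")
    return (old_error - new_error) / old_error


# 85% -> 90% success means 15% -> 10% errors.
reduction = relative_error_reduction(0.85, 0.90)
print(f"{reduction:.1%} fewer errors")
```

A 5-point gain in benchmark score here removes about a third of the failures, which is why a small difference in the headline number can feel much larger in daily use.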

lelanthran|4 days ago

> The current requirements are more org-scale, more payoff from scale, more moat.

What moat? None of the AI providers have a moat at the moment, and the trend doesn't indicate that any of them will in the near future.

nick32661123|4 days ago

Do you pay OpenAI, or which one do you use? Do you switch regularly?

PunchTornado|4 days ago

Funny. What lead? Gemini and Claude are much better.

nextlevelwizard|4 days ago

Yeah. I also do not see the lead. Claude Opus writes better code, and for conversation all models, even Le Chat, are just better than ChatGPT currently.

hyperbovine|4 days ago

Agreed, compare the frontier models from Google and OAI. It’s like night and day. Anyone who says “the tech has caught up” has not spent even one day using Gemini 3.1 to try and accomplish something complicated.

PunchTornado|4 days ago

I think the vast majority of coding is done through Claude and Gemini.