himeexcelanta | 7 months ago

They can call it whatever they want… not sure that has a great deal of meaning unless there’s a GPT-4/Claude 3.5-level step change.


jug | 7 months ago

I doubt there will be one. OpenAI planned to release GPT-5 in 2024 or early 2025, but it underwhelmed, and anonymous OpenAI sources have claimed that the later GPT-4.5 was actually GPT-5, relabelled to manage expectations. It was seen as roughly a 20% improvement over GPT-4o. This is when it sank in at OpenAI that they were at the end of the road for non-reasoning models: scaling issues made them too costly.

Turning to their reasoning models, it’s also known and documented through SimpleQA and PersonQA that OpenAI’s o3 hallucinates more than o1, and o4-mini more than o3. There’s an unmanaged issue where training on synthetic data improves benchmark results on STEM tasks but increases hallucination rates, and it seems to trouble OpenAI’s models especially, for some reason (my guess: they’re fine-tuned to take risks, since risk-taking is also known to increase the likelihood of getting hard tasks right?).

According to an anonymous Googler I saw comment on this, Google has long known that OpenAI struggles with hallucinations more than they do, and the aforementioned benchmarks bear this out. Anthropic also struggles less. But as far as I can tell, they’re all facing the same issue of synthetic data acting as a double-edged sword.
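For reference, the hallucination rates these benchmarks report boil down to simple arithmetic: SimpleQA-style evals grade each answer as correct, incorrect, or not attempted, and the hallucination rate is the share of *attempted* answers that were wrong. A rough sketch (not the actual eval harness, and the numbers below are made up):

```python
def hallucination_rate(grades):
    """grades: list of 'correct' / 'incorrect' / 'not_attempted' labels.

    Returns the fraction of attempted answers that were graded incorrect,
    i.e. how often the model confidently answered and was wrong.
    """
    attempted = [g for g in grades if g != "not_attempted"]
    if not attempted:
        return 0.0
    return attempted.count("incorrect") / len(attempted)

# Hypothetical grade distribution, for illustration only:
grades = ["correct"] * 49 + ["incorrect"] * 30 + ["not_attempted"] * 21
print(round(hallucination_rate(grades), 2))  # 30 / 79, prints 0.38
```

The point of excluding "not attempted" is that a model can trade accuracy for honesty: abstaining more often lowers the hallucination rate without answering any more questions correctly, which is part of why these numbers move independently of headline benchmark scores.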

So GPT-5 is going to be interesting. Exactly how well it does will say a lot about the kind of trouble OpenAI is in right now. Maybe OpenAI has found a novel approach to reducing hallucinations? I think that’s among their most crucial problems right now. But beyond that, no, I don’t expect a revolution, only an evolution. They might win on benchmarks for now, but it will hardly be something that catapults them ahead.

If GPT-5 underwhelms, it will send a stronger signal than the disappointment itself, because then OpenAI is in trouble with both non-reasoning and reasoning models. In that case, we’re likely looking at the end of the road on the horizon for current GPT-based LLMs, and a future where the ultimate winners will probably be cheaper open-weight models once they catch up.

j_timberlake | 7 months ago

This is the smart take. The version numbers were useful going from GPT-3 to GPT-3.5 and then GPT-4, but after that OpenAI butchered the naming scheme with things like o4 and 4o. Version numbers now tell you almost nothing about what changed.

This happened because training progressively larger models used to be the main path forward, which was easy to track and name; now it's all about quickly incorporating synthetic chain-of-thought data generated by flash models.

jobs_throwaway | 7 months ago

Meh. The models are already quite useful. If the improvement is half of what the jump to GPT-4 was, it will be a big deal.

himeexcelanta | 7 months ago

Agreed on the usefulness of the models. They still require a lot of babysitting for software development; we’re seeing marginal improvements, but the aggregate utility adds up over time. I’m just skeptical of OpenAI at this point for most things.