item 40034680

Evidence that LLMs are reaching a point of diminishing returns

41 points | simonebrunozzi | 1 year ago | garymarcus.substack.com

31 comments

thorum | 1 year ago
> And here’s the thing – we all know that GPT-3 was vastly better than GPT-2. And we all know that GPT-4 (released thirteen months ago) was vastly better than GPT-3. But what has happened since?

Other models have gradually caught up to GPT-4, while OpenAI has declined to discuss or release anything about its internal research and work on GPT-5. Given how far ahead they were with GPT-4, it seems odd to assume that no further progress has been made, rather than that progress has been made but not yet published.

The author’s assumption is based on the claim that GPT-4 Turbo “was a failed attempt at GPT-5,” but I don’t see any good reason to believe this unless you take the claim that GPT-5 is impossible as a given and reason backwards.

kromem | 1 year ago
Also, part of the issue with the argument is that smaller firms like Anthropic now have models that exceed OpenAI's offering.

If we were actually at a point of diminishing returns, they'd only be catching up given similar levels of investment, not surpassing OpenAI at lower ones.

And when digging into the details of the differences between something like Opus and GPT-4, it becomes more readily apparent why OpenAI's product lineup is struggling.

Their current issues (to the extent they exist in practice) stem from OpenAI-specific policies in how they approach the fine-tuning stages of model development, not from inherent limitations in the underlying tech.

TheAceOfHearts | 1 year ago
I thought this was to be expected. If you invest 1 billion USD in training a new model using existing tools and techniques, it probably won't yield as big a leap in capabilities as previous generations did. In particular, aren't we running out of content with which to train these models? There are obviously some key details still missing in the AI space, precluding us from achieving things like "can learn how to drive in a couple of hours".

I don't want to pile on the hate, but this author has a fairly negative reputation on Twitter. He sometimes comes off as a bit of an asshole on social media, which makes it difficult to disentangle the author from the topics he raises when he is so inflammatory.

notamy | 1 year ago
> Huge jump from GPT-2 to GPT-3. Huge jump from GPT-3 to GPT-4 … and not so much for GPT-4 (13 months ago) to GPT-4 Turbo (just released). It’s hard not to see this plot as tentative evidence for the hypothesis of diminishing returns. Whatever doubling there might have been, has perhaps come to an end.

Maybe I’m misunderstanding how OpenAI names their models, but was there supposed to be a huge jump from GPT-4 to GPT-4 Turbo? Just from the names it sounds like it’s expected to be in the same realm of model performance, just faster.

aduffy | 1 year ago
If OpenAI had GPT-5, they would’ve released that instead of GPT-4 Turbo.

OpenAI was allegedly working on their Arrakis model last year, which didn’t end up panning out. It’s clear they’re hitting some sort of limit, whether in the scaling laws, in cost efficiency, or both.

buildbot | 1 year ago
Your understanding is correct; Gary Marcus is intentionally misrepresenting what GPT-4 Turbo is to make a story. It’s clearly the cost-optimized version of GPT-4. That’s why it’s much cheaper per token.
timrobinson333 | 1 year ago
I think the point made in the article is that they wanted to build GPT-5, but renamed it GPT-4 Turbo because the improvements over GPT-4 weren't what they were expecting.
firebaze | 1 year ago
The author suggests that GPT-4 Turbo was meant to be GPT-5 but turned out too weak. I'm not sure I buy that personally; I'm quite confident that OpenAI can deliver a lot more than GPT-4 does right now, just not in a way that is economically feasible. GPT-4 Turbo may be an intermediate release, something to show while more work is being done behind the scenes.
phillipcarter | 1 year ago
That article sure was a lot of words for "the rest of the market is catching up to OpenAI".

The rest just casts a negative light on some of the woes of early adopters who likely aren't using the tech effectively. That's problematic because (a) it doesn't acknowledge that the tech takes skill and iteration to use well, and (b) it isn't balanced with examples of organizations that have seen immediate value from it.

kromem | 1 year ago
Does anyone have a link to a prediction Gary Marcus made more than 24 months ago that turned out correct?
kromem | 1 year ago
I was genuinely asking. Not getting any answers, I decided to research it for myself.

I didn't find anything particularly compelling, but I did dig up this gem of an article about how deep learning was hitting a wall and "we are still a long way from machines that can genuinely understand human language" exactly four days before GPT-4 released:

https://nautil.us/deep-learning-is-hitting-a-wall-238440/

cratermoon | 1 year ago
Does anyone have a link to a prediction Sam Altman made more than 24 months ago that turned out correct?
belter | 1 year ago
People, read the guidelines... Flagging is not for stuff you disagree with.
Etheryte | 1 year ago
Is it just me or does this look like an advanced form of blogspam? The author seems to take one idea and just throw out a dozen variations of it over a few days to see if anything sticks. A few titles from the past few days:

- Evidence that LLMs are reaching a point of diminishing returns — and what that might mean

- Superhuman AGI is not nigh

- $10 million says we won’t see human-superior AGI by the end of 2025

Etc, you get the idea.

buildbot | 1 year ago
Unfortunately this specific blog spammer has convinced many a person they are the One True God of the correct path to AGI.
ShamelessC | 1 year ago
Oh wow so the game Connections is apparently a robust indicator of LLM capabilities?

Give me a break. The thesis here may be correct, but Gary Marcus hasn’t effectively argued that it is. Is there even a mention of the scaling laws paper?

Hnaomyiph | 1 year ago
Ouroboros issues were seen from 100k miles away. LLMs were bound to run into problems immediately based on their training data. If I weren’t substantially poor, I’d put all my savings into puts on LLM companies like the makers of ChatGPT and others. I cannot risk burning substantial amounts of my own time to offset the burn those companies themselves are creating.

Good luck, y’all. I’d say I hope I’m right, but you’re on a ship sunk by glacial ice, hoping for a lifeboat off. Good fucking luck; you have no time to pivot.