ACCount36 | 6 months ago
By now, the main reason people expect AI progress to halt is cope. People say "AI progress is going to stop, any minute now, just you wait" because the alternative makes them very, very uncomfortable.
benterix|6 months ago
(NB: I'm a very rational person, and based on a lifetime of being surprised both negatively and positively, I'd put the chance of a great breakthrough in the short term at 50%. But that figure has nothing to do with, and cannot be extrapolated from, current development; this can genuinely go either way. We've already had multiple AI winters, and I'm sure humanity has dozens if not hundreds of them still ahead.)
ACCount36|6 months ago
Are you disappointed that there was no sudden breakthrough yielding an AI that casually beats any human at any task? That human thinking wasn't made obsolete overnight? That may still happen. But even a "slow" churn of +10% performance upgrades reaches the same outcome eventually.
There are only so many "+10% performance upgrades" left between ChatGPT and the peak of human capability, and the gap keeps shrinking.
disgruntledphd2|6 months ago
OK, so where is the new data going to come from? Fundamentally, LLMs are trained by self-supervised token prediction: for most current models, predicting the next token from the preceding context. This process (which requires no human labeling, hence why it scaled) seems fundamental to LLM improvement. And basically all of the AI companies have already slurped up all of the text (and presumably all of the video) on the internet. Where does the next order-of-magnitude increase in data come from?
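To make the "no supervision needed" point concrete, here's a minimal sketch of how raw text turns into next-token training pairs; the token IDs and helper name are made up for illustration, not any real tokenizer's output:

```python
# Self-supervised next-token prediction needs no labels: the text
# itself supplies the targets, which is why it scales with raw data.
def make_training_pairs(token_ids):
    """Each position's input is the prefix; its label is the next token."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]

tokens = [464, 3290, 318, 922]  # hypothetical IDs for "the model is good"
pairs = make_training_pairs(tokens)
# -> [([464], 3290), ([464, 3290], 318), ([464, 3290, 318], 922)]
```

Every scraped document yields training signal this way, which is exactly why exhausting the internet's text is a real ceiling.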
More fundamentally, lots of the hype is about research and novel discovery, which seems to me very, very difficult to get from a model trained to produce plausible text. Like, how does one expect to see improvements in biology (for example) from text input and output?
Remember, these models don't appear to reason much like humans: they do well where the training data is sufficient (interpolation) and badly where it isn't (extrapolation).
I'd love to understand how this is all supposed to change, but I haven't really seen much useful evidence (e.g. papers and experiments), just AI CEOs talking their book. Happy to be corrected if I'm wrong.
insignificntape|6 months ago
Look at Claude Code. Unless they hacked into private GitHub/GitLab repos (which, honestly, I wouldn't put past these tech CEOs; see what Cloudflare recently found out about Perplexity), they trained Claude 4 on approximately the same data as Claude 3. Yet its agentic coding skills are dramatically better than the previous iteration's.
Data no longer seems to be the bottleneck. Which is understandable: at the end of the day, data is really just a way to get the AI to make a prediction and run gradient descent on the result. If you can generate, for example, a bunch of unit tests, you can let the AI freewheel its way into getting them to pass. A kid learns to catch a baseball not by seeing a million examples of people catching balls, but by testing their skills in the real world and gathering feedback on whether each attempt succeeded. If an AI can try to achieve goals and assess whether its actions led to success or failure, who needs more data?
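The "freewheel against unit tests" idea can be sketched as a verification-based reward signal. Everything below is a hypothetical illustration of that loop, not any lab's actual training setup:

```python
# Score a model-proposed solution by running it against generated unit
# tests. The tests, not human labels, provide the learning signal.
def reward_from_tests(candidate_fn, test_cases):
    """Fraction of (args, expected) pairs the candidate gets right."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test
    return passed / len(test_cases)

# A model-proposed candidate for "add two numbers":
candidate = lambda a, b: a + b
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(reward_from_tests(candidate, tests))  # 1.0
```

A reward like this can drive reinforcement learning without any new scraped data, which is the commenter's point: verifiable tasks turn compute into training signal.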
fragmede|6 months ago
The other fundamental bottleneck is compute. Moore's law hasn't gone away. If GPT-3 used one supercomputer's worth of compute for 3 months back in 2022, and the latest training supercomputer is, say, three times more powerful (3x faster processors and 3x the RAM), then training on it should yield a more powerful LLM simply by scaling up, with no algorithmic changes. The exact size of the improvement isn't easy to calculate on the back of an envelope, but even with a layman's understanding of how these things work, that seems like a reasonable assumption about how things will go, not "AI CEOs talking their book". Simply running with a bigger context window should already make the LLM more useful.
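The back-of-the-envelope intuition here does have a published shape: empirical scaling laws model loss as a power law in compute. The constants below are made up purely to show the shape of the curve, not fitted values from any paper:

```python
# Hypothetical compute scaling law of the form L(C) = L0 + a * C**(-b):
# loss falls smoothly as training compute C grows, with no algorithmic
# change, but with diminishing returns built into the exponent.
def predicted_loss(compute, l0=1.69, a=400.0, b=0.15):
    return l0 + a * compute ** -b

base = predicted_loss(1e23)    # loss at a reference compute budget
scaled = predicted_loss(3e23)  # same recipe, 3x the compute
print(scaled < base)           # True: more compute alone predicts lower loss
```

The same curve also captures the "diminishing returns" caveat raised below: each tripling of compute buys a smaller absolute drop in loss than the last.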
Finally, why assume that, absent papers on arXiv, there haven't been and won't be algorithmic improvements to training and inference? We've already seen that letting the LLM take longer to process the input (e.g. "ultrathink" in Claude) yields better results. It seems unlikely that all possible algorithmic improvements have already been discovered and implemented. Far more likely, OpenAI et al. simply aren't writing academic papers to share their discoveries, preferring to keep them private and proprietary to gain an edge in a very competitive business. With literal billions of dollars on the line, would you spend your time writing a paper, or trying to outcompete your rivals? And if simply giving the LLM longer to think before returning user-facing output helps this much, what other inference-side improvements become possible on a bigger supercomputer with more RAM? DeepSeek suggests there's still a ton of optimization left to be done.
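One concrete (if simplistic) form of "spend more compute at inference time" is best-of-n sampling: draw several candidate answers and keep the highest-scoring one. The sampler and scorer below are toy stand-ins, not how any vendor's extended-thinking mode actually works:

```python
import random

def best_of_n(sample, score, n):
    """Draw n candidates and return the one the scorer likes best."""
    return max((sample() for _ in range(n)), key=score)

random.seed(0)
target = 0.5
sample = lambda: random.random()    # stand-in for "generate a candidate answer"
score = lambda x: -abs(x - target)  # stand-in for "judge its quality"

one_shot = sample()                 # baseline: a single sample
best = best_of_n(sample, score, 16) # 16x the inference compute
```

More samples means more compute for the same model weights, which is the sense in which inference-side algorithms can convert a bigger machine into better answers.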
Happy to hear opposing points of view, but I don't think anything I've theorized here is totally inconceivable. Of course there's a discussion to be had about diminishing returns, but we'd need a far deeper understanding of the state of the art on all three facets I raised to have an in-depth, practical discussion. (Which, to be clear, I'm open to hearing, though an HN comments section is probably not the platform to gain that deeper understanding.)
ACCount36|6 months ago
Unlocking better sample efficiency is algorithmically hard and computationally expensive (with known methods) - but if new high quality data becomes more expensive and compute becomes cheaper, expect that to come into play heavily.
"Produce plausible text" is by itself an "AGI complete" task. "Text" is an incredibly rich modality, and "plausible" requires capturing a lot of knowledge and reasoning. If an AI could complete this task to perfection, it would have to be an AGI by necessity.
We're nowhere near that "perfection" - but close enough for LLMs to adopt and apply many, many thinking patterns that were once exclusive to humans.
Certainly enough of them that sufficiently scaffolded and constrained LLMs can already explore solution spaces and find new solutions that eluded both previous generations of algorithms and humans, as AlphaEvolve did.