WingNews

Dylan16807|8 months ago

Models are improving every day. People are figuring out thousands of different optimizations to training and to hardware efficiency. The idea that right now in early June 2025 is when improvement stops beggars belief. We might be approaching a limit, but that's going to be a sigmoid curve, not a sudden halt in advancement.

a2128|8 months ago

I think at this point we're reaching more incremental updates, which can score higher on some benchmarks but then simultaneously behave worse with real-world prompts, most especially if they were prompt engineered for a specific model. I recall Google updating their Flash model on their API with no way to revert to the old one and it caused a lot of people to complain that everything they've built is no longer working because the model is just behaving differently than when they wrote all the prompts.

deadbabe|8 months ago

5 years ago a person would be blown away by today’s LLMs. But people today will merely say “cool” at whatever LLMs are in use 5 years from now. Or maybe not even that.

sitkack|8 months ago

It is copium to think that it will suddenly stop and the world they knew before will return.

ChatGPT came out in Nov 2022. Attention Was All There Was in 2017, so we were already 5 years in the past. Or 5 years of research to catch up to, and then from 2022 to now ... papers and research have been increasing exponentially. Even if SOTA models were frozen, we still have years of research to apply and optimize in various ways.

groby_b|8 months ago

It is "inevitable" in the sense that in 99% of the cases, tomorrow is just like yesterday.

LLMs have been continually improving for years now. The surprising thing would be them not improving further. And if you follow the research even remotely, you know they'll improve for a while, because not all of the breakthroughs have landed in commercial models yet.

It's not "techno-utopian determinism". It's a clearly visible trajectory.

Meanwhile, if they didn't improve, it wouldn't make a significant change to the overall observations. It's picking a minor nit.

The observation that strict prompt adherence plus prompt archival could shift how we program is both true and a phenomenon we have observed several times in the past. Nobody keeps the assembly output from the compiler around anymore, either.

There's definitely valid criticism of the passage, and it's overly optimistic, in that most non-trivial prompts are still underspecified and have multiple possible implementations, not all correct. That's both a more useful criticism, and not tied to LLM improvements at all.

double0jimb0|8 months ago

Are there places that follow the research that speak to the layperson?

its-kostya|8 months ago

What is ironic: if we buy into the theory that AI will write the majority of the code in the next 5-10 years, what is it going to train on after that? ITSELF? It seems this theoretical trajectory of "will inevitably get better" is only true if humans are producing quality training data. The quality of the code LLMs create is very much proportional to how mature and ubiquitous the languages/projects are.

solarwindy|8 months ago

I think you neatly summarise why the current pre-trained LLM paradigm is a dead end. If these models were really capable of artificial reasoning and learning, they wouldn’t need more training data at all. If they could learn like a human junior does, and actually progress to being a senior, then I really could believe that we’ll all be out of a job—but they just do not.

sumedh|8 months ago

More compute means faster processing, more context.

Sevii|8 months ago

Models have improved significantly over the last 3 months. Yet people have been saying 'What if they've actually reached their limits by now?' for pushing 3 years.

BoorishBears|8 months ago

This is just people talking past each other.

If you want a model that's getting better at helping you as a tool (which for the record, I do), then you'd say in the last 3 months things got better between Gemini's long context performance, the return of Claude Opus, etc.

But if your goalpost is replacing SWEs entirely... then it's not hard to argue we definitely didn't overcome any new foundational issues in the last 3 months, and not too many were solved in the last 3 years even.

In the last year the only real foundational breakthrough would be RL-based reasoning w/ test time compute delivering real results, but what that does to hallucinations, plus even Deepseek catching up with just a few months of post-training, shows that in its current form the technique doesn't completely blow up any of the barriers that were standing, the way people were originally touting it.

Overall, models are getting better at things we can trivially post-train and synthesize examples for, but it doesn't feel like we're breaking unsolved problems at a substantially accelerated rate (yet).

greyadept|8 months ago

For me, improvement means no hallucination, but that only seems to have gotten worse and I'm interested to find out whether it's actually solvable at all.

atomlib|8 months ago

https://xkcd.com/605/
