redhale | 1 month ago
Starting back in 2022/2023:
- (~2022) It can auto-complete one line, but it can't write a full function.
- (~2023) Ok, it can write a full function, but it can't write a full feature.
- (~2024) Ok, it can write a full feature, but it can't write a simple application.
- (~2025) Ok, it can write a simple application, but it can't create a full application that is actually a valuable product.
- (~2025+) Ok, it can write a full application that is actually a valuable product, but it can't create a long-lived complex codebase for a product that is extensible and scalable over the long term.
It's pretty clear to me where this is going. The only question is how long it takes to get there.
arkensaw|1 month ago
I don't think it's a guarantee. All of the things it can do from that list are greenfield; they just have increasing complexity. The problem comes because, even in agentic mode, these models do not (and I would argue, cannot) understand code or how it works; they just see patterns and generate a plausible-sounding explanation or solution. Agentic mode means they can try/fail/try/fail/try/fail until something works, but without understanding the code, especially in a large, complex, long-lived codebase, they can unwittingly break something without realising it, just like an intern or newbie on the project, which is the most common analogy for LLMs, with good reason.
namrog84|1 month ago
What if we get to the point where all software is basically created 'on the fly' as greenfield projects, as needed? And you never need a large, complex, long-lived codebase?
It is probably incredibly wasteful, but ignoring that, could it work?
bayindirh|1 month ago
Case in point: self-driving cars.
Also, consider that we need to pirate the whole internet to be able to do this, so these models are not creative. They are just directed blenders.
literalAardvark|1 month ago
This is clear from the fact that you can distill the logic ability from a 700b parameter model into a 14b model and maintain almost all of it.
You just lose knowledge, which can be provided externally, and which is the actual "pirated" part.
The logic is _learned_
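The kind of distillation the comment alludes to can be sketched with the standard soft-target loss: the small student model is trained to match the large teacher's temperature-softened output distribution. A minimal numerical sketch (the temperature, toy logits, and "teacher"/"student" labels are illustrative assumptions, not from this thread):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a logit vector."""
    z = np.exp((logits - logits.max()) / T)
    return z / z.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    the core term minimized during knowledge distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * np.log(p / q)))

teacher = np.array([4.0, 1.0, 0.5])   # toy logits from a large "teacher"
student = np.array([3.5, 1.2, 0.4])   # toy logits from a small "student"
print(f"distillation loss: {distill_loss(teacher, student):.4f}")
```

The loss is zero only when the student reproduces the teacher's distribution exactly, which is why the reasoning behavior transfers while long-tail factual knowledge (capacity-bound) tends to be what gets lost.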
rat9988|1 month ago
You'd need to prove that this assertion applies here. I understand that you can't deduce the future gains rate from the past, but you also can't state this as universal truth.
PunchyHamster|1 month ago
We've been seeing the same progression with self-driving cars, and they've been stuck on the last 10% for the last five years.
redhale|1 month ago
As long as it can do the thing on a faster overall timeline and with less human attention than a human doing it fully manually, it's going to win. And it will only continue to get better.
And I don't know why people always reach for self-driving cars as a negative analogy. We already have self-driving cars. Try a Waymo if you're in a city that has them. Yes, there are still long-tail problems being solved there, and limitations, but they basically work and they're amazing. I feel similarly about agentic development, plus in most cases the failure modes of SWE agents don't involve sudden life and death, so they can be more readily worked around.
theshrike79|1 month ago
Does it matter that 49 of them "failed"? It cost me fractions of a cent, so not really.
If every one of the 50 variants was drawn by a human and iterated over days, there would've been a major cost attached to every image and I most likely wouldn't have asked for 50 variations anyway.
It's the same with code. The agent can iterate over dozens of possible solutions in minutes or a few hours. Codex Web even has a 4x mode that gives you 4 alternate solutions to the same issue. Complete waste of time and money with humans, but with LLMs you can just do it.
sanderjd|1 month ago
And this isn't a pessimistic take! I love this period of time where the models themselves are unbelievably useful, and people are also focusing on the user experience of using those amazing models to do useful things. It's an exciting time!
But I'm still pretty skeptical of "these things are about to not require human operators in the loop at all!".
Scea91|1 month ago
The trend is definitely here, but even today it heavily depends on the feature.
While extremely useful, it still requires intense iteration and human insight for > 90% of our backlog. We develop a cybersecurity product.
EthanHeilman|1 month ago
> The only question is how long it takes to get there.
This is the question, and I would temper expectations with the fact that we are likely to hit diminishing returns from real gains in intelligence as task difficulty increases. Real-world tasks probably fit into a complexity hierarchy similar to computational complexity. One reason the AI predictions made in the 1950s for the 1960s did not come to pass is that we assumed problem difficulty scaled linearly: double the computing speed, get twice as good at chess, or twice as good at planning an economy. Separations like P vs. NP felled those predictions, and it is likely that current predictions will run into similar separations.
It is probably the case that if you made a human 10x as smart, they would only be 1.25x more productive at software engineering. The reason we have 10x engineers is less about raw intelligence (they are not 10x more intelligent) and more about having more knowledge and wisdom.
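The chess example above can be made concrete: in a game tree with branching factor b, searching d plies costs roughly b**d nodes, so doubling compute buys only log_b(2) extra plies, not double the playing depth. A small sketch (the branching factor of 35 is a commonly cited rough figure for chess, assumed here for illustration):

```python
import math

def extra_depth(branching_factor: float, speedup: float) -> float:
    """Additional plies of search depth gained from a compute speedup,
    assuming node count grows as branching_factor ** depth."""
    return math.log(speedup, branching_factor)

b = 35  # rough average branching factor for chess (assumption)
print(f"2x compute  -> +{extra_depth(b, 2):.2f} plies")
print(f"10x compute -> +{extra_depth(b, 10):.2f} plies")
```

Even a 10x speedup adds well under one ply of lookahead, which is the sense in which "twice the computing speed" never meant "twice as good at chess."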
kubb|1 month ago
By your logic, does it mean that engineers will never get replaced?
fernandezpablo|1 month ago
- (~2022) "It's so over for developers". 2022 ends with more professional developers than 2021.
- (~2023) "Ok, now it's really over for developers". 2023 ends with more professional developers than 2022.
- (~2024) "Ok, now it's really, really over for developers". 2024 ends with more professional developers than 2023.
- (~2025) "Ok, now it's really, really, absolutely over for developers". 2025 ends with more professional developers than 2024.
- (~2025+) etc.
Sources: https://www.jetbrains.com/lp/devecosystem-data-playground/#g...
HarHarVeryFunny|1 month ago
I suspect that the timeline from autocomplete-one-line to autocomplete-one-app, which was basically a matter of scaling and RL, may in retrospect turn out to have been a lot faster than the next step, from LLM to AGI, where it becomes capable of using human-level judgement and reasoning to be a developer, not just a coding tool.
mjr00|1 month ago
They're definitely better now, but it's not like ChatGPT 3.5 couldn't write a full simple todo list app in 2023. There were a billion blog posts talking about that and how it meant the death of the software industry.
Plus I'd actually argue more of the improvements have come from tooling around the models rather than what's in the models themselves.
[0] eg https://www.youtube.com/watch?v=GizsSo-EevA