item 47142220

tibbar|5 days ago

Today I got a feature request from another team on a call. I typed it into our Slack channel as a note. Someone typed @cursor, and moments later the feature was implemented (correctly) and ready to merge.

The tools are good! The main bottleneck right now is scaffolding: with better scaffolding the tools could be adopted more thoroughly, and the agents could QA their own work.

I see no particular reason to doubt that software engineering as we know it will be massively disrupted in the next few years, with other industries probably close behind.

nemooperans|5 days ago

The anecdote is compelling, but there's an interesting measurement gap. METR ran a randomized controlled trial with experienced open-source developers — they were actually 19% slower with AI assistance, but self-reported being 24% faster. A ~40 point perception gap.

Doesn't mean the tools aren't useful — it means we're probably measuring the wrong thing. "Prompt engineering" was always a dead end that obscured the deeper question: the structure an AI operates within — persistent context, feedback loops, behavioral constraints — matters more than the model or the prompts you feed it. The real intelligence might be in the harness, not the horse.
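To make "harness" concrete, here's a toy sketch of that loop structure. Everything in it is a hypothetical stand-in (a real harness would call an LLM where `toy_model` is, and run tests or linters where `toy_check` is); the point is only the structure around the model call: persistent context, a feedback check, and a retry budget.

```python
# Toy "harness" loop: the model call is stubbed out; what matters is the
# structure around it -- persistent context, a feedback loop, a retry budget.
def run_with_harness(task, model, check, max_attempts=3):
    context = [task]  # persistent context accumulates across attempts
    for attempt in range(max_attempts):
        draft = model(context)
        ok, feedback = check(draft)  # e.g. run tests/linters on the output
        if ok:
            return draft
        context.append(feedback)  # failures are fed back, not thrown away
    return None

# Hypothetical stand-ins, purely for illustration:
def toy_model(context):
    # a real harness would call an LLM here; this one just "improves"
    # as more feedback accumulates in its context
    return len(context)

def toy_check(draft):
    return (draft >= 3, f"attempt produced {draft}, need >= 3")

result = run_with_harness("add feature", toy_model, toy_check)
```

Under this framing, swapping the model changes less than swapping the check: the same model with a tighter feedback loop behaves very differently.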

rodonn|1 day ago

There's been a huge amount of improvement in coding-agent effectiveness since they ran that experiment. In a more recent follow-up experiment, METR found a 20% speed-up from AI assistance, and says they believe that is likely an underestimate of the impact. https://metr.org/blog/2026-02-24-uplift-update/

They are also working on a new measurement approach that should be more accurate.

tibbar|5 days ago

Respectfully, was this comment AI generated? It has all the signs.

And scaffolding does matter a lot, but mostly because the models just got a lot better and the corresponding scaffolding for long-running tasks hasn't really caught up yet.

JohnMakin|5 days ago

It really doesn't matter how "good" these tools feel, or whatever vague metric you prefer: they hemorrhage cash at a rate perhaps not seen in human history. In other words, the usage you like is costing these companies tons of money. The bet is either that energy/compute will become vastly cheaper within a couple of years (extremely unlikely), or that they'll find other ways to monetize that don't absolutely destroy the utility of their product (ads, an area where we have seen Google flop spectacularly).

And even supposing the latter strategy works: ads are driven by consumption. If you believe 100% in OpenAI's vision of these tools replacing huge swaths of the workforce reasonably quickly, who will be left to consume? It's all nonsense, and the numbers are nonsense if you spend any real time considering them. The fact that SoftBank is a major investor should be a dead giveaway.

df2dd|5 days ago

Indeed. Many of the posts I see on here are hilarious.

Have any of you tried reproducing an identical output given an identical set of inputs? It simply doesn't happen. It's like a lottery.

This lack of reproducibility is a huge problem and limits how far the thing can go.
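Part of that lottery is simply the sampling step, not the model itself. A toy sketch of token selection (no particular vendor's API; just softmax-with-temperature over made-up logits) shows where the randomness enters: greedy decoding at temperature 0 is deterministic for fixed logits, while any temperature > 0 draws from a distribution.

```python
import math
import random

def softmax(logits, temperature):
    # Scale logits by 1/temperature, then normalize (max-subtraction for stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(logits, temperature, rng):
    """Greedy (deterministic) at temperature 0, sampled otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

# Greedy: the same logits give the same token, regardless of RNG state.
greedy = {pick_token(logits, 0, random.Random(s)) for s in range(100)}
# Sampled at T=1: the same logits can yield different tokens.
sampled = {pick_token(logits, 1.0, random.Random(s)) for s in range(100)}
```

In practice there are further nondeterminism sources even at temperature 0 (batching, floating-point reduction order on GPUs), so "identical inputs, identical outputs" is genuinely hard to guarantee for hosted models, but a lot of the variance people notice is just the sampling temperature.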

nfg|5 days ago

> In other words, that usage you like is costing them tons of money

Evidence? I’m sure someone will argue, but I think it’s generally accepted that inference can be done profitably at this point. The cost of equivalent capability is also plummeting.

javascriptfan69|5 days ago

What was the feature and what was the note?

tibbar|5 days ago

It was a modest update to a UX ... certainly nothing world-changing. (It's also had success with some backend performance refactors, but this particular change was all frontend.) The note was basically just a transcription of what I was asked to do, and did not provide any technical hints as to how to go about the work. The agent figured out what codebase, application, and file to modify and made the correct edit.

tapoxi|5 days ago

Yeah, but was Cursor using Claude? What moat do any of these companies have that prevents me from using another LLM?