This is pretty recent - the survey they ran (99 respondents) was August 18 to September 23, 2025, and the field observations (watching developers for 45 minutes, then a 30-minute interview; 13 participants) were August 1 to October 3.
The models were mostly GPT-5 and Claude Sonnet 4. The study was too early to catch the 5.x Codex or Claude 4.5 models (bar one mention of Sonnet 4.5).
This is notable because a lot of academic papers take 6-12 months to come out, by which time the LLM space has often moved on by an entire model generation.
> academic papers take 6-12 months to come out, by which time the LLM space has often moved on by an entire model generation.
This is a recurring argument which I don't understand. Doesn't it simply mean that whatever conclusions they drew were valid then? The research process is about approximating a better description of a phenomenon in order to understand it; it's not about providing a definitive answer. Being "an entire model generation" behind would matter if fundamental problems (e.g. hallucinations) had been solved, but if the changes are incremental then most likely the conclusions remain correct. Which fundamental change (I don't think labeling newer models as "better" is sufficient) do you believe invalidates their conclusions in this specific context?
I’m glad someone else noticed the time frames — turns out the lead author here has published 28 distinct preprints in the past 60 days, almost all of which are marked as being officially published already/soon.
Certainly some scientists are just absurdly efficient, and all 28 papers did involve teams, but that's still a lot.
Personally speaking, this gives me second thoughts about their dedication to truly accurately measuring something as notoriously tricky as corporate SWE performance. Any number of cut corners in a novel & empirical study like this would be hard to notice from the final product, especially for casual readers…TBH, the clickbait title doesn’t help either!
I don’t have a specific critique on why 4 months is definitely too short to do it right tho. Just vibe-reviewing, I guess ;)
For what it's worth, I know this is likely intended to read as "the new generation of models will somehow be better than any paper will be able to gauge," but that hasn't been my experience.
Results are getting worse and less accurate, hell, I even had Claude drop some Chinese into a response out of the blue one day.
The title is doing a lot of work here. What resonated with me is the shift from “writing code” to “steering systems” rather than the hype framing. Senior devs already spend more time constraining, reviewing, and shaping outcomes than typing syntax. AI just makes that explicit. The real skill gap isn’t prompt cleverness, it’s knowing when the agent is confidently wrong and how to fence it in with tests, architecture, and invariants. That part doesn’t scale magically.
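A concrete sketch of that "fence it in with tests and invariants" point: instead of reviewing every generated line, you pin down properties the code must satisfy no matter how the agent implemented it. Everything below (the `apply_discount` function and its checks) is hypothetical, invented purely for illustration:

```python
# Hypothetical illustration: "fencing in" agent output with invariants.
# Suppose an agent wrote apply_discount; rather than trusting the diff,
# we assert properties that must hold however it is implemented.

def apply_discount(price: float, percent: float) -> float:
    """Stand-in for an agent-written function under review."""
    return round(price * (1 - percent / 100), 2)

def check_invariants() -> None:
    for price in (0.0, 9.99, 100.0, 1_000_000.0):
        for percent in (0, 15, 50, 100):
            result = apply_discount(price, percent)
            assert result <= price   # a discount never raises the price
            assert result >= 0       # prices never go negative
    # Boundary behavior is pinned down exactly.
    assert apply_discount(80.0, 0) == 80.0
    assert apply_discount(80.0, 100) == 0.0

check_invariants()
print("all invariants hold")
```

The invariants describe what must be true rather than how it is computed, so they keep working even if the agent rewrites the implementation from scratch.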
It's difficult to steer complex systems correctly, because no one has a complete picture of the end goal at the outset. That's why waterfall fails. Writing code agentically means you have to go out of your way to think deeply about what you're building, because it won't be forced on you by the act of writing code. If your requirements are complex, they might actually be a hindrance, because you're going to have to learn those lessons from failed iterations instead of avoiding them preemptively.
The stereotype that writing code is for junior developers needs to die. Some devs are hired with lofty titles specifically for their programming aptitude and esoteric systems knowledge, not to play implementation telephone with inexperienced devs.
So much of my professional SWE job isn't even programming - I feel like this is a detail missed by so many. People generally stereotype the SWE as a programmer, but being an engineer (in any discipline) is so much more than that: you solve problems. AI will speed up the programming work-streams, but there is so much more to our jobs than that.
Most of the work brought to me gets done before I even think about sitting down to type.
And it's interesting to see the divide here between "pure coder" and "coder + more". A lot of people seem to be in the job just to do what the PM, designer and business people ask. A lot of the work is pushing back against some of those requests. In conversations here on HN about "essential complexity" I even see commenters arguing that the spec brought to you is entirely essential. It's not.
^This 100%. Junior SWE here. Agentic coding has kinda felt like a promotion for me. I code less by hand and spend more time on the actual engineering side of things. There's hype in both directions though. I don't think AI is replacing me anytime soon (fingers crossed), but it's already way more useful than the skeptics give it credit for. Like most things, the truth's somewhere in the middle.
There is also so much more you can automate and use AI agents for than "programming". It's the world's best rubber duck, for one. It also can dig through code bases and compile information on data flows, data models and so on. Hell, it can automate effectively any task you do on the terminal.
It feels like we're doing another lift to a higher level of abstraction. Just as "automatic programming" and "high level programming languages" freed us from assembly - letting authors express higher-level abstractions without having to know or care about the assembly (and that switch took decades) - we are now once again being pulled up another layer.
We're in the midst of another abstraction level becoming the working layer - and that's not a small layer jump but a jump to a completely different plane. And I think once again, we'll benefit from getting tools that help us specify the high level concepts we intend, and ways to enforce that the generated code is correct - not necessarily fast or efficient but at least correct - same as compilers do. And this lift is happening on a much more accelerated timeline.
The problem of ensuring correctness of the generated code across all the layers we're now skipping is going to be the crux of how we manage to leverage LLM/agentic coding.
we've never seen a profession drive itself so aggressively to irrelevance. software engineering will always exist, but it's amazing the pace at which pressure against the profession is rising. 2026 will be a very happy new year indeed for those paying the salaries. :)
We've been giving our work away to each other for free as open source to help improve each other's productivity for 30+ years now and that's only made our profession more valuable.
> we've never seen a profession drive themselves so aggressively to irrelevance.
Should we be trying to put the genie back in the bottle? If not, what exactly are you suggesting?
Even if we all agreed to stop using AI tools today, what about the rest of the world? Will everybody agree to stop using it? Do you think that is even a remote possibility?
Also it really baffles me how many people are actually on the hype train. It's a lot more than the crypto bros back in the day. Good thing AI still can't reason and innovate. Also, leaking credentials is a felony in my country, so I won't ever attach it to my codebases.
You know what? After seeing all these articles about AI/LLMs these past 4 years, about how they are going to replace me as a software developer and about how I am not productive enough without using 5 agents and being a project manager:
I. Don't. Care.
I don't even care about those debates out there. Debates about whether LLMs work and will replace programmers? Say they do - OK, so what?
I simply have too much fun programming. I am just a mere fullstack business line programmer, a generic, random, replaceable dude - you can find me a dime a dozen.
I do use LLMs as a Stack Overflow/docs replacement, but I always write all my code by hand.
If you want to replace me, replace me. I'll go to companies that need me. If there are no companies that need my skill, fine, then I'll just do this as a hobby, and probably flip burgers outside to make a living.
I don't care about your LLM, I don't care about your agent, and I probably don't even care about the job prospects, for that matter, if I have to be forced to use tools I don't like and workflows I don't like. You can go ahead and find others who are willing to do it for you.
As for me, I simply have too much fun programming. Now, if you'll excuse me, I need to go have fun.
Hear hear. I didn't spend half my life getting an education, competing in the corporate crab bucket, retraining and upskilling just to turn into a robot babysitter.
I appreciate this perspective. I'm actually hoping LLM hype will help to pop the bubble of tech salaries, make the profession roughly as profitable as going into teaching, so maybe the gold diggers will clear out and go play the stock market or something, rest of us can stick around and build things. Maybe software quality will even improve as a result? Would be nice...
I hear you, but I feel like you (and really others like you, en masse) should not be so passive about your replacement. For most programmers, simply flipping burgers for money so you can enjoy programming a few hours a week is not going to work. Making a living is a thing. If you are reduced to flipping burgers, that means the economy will have collapsed and there won't be any magic Elon UBI money to save us.
I simply will not spend my life begging and coaxing a machine to output working code. If that is what becomes of this profession, I will just do something else :)
Idk, I still mostly avoid using it, and if I do, I just copy and paste shit into the Claude web version. I won't ever manage agents, as that sounds just as complicated as coding shit myself.
It's not complicated at all. You don't "manage agents". You just type your prompt into a terminal application that can update files, read your docs and run your tests.
As with every new tech there's a hell of a lot of noise (plugins, skills, hooks, MCP, LSP - to quote Karpathy) but most of it can just be disregarded. No one is "behind" - it's all very easy to use.
If developers are not using TLA+ or Lean 4 etc., they are vibe coding. Nothing wrong with that; they just have to realize that they were never in control. Thinking logically is much harder than developers imagined. As Dijkstra observed, the whole field has adopted the mantra "how to program when you cannot." I estimate that 80% of what developers do could be done once and for all, for all of humanity, yet we don't learn. Be offended all you want, but I am fed up with this idiocy, given all the usual rebuttals about deadlines etc.
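For readers who haven't used such tools, here is a deliberately trivial Lean 4 example of the kind of guarantee being described - a statement proved once, for every input, rather than sampled by tests (this snippet is my illustration, not from the paper or the comment above):

```lean
-- Machine-checked for every pair of naturals,
-- not just the cases a test suite happens to sample.
theorem add_comm_nat (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```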
> Takeaway 3c: Experienced developers disagree about using agents for software planning and design. Some avoided agents out of concern over the importance of design, while others embraced back-and-forth design with an AI.
I'm in the back-and-forth camp. I expect a lot of interesting UX to develop here. I built https://github.com/backnotprop/plannotator over the weekend to give me a better way to review and collaborate around plans - all while natively integrated into the coding agent harness.
The title is provocative but there's truth to it. The distinction between "vibing" with AI tools and actually controlling the output is crucial for production code.
I've seen this with code generation tools - developers who treat AI suggestions as magic often struggle when the output doesn't work or introduces subtle bugs. The professionals who succeed are those who understand what the AI is doing, validate the output rigorously, and maintain clear mental models of their system.
This becomes especially important for code quality and technical debt. If you're just accepting AI-generated code without understanding architectural implications, you're building a maintenance nightmare. Control means being able to reason about tradeoffs, not just getting something that "works" in the moment.
I often tell people that agentic programming tools are the best thing since cscope. The last 6 months I have not used cscope even once after decades of using it nearly daily.
Out of curiosity, if I wanted to set up cscope for a bunch of small projects, say dozens of prototypes each in their own directory, would it be useful? Too broad?
The new layer of abstraction is tests - mostly end-to-end and integration tests. They describe the important constraints to the agents; essentially long-lived context.
So essentially what this means is a declarative programming system of overall system behavior.
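As a sketch of that tests-as-long-lived-context idea (every name here is hypothetical), the declarative layer might look like a behavioral test that states what the system does, leaving the how to whichever implementation an agent produces:

```python
# Hypothetical sketch: system behavior pinned down declaratively as a test.
# The Cart API is invented for illustration; an agent is free to rewrite
# the implementation as long as this contract still passes.

class Cart:
    def __init__(self) -> None:
        self._items: dict[str, int] = {}

    def add(self, sku: str, qty: int) -> None:
        if qty <= 0:
            raise ValueError("qty must be positive")
        self._items[sku] = self._items.get(sku, 0) + qty

    def remove(self, sku: str) -> None:
        self._items.pop(sku, None)

    def total_items(self) -> int:
        return sum(self._items.values())

def test_cart_contract() -> None:
    cart = Cart()
    cart.add("apple", 2)
    cart.add("apple", 3)            # adding the same SKU merges quantities
    assert cart.total_items() == 5
    cart.remove("apple")
    cart.remove("apple")            # removing an absent SKU is a no-op
    assert cart.total_items() == 0
    try:
        cart.add("pear", 0)         # invalid input must be rejected
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for qty=0")

test_cart_contract()
print("behavioral contract satisfied")
```

Because the test talks only about observable behavior, it survives agent-driven rewrites of the implementation - which is what makes it work as long-lived context.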
Page 2:
> We define agentic tools or agents as AI tools integrated into an IDE or a terminal that can manipulate the code directly (i.e., excluding web-based chat interfaces)
Yeah, it feels like many SWEs have painted themselves into a corner. They love the nose-to-the-grindstone coding process and chain themselves to the abstraction layer of today. I don't think it's gonna end well for them, but let's see.
This is a qualitative methods paper, so statistical significance is not relevant. The rough qualitative equivalent would instead be "data saturation" (responses generally look like ones you've received already) and "thematic saturation" (you've likely found all the themes you will find through this method of data collection). There's an intuitive quality to determining the number of responses needed based on the topic and research questions, but this looks to me like they have achieved sufficient thematic saturation based on the results.
dheera|2 months ago
It takes about 6 months to figure out how to get LaTeX to position figures where you want them, and then another 6 months to fight with reviewers.
joenot443|2 months ago
Off your intuition, do you think the same study with Codex 5.2 and Opus 4.5 would see even better results?
Madmallard|2 months ago
Strongly suspect this is simply less efficient than doing it yourself if you have enough expertise.
lesuorac|2 months ago
> Number of Survey Respondents
> Building apps 53
> Testing 1
I think this sums up everybody's complaints about AI-generated code: don't ask me to be the one to review work you didn't even check.
AYBABTME|2 months ago
Maybe Cursor is TurboPascal.
throw-12-16|2 months ago
Software Devs not so much.
There is a huge difference between the two and they are not interchangeable.
yacthing|2 months ago
(1) already have enough money to survive without working, or
(2) don't realize how hard of a life it would be to "flip burgers" to make a living in 2026.
We live very good lives as software developers. Don't be a fool and think you could just "flip burgers" and be fine.
learningstud|1 month ago
https://news.ycombinator.com/item?id=43679634
senshan|2 months ago
"I’m on disability, but agents let me code again and be more productive than ever (in a 25+ year career). - S22"
Once Social Security Administration learns this, there goes the disability benefit...
andrewstuart|2 months ago
Do it in the way that makes you feel happy, or conforms to organizational standards.
senshan|2 months ago
[0] https://en.wikipedia.org/wiki/Cscope
game_the0ry|2 months ago
Not a statistically significant sample size.
bee_rider|2 months ago
https://www.surveymonkey.com/mp/sample-size-calculator/
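For reference, calculators like that one typically use the standard sample-size formula for estimating a proportion, where $z$ is the z-score for the chosen confidence level, $p$ the assumed proportion, and $e$ the margin of error:

```latex
n = \frac{z^{2}\, p\,(1-p)}{e^{2}}
```

With the usual worst-case defaults ($z = 1.96$ for 95% confidence, $p = 0.5$) and a 10% margin of error, that gives $n \approx 97$ - though, as noted elsewhere in the thread, this framing applies to quantitative estimates, not qualitative studies.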