AI slows down open source developers. Peter Naur can teach us why

narush|7 months ago

Hey HN -- study author here! (See previous thread on the paper here [1].)

I think this blog post is an interesting take on one specific factor that is likely contributing to slowdown. We discuss this in the paper [2] in the section "Implicit repository context (C.1.5)" -- check it out if you want to see some developer quotes about this factor.

> This is why AI coding tools, as they exist today, will generally slow someone down if they know what they are doing, and are working on a project that they understand.

I made this point in the other thread discussing the study, but in general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the full factors table on page 11).

> If there are no takers then I might try experimenting on myself.

This sounds super cool! I'd be very excited to see how you set this up + how it turns out... please do shoot me an email (in the paper) if you do this!

> AI slows down open source developers. Peter Naur can teach us why

Nit: I appreciate how hard it is to write short titles summarizing the paper (the graph title is the best I was able to do after a lot of trying) -- but I might have written this "Early-2025 AI slows down experienced open-source developers. Peter Naur can give us more context about one specific factor." It's admittedly less of a catchy-title, but I think getting the qualifications right are really important!

Thanks again for the sweet write-up! I'll hang around in the comments today as well.

[1] https://news.ycombinator.com/item?id=44522772

[2] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

seanwilson|7 months ago

If this makes sense, how is the study able to give a reasonable measure of how long an issue/task should have taken, vs how long it took with AI to determine that using AI was slower?

Or it's comparing how long the dev thought it should take with AI vs how long it actually took, which now includes the dev's guess of how AI impacts their productivity?

When it's hard to estimate how difficult an issue should be to complete, how does the study account for this? What percent speed up or slow down would be noise due to estimates being difficult?

I do appreciate that this stuff is very hard to measure.

calf|7 months ago

Slowing down isn't necessarily bad, maybe slow programming (literate/Knuth comes to mind as another early argument) encourages better theory formation. Maybe programming today is like fast food, and proper theory and abstraction (and language design) requires a good measure of slow and deliberate work that has not been the norm in industry.

jwhiles|7 months ago

Thanks for the response, and apologies for misrepresenting your results somewhat! I'm probably not going to change the title since I am at heart and polemicist and a sloppy thinker, but I'll update the article to call out this misrepresentation.

That said, I think that what I wrote more or less encompasses three of the factors you call out as being likely to contribute: "High developer familiarity with reposito- ries", "Large and complex repositories", and "Implicit repository context".

I thought more about experimenting on myself, and while I hope to do it - I think it will be very hard to create a controlled enviornment whilst also responding to the demands the job puts on me. I also don't have the luxury of a list of well scoped tasks that could feasibly be completed in a few hours.

karmakaze|7 months ago

I would expect any change to an optimized workflow (developing own well understood project) to initially be slower. What I'd like to see is how these same developers do 6 months or a year from now after using AI has become the natural workflow on these same projects. The article mentions that these results don't extrapolate to other devs, but it's important to note that it may not extrapolate over time to these same devs.

I myself am just getting started and I can see how so many things can be scripted with AI that would be very difficult to (semi-)automate without. You gotta ask yourself "Is it worth the time?"[0]

[0] https://xkcd.com/1205/

antonvs|7 months ago

> Early-2025 AI slows down experienced open-source developers.

Even that's too general, because it'll depend on what the task is. It's not as if open source developers in general never work on tasks where AI could save time.

mung_daal|7 months ago

[deleted]

munificent|7 months ago

> The inability of developers to tell if a tool sped them up or slowed them down is fascinating in itself, probably applies to many other forms of human endeavour, and explains things as varied as why so many people think that AI has made them 10 times more productive, why I continue to use Vim, why people drive in London etc.

In boating, there's a notion of a "set and drift" which describes how wind and current pushes a boat off course. If a mariner isn't careful, they'll end up far from their destination because of it.

This is because when you're sitting in a boat, your perception of motion is relative and local. You feel the breeze on your face, and you see how the boat cuts through the surrounding water. You interpret that as motion towards your destination, but it can equally consist of wind and current where the medium itself is moving.

I think a similar effect explains all of these. Our perception of "making progress" is mostly a sense of motion and "stuff happening" in our immediate vicinity. It's not based on a perception of the goal getting closer, which is much harder to measure and develop an intuition for.

So people tend to choose strategies that make them feel like they're making progress even if it's not the most effective strategy. I think this is why people often take "shortcuts" when driving that are actually longer. All of the twists and turns keep them busy and make them feel like they're making more progress than zoning out on a boring interstate does.

wrsh07|7 months ago

Something I noticed early on when using AI tools was that it was great because I didn't get blocked. Somehow, I always wanted to keep going and always felt like I could keep going.

The problem, of course, is that one might thoughtlessly invoke the ai tool when it would be faster to make the one line change directly

Edit

This could make sense with the driving analogy. If the road I was planning to take is closed, gps will happily tell me to try something else. But if that fails too, it might go back to the original suggestion.

thinkingemote|7 months ago

Exactly! Waze the navigation app tends to route users on longer routes but which feels more fast. When driving we perceive our journey as fast or slow not by the actual length but by our memories of what happened. Waze knows human drivers are happier with driving a route that may be longer in time and distance of they feel like they are making progress with the twists and turns.

Ai tools makes programming feel easier. That it might be actually less productive is interesting but we humans prefer the easier shortcuts. Our memories of coding with AI tells us that we didn't struggle and therefore we made progress.

PicassoCTs|7 months ago

I also think that AI written code- is just not read. People hate code-reviews, and actively refuse to read code- because that is hard work, reading into other peoples thoughts and ideas.

This is why pushing for new code, rewrites, new frameworks is so popular. https://www.joelonsoftware.com/2000/04/06/things-you-should-...

So a ton of ai-generated code- is just that, never read. Its generated, tested against test-functions - and thats it. I wouldn't wonder, if some of these devs themselves have only marginal ideas whats in there codebases and why.

jiggawatts|7 months ago

> The inability of developers to tell if a tool sped them up or slowed them down is fascinating in itself

Linux/UNIX users are convinced of the superiority of keyboard control and CLI tools, but studies have shown that the mouse is faster for almost all common tasks.

Keyboard input feels faster because there are more actions per second.

Alex_L_Wood|7 months ago

We all as humans are hardwired to prefer greedy algorithms, basically.

blake1|7 months ago

I think a reasonable summary of the study referenced is that: "AI creates the perception of productivity enhancements far beyond the reality."

Even within the study, there were some participants who saw mild improvements to productivity, but most had a significant drop in productivity. This thread is now full of people telling their story about huge productivity gains they made with AI, but none of the comments contend with the central insight of this study: that these productivity gains are illusions. AI is a product designed to make you value the product.

In matters of personal value, perception is reality, no question. Anyone relying heavily on AI should really be worried that it is mostly a tool for warping their self-perception, one that creates dependency and a false sense of accomplishment. After all, it speaks a highly optimized stream of tokens at you, and you really have to wonder what the optimization goal was.

thinkingemote|7 months ago

It's like the difference between being fast and quick. AI tools make the developer feel quick but they may not be fast. It's less cognitive effort in some ways. It's an interesting illusion, one that is based on changing emotions from different feedback loops and the effects of how memory forms.

BriggyDwiggs42|7 months ago

I’ve noticed that you can definitely use them to help you learn something, but that your understanding tends to be more abstract and LLM-like that way. You definitely want to mix it up when learning too.

nico|7 months ago

> They are experienced open source developers, working on their own projects

I just started working on a 3-month old codebase written by someone else, in a framework and architecture I had never used before

Within a couple hours, with the help of Claude Code, I had already created a really nice system to replicate data from staging to local development. Something I had built before in other projects, and I new that manually it would take me a full day or two, especially without experience in the architecture

That immediately sped up my development even more, as now I had better data to test things locally

Then a couple hours later, I had already pushed my first PR. All code following the proper coding style and practices of the existing project and the framework. That PR, would have taken me at least a couple of days and up to 2 weeks to fully manually write out and test

So sure, AI won’t speed everyone or everything up. But at least in this one case, it gave me a huge boost

As I keep going, I expect things to slow down a bit, as the complexity of the project grows. However, it’s also given me the chance to get an amazing jumpstart

Vegenoid|7 months ago

I have had similar experiences as you, but this is not the kind of work that the study is talking about:

“When open source developers working in codebases that they are deeply familiar with use AI tools to complete a task, they take longer to complete that task”

I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.

davidclark|7 months ago

> That PR, would have taken me at least a couple of days and up to 2 weeks to fully manually write out and test

What is your accuracy on software development estimates? I always see these productivity claims matched again “It would’ve taken me” timelines.

But, it’s never examined if we’re good at estimating. I know I am not good at estimates.

It’s also never examined if the quality of the PR is the same as it would’ve been. Are you skipping steps and system understanding which let you go faster, but with a higher % chance of bugs? You can do that without AI and get the same speed up.

OptionOfT|7 months ago

Now the question is: did you gain the same knowledge and proficiency in the codebase that you would've gained organically?

I find that when working with an LLM the difference in knowledge is the same as learning a new language. Learning to understanding another language is easier than learning to speak another language.

It's like my knowledge of C++. I can read it, and I can make modifications of existing files. But writing something from scratch without a template? That's a lot harder.

nico|7 months ago

Some additional notes given the comments in the thread

* I wasn’t trying to be dismissive of the article or the study, just wanted to present a different context in which AI tools do help a lot

* It’s not just code. It also helps with a lot of tasks. For example, Claude Code figured out how to “manually” connect to the AWS cluster that hosted the source db, tested different commands via docker inside the project containers and overall helped immensely with discovery of the overall structure and infrastructure of the project

* My professional experience as a developer, has been that 80-90% of the time, results trump code quality. That’s just the projects and companies I’ve been personally involved with. Mostly saas products in which business goals are usually considered more important than the specifics of the tech stack used. This doesn’t mean that 80-90% of code is garbage, it just means that most of the time readability, maintainability and shipping are more important than DRY, clever solutions or optimizations

* I don’t know how helpful AI is or could be for things that require super clever algorithms or special data structures, or where code quality is incredibly important

* Having said that, the AI tools I’ve used can write pretty good quality code, as long as they are provided with good examples and references, and the developer is on top of properly managing the context

* Additionally, these tools are improving almost on a weekly or monthly basis. My experience with them has drastically changed even in the last 3 months

At the end of the day, AI is not magic, it’s a tool, and I as the developer, am still accountable for the code and results I’m expected to deliver

PaulDavisThe1st|7 months ago

TFA was specifically about people very familiar with the project and codebase that they are working on. Your anecdots is precisely the opposite of the situation is was about, and it acknowledged the sort of process you describe.

kevmo314|7 months ago

You've missed the point of the article, which in fact agrees with your anecdote.

> It's equally common for developers to work in environments where little value is placed on understanding systems, but a lot of value is placed on quickly delivering changes that mostly work. In this context, I think that AI tools have more of an advantage. They can ingest the unfamiliar codebase faster than any human can, and can often generate changes that will essentially work.

samtp|7 months ago

Well that's exactly what it does well at the moment. Boilerplate starter templates, landing pages, throwaway apps, etc. But for projects that need precision like data pipelines, security - it code generated has many subtle flaws that can/will cause giant headaches in your project unless you dig through every line produced

quantumHazer|7 months ago

You clearly have not read the study. Problem is developers thought they were 20% faster, but they were actually slower. Anyway from a fast review about your profile you're in conflict of interest about vibe coding, so I will definitely take your opinion with a grain of salt.

xoralkindi|7 months ago

How are you confident in the code, coding style and practices simply because the LLM says so. How do you know it is not hallucinating since you don't understand the codebase?

bko|7 months ago

When anecdote and data don't align, it's usually the data that's wrong.

Not always the case, but whenever I read about these strained studies or arguments about how AI is actually making people less productive, I can't help but wonder why nearly every programmer I know, myself included, finds value in these tools. I wonder if the same thing happened with higher level programming languages where people argued, you may THINK not managing your own garbage collector will lead to more productivity but actually...

Even if we weren't more "productive", millions prefer to use these tools, so it has to count for something. And I don't need a "study" to tell me that

markstos|7 months ago

I had a similar experience with AI and open source. AI allowed me to implement features in a language and stack I didn't know well. I had wanted these features for months and no one else was volunteering to implement them. I had tried to study the stack directly myself, but found the total picture to be complex and under-documented for people getting started.

Using Warp terminal (which used Claude) I was get past those barriers and achieve results that weren't happening at all before.

tomasz_fm|7 months ago

Only one developer in this study had more than 50h of Cursor experience, including time spent using Cursor during the study. That one developer saw a 25% speed improvement.

Everyone else was an absolute Cursor beginner with barely any Cursor experience. I don't find it surprising that using tools they're unfamiliar with slows software engineers down.

I don't think this study can be used to reach any sort of conclusion on use of AI and development speed.

narush|7 months ago

Hey, thanks for digging into the details here! Copying a relevant comment (https://news.ycombinator.com/item?id=44523638) from the other thread on the paper, in case it's help on this point.

1. Some prior studies that find speedup do so with developers that have similar (or less!) experience with the tools they use. In other words, the "steep learning curve" theory doesn't differentially explain our results vs. other results.

2. Prior to the study, 90+% of developers had reasonable experience prompting LLMs. Before we found slowdown, this was the only concern that most external reviewers had about experience was about prompting -- as prompting was considered the primary skill. In general, the standard wisdom was/is Cursor is very easy to pick up if you're used to VSCode, which most developers used prior to the study.

3. Imagine all these developers had a TON of AI experience. One thing this might do is make them worse programmers when not using AI (relatable, at least for me), which in turn would raise the speedup we find (but not because AI was better, but just because with AI is much worse). In other words, we're sorta in between a rock and a hard place here -- it's just plain hard to figure out what the right baseline should be!

4. We shared information on developer prior experience with expert forecasters. Even with this information, forecasters were still dramatically over-optimistic about speedup.

5. As you say, it's totally possible that there is a long-tail of skills to using these tools -- things you only pick up and realize after hundreds of hours of usage. Our study doesn't really speak to this. I'd be excited for future literature to explore this more.

In general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the factors table on page 11).

I'll also note that one really important takeaway -- that developer self-reports after using AI are overoptimistic to the point of being on the wrong side of speedup/slowdown -- isn't a function of which tool they use. The need for robust, on-the-ground measurements to accurately judge productivity gains is a key takeaway here for me!

(You can see a lot more detail in section C.2.7 of the paper ("Below-average use of AI tools") -- where we explore the points here in more detail.)

WhyNotHugo|7 months ago

An interesting little detail. Any seasoned developer is likely going to take substantially longer if they have to use any IDE except their everyday one.

I've been using Vim/Neovim for over a decade. I'm sure if I wanted to use something like Cursor, it would take me at least a month before I can productive even a fraction of my usual.

Art9681|7 months ago

This is exactly my same take. Any tool an engineer is inexperienced with will slow them down. AI is no different.

unknown|7 months ago

[deleted]

yomismoaqui|7 months ago

Someone on X said that these agentic AI tools (Claude Code, Amp, Gemini Cli) are to programming like the table saw was to hand-made woodworking.

It can make some things faster and better than a human with a saw, but you have to learn how to use them right (or you will loose some fingers).

I personally find that agentic AI tools make me be more ambitious in my projects, I can tackle some things I didn't tthougth about doing before. And I also delegate work that I don't like to them because they are going to do it better and quicker than me. So my mind is free to think on the real problems like architecture, the technical debt balance of my code...

Problem is that there is the temptation of letting the AI agent do everything and just commit the result without understanding YOUR code (yes, it was generated by an AI but if you sign the commit YOU are responsible for that code).

So as with any tool try to take the time to understand how to better use it and see if it works for you.

candiddevmike|7 months ago

> to programming like the table saw was to hand-made woodworking

This is a ridiculous comparison because the table saw is a precision tool (compared to manual woodworking) when agentic AI is anything but IMO.

bgwalter|7 months ago

"You are using it wrong!"

This is insulting to all pre-2023 open source developers, who produced the entire stack that the "AI" robber barons use in their companies.

It is even more insulting because no actual software of value has been demonstrably produced using "AI".

antimora|7 months ago

I'm one of the regular code reviewers for Burn (a deep learning framework in Rust). I recently had to close a PR because the submitter's bug fix was clearly written entirely by an AI agent. The "fix" simply muted an error instead of addressing the root cause. This is exactly what AI tends to do when it can't identify the actual problem. The code was unnecessarily verbose and even included tests for muting the error. Based on the person's profile, I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.

dawnerd|7 months ago

That's what I love about LLMs. You can spot it doesn't know the answer, tell it that it's wrong and it'll go, "You're absolutely right. Let me actually fix it"

It scares me how much code is being produced by people without enough experience to spot issues or people that just gave up caring. We're going to be in for wild ride when all the exploits start flowing.

Macha|7 months ago

I recently reviewed a MR from a coworker. There was a test that was clearly written by AI, except I guess however he prompted it, it gave some rather poor variable names like "thing1", "thing2", etc. in test cases. Basically, these were multiple permutations of data that all needed to be represented in the result set. So I asked for them to be named distinctively, maybe by what makes them special.

It's clear he just took that feedback and asked the AI to make the change, and it came up with a change that gave them all very long, very unique names, that just listed all the unique properties in the test case. But to the extent that they sort of became noise.

It's clear writing the PR was very fast for that developer, I'm sure they felt they were X times faster than writing it themselves. But this isn't a good outcome for the tool either. And I'm sure if they'd reviewed it to the extent I did, a lot of that gained time would have dissipated.

meindnoch|7 months ago

>a deep learning framework in Rust [...] This is becoming a troubling trend with AI tools.

The serpent is devouring its own tail.

jampa|7 months ago

> I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.

It has been for a while, AI just makes SPAM more effective:

https://news.ycombinator.com/item?id=24643894

pennomi|7 months ago

This is the most frustrating thing LLMs do. They put wide try:catch structures around the code making it impossible to actually track down the source of a problem. I want my code to fail fast and HARD during development so I can solve every problem immediately.

0xbadcafebee|7 months ago

> The "fix" simply muted an error instead of addressing the root cause.

FWIW, I have seen human developers do this countless times. In fact there are many people in engineering that will argue for these kinds of "fixes" by default. Usually it's in closed-source projects where the shittiness is hidden from the world, but trust me, it's common.

> I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.

There was already a problem (pre-AI) with shitty PRs on GitHub made to try to game a system. Regardless of how they made the change, the underlying problem is a policy one: how to deal with people making shitty changes for ulterior motives. I expect the solution is actually more AI to detect shitty changes from suspicious submitters.

Another solution (that I know nobody's going to go for): stop using GitHub. Back in the "olden times", we just had CVS, mailing lists and patches. You had to perform some effort in order to get to the point of getting the change done and merged, and it was not necessarily obvious afterward that you had contributed. This would probably stop 99% of people who are hoping for a quick change to boost their profile.

nerdjon|7 months ago

I will never forget being in a code review for a upcoming release, there was a method that was... different. Like massively different with no good reason why it was changed as much as it was for such a small addition.

We asked the person why they made the change, and "silence". They had no reason. It became painfully clear that all they did was copy and paste the method into an LLM and say "add this thing" and it spit out a completely redone method.

So now we had a change that no one in the company actually knew just because the developer took a shortcut. (this change was rejected and reverted).

The scariest thing to me is no one actually knowing what code is running anymore with these models having a tendency to make change for the sake of making change (and likely not actually addressing the root thing but a shortcut like you mentioned)

tomrod|7 months ago

As a side question: I work in AI, but mostly python and theory work. How can I best jump into Burn? Rust has been intriguing to me for a long time

lvl155|7 months ago

This is a real problem that’s only going to get worse. With the major model providers basically keeping all the data themselves, I frankly don’t like this trend long term.

doug_durham|7 months ago

You should be rejecting the PR because the fix was insufficient, not because it was AI agent written. Bad code is bad code regardless of the source. I think the fixation on how the code was generated is not productive.

andix|7 months ago

What I noticed: AI development constantly breaks my flow. It makes me more tired, and I work for shorter time periods on coding.

It's a myth that you can code a whole day long. I usually do intervals of 1-3 hours for coding, with some breaks in between. Procrastination can even happen on work related things, like reading other project members code/changes for an hour. It has a benefit to some extent, but during this time I don't get my work done.

Agentic AI works the best for me. Small refactoring tasks on a selected code snippet can be helpful, but isn't a huge time saver. The worst are AI code completions (first version Copilot style), they are much more noise then help.

rightbyte|7 months ago

It would be interesting to record what one do in a day at the desk. Probably quite depressing to watch.

Like, I think 1h would be streaching it for mature codebases.

lsy|7 months ago

Typically debugging, e.g., a tricky race condition in an unfamiliar code base would require adding logging, refactoring library calls, inspecting existing logs, and even rewriting parts of your program to be more modular or understandable. This is part of the theory-building.

When you have an AI that says "here is the race condition and here is the code change to make to fix it", that might be "faster" in the immediate sense, but it means you aren't understanding the program better or making it easier for anyone else to understand. There is also the question of whether this process is sustainable: does an AI-edited program eventually fall so far outside what is "normal" for a program that the AI becomes unable to model correct responses?

sodapopcan|7 months ago

This is always my thought whenever I hear the "AI let me build a feature in a codebase I didn't know in a language I didn't know" (which is often, there is at one in these comments). Great, but what have you learned? This is fine for small contributions, I guess, but I don't hear a lot of stories of long-term maintenance. Unpopular opinion, though, I know.

wellpast|7 months ago

The fact that the devs thought the AI saved them time is no surprise to me… at least at this point in my career.

Developers (people?) in general for some reason just simply cannot see time. It’s why so many people don’t believe in estimation.

What I don’t understand is why. Is this like a general human brain limitation (like not being able to visualize four dimensions, or how some folks don’t have an internal monologue)?

Or is this more psychodynamic or emotional?

It’s been super clear and interesting to me how developers I work with want to believe AI (code generation) is saving them time when it’s clearly obviously not.

Is it just the hope that one day it will? Is it fetishization of AI?

Why in an industry that so requires clarity of thinking and expression (computer processors don’t like ambiguity), can we be so bad at talking about, thinking about… time?

Don’t get me started on the static type enthusiasts who think their strong type system (another seeming fetish) is saving them time.

piker|7 months ago

My main two attempts at using an “agentic” coding workflow were trying to incorporate an Outlook COM interface into my rust code base and to streamline an existing abstract windows API interaction to avoid copying memory a couple of times. Both wasted tremendous amounts of time and were ultimately abandoned leaving me only slightly more educated about windows development. They make great autocompletion engines but I just cannot see them being useful in my project otherwise.

jdiff|7 months ago

They make great autocompletion engines, most of the time. It's nice when it can recognize that I'm replicating a specific math formula and expands out the next dozen lines for me. It's less nice when it predicts code that's not even syntactically valid for the language or the correct API for the library I'm using. Those times, for whatever reason, seem to be popping up a lot in the last few weeks so I find myself disabling those suggestions more often than not.

crinkly|7 months ago

This is typically what I see when I’ve seen it applied. And as always trying to hammer nails in with a banana.

Rather than fit two generally disparate things together it’s probably better to just use VSTO and C# (hammer and nails) rather than some unholy combination no one else has tried or suffered through. When it goes wrong there’s more info to get you unstuck.

charcircuit|7 months ago

I had the opposite experience. Gemini was able to work with COM and accomplish what I needed despite me never using COM before.

doc_manhat|7 months ago

I directionally disagree with this:

``` It's common for engineers to end up working on projects which they don't have an accurate mental model of. Projects built by people who have long since left the company for pastures new. It's equally common for developers to work in environments where little value is placed on understanding systems, but a lot of value is placed on quickly delivering changes that mostly work. In this context, I think that AI tools have more of an advantage. They can ingest the unfamiliar codebase faster than any human can, and can often generate changes that will essentially work. ```

Reason: you cannot evaluate the work accurately if you have no mental model. If there's a bug given the systems unwritten assumptions you may not catch it.

Having said that it also depends on how important it is to be writing bug free code in the given domain I guess.

I like AI particularly for green field stuff and one off scripts as it let's you go faster here. Basically you build up the mental model as you're coding with the AI.

Not sure about whether this breaks down at a certain codebase size though.

horsawlarway|7 months ago

Just anecdotally - I think your reason for disagreeing is a valid statement, but not a valid counterpoint to the argument being made.

So

> Reason: you cannot evaluate the work accurately if you have no mental model. If there's a bug given the systems unwritten assumptions you may not catch it.

This is completely correct. It's a very fair statement. The problem is that a developer coming into a large legacy project is in this spot regardless of the existence of AI.

I've found that asking AI tools to generate a changeset in this case is actually a pretty solid way of starting to learn the mental model.

I want to see where it tries to make changes, what files it wants to touch, what libraries and patterns it uses, etc.

It's a poor man's proxy for having a subject matter expert in the code give you pointers. But it doesn't take anyone else's time, and as long as you're not just trying to dump output into a PR can actually be a pretty good resource.

The key is not letting it dump out a lot of code, in favor of directional signaling.

ex: Prompts like "Which files should I edit to implement a feature which does [detailed description of feature]?" Or "Where is [specific functionality] implemented in this codebase?" Have been real timesavers for me.

The actual code generation has probably been a net time loss.

uludag|7 months ago

Great article and I was having very similar thoughts with regards to this productivity study and the "Programming as Theory Building" paper. I'm starting to be convinced that if you are the original author of a program and still have the program's context in the head, you are the asymptote to which any and all AI systems will approach but never surpass: maybe not in terms of raw coding speed, but in terms of understanding the program, its vision of development, its deficiencies and hacks, its context, its users and what they want, the broader culture the program exists in, etc.

I really like how the author then brought up the point that for most daily work we don't have the theory built, even a small fraction of it, and that this may or may not change the equation.

conartist6|7 months ago

Thanks, <3

trey-jones|7 months ago

Doing my own post-mortem of a recent project (the first that I've leaned on "AI" tools to any extent), my feeling was the following:

1. It did not make me faster. I don't know that I expected it to.

2. It's very possible that it made me slower.

3. The quality of my work was better.

Slower and better are related here, because I used these tools more to either check ideas that I had for soundness, or to get some fresh ideas if I didn't have a good one. In many cases the workflow would be: "I don't like that idea, what else do you have for me?"

There were also instances of being led by my tools into a rabbit hole that I eventually just abandoned, so that also contributes to the slowness. This might happen in instances where I'm using "AI" to help cover areas that I'm less of an expert in (and these were great learning experiences). In my areas of expertise, it was much more likely that I would refine my ideas, or the "AI" tool's ideas into something that I was ultimately very pleased with, hence the improved quality.

Now, some people might think that speed is the only metric that matters, and certainly it's harder to quantify quality - but it definitely felt worth it to me.

jpc0|7 months ago

I do this a lot and absolutely think it might even improve it, and this is why I like the current crop of AIs that are more likely to be argumentative and not just capitulate.

I will ask the AI for an idea and then start blowing holes in its idea, or will ask it to do the same for my idea.

And I might end up not going with it’s idea regardless but it got me thinking about things I wouldn’t have thought about.

Effectively its like chatting to a coworker that has a reasonable idea about the domain and can bounce ideas around.

gjsman-1000|7 months ago

What I thought was fascinating, and should be a warning sign to everyone here:

Before beginning the study, the average developer expected about a 20% productivity boost.

After ending the study, the average developer (potentially: you) believed they actually were 20% more productive.

In reality, they were 0% more productive at best, and 40% less productive at worst.

Think about what it would be like to be that developer; off by 60% about your own output.

If you can't even gauge your own output without being 40% off on average, 60% off at worst; be cautious about strong opinions on anything in life. Especially politically.

Edit 1: Also consider, quite terrifyingly, if said developers were in an online group, together, like... here. The one developer who said she thought it made everyone slower (the truth in this particular case), would be unanimously considered an idiot, downvoted to the full -4, even with the benefit of hindsight.

Edit 2: I suppose this goes to show, that even on Hacker News, where there are relatively high-IQ and self-aware individuals present... 95% of the crowd can still possibly be wildly delusional. Stick to your gut, regardless of the crowd, and regardless of who is in it.

bluefirebrand|7 months ago

> Also consider, quite terrifyingly, if said developers were in an online group, together, like... here. The one developer who said she thought it made everyone slower (the truth in this particular case), would be unanimously considered an idiot, downvoted to the full -4, even with the benefit of hindsight

Yeah, this is me at my job right now. Every time I express even the mildest skepticism about the value of our Cursor subscription, I'm getting follow up conversations basically telling me to shut up about it

It's been very demoralizing. You're not allowed to question the Emperor's new clothes

pphysch|7 months ago

Given how deadlines/timelines tend to (not) work in SWE, this is not surprising.

unknown|7 months ago

[deleted]

quantumHazer|7 months ago

This should really be top comment. The problem is this tools can really give us some value in certain type of areas, but they are not like they are marketed.

neuroelectron|7 months ago

Good article and it makes sense. I wish I had sometime in my career worked on a codebase that was possible to be understood without 10 years of experience. Instead most of my development time was spent tracing execution paths through tangles of abstractions in nested objects in 10M LOC legacy codebases. My buddy who introduced me to the job is still doing it today and now uses AI and this has given him the free time to start working on his own side projects. So there's certain types if jobs where AI will certainly speed up your development.

xyst|7 months ago

Not surprising. Use of LLM has only been helpful in initial exploration of unknown code bases or languages for me.

Using it beyond that is just more work. First parse the broken response, remove any useless junk, have it reprocess with updated query.

It’s a nice tool to have (just as search engines gave us easy access to multiple sources/forums), but its limitations are well known. Trying to use it 100% as intended is a massive waste of time and resources (energy use…)

omnicognate|7 months ago

All these studies that show "AI makes developers x% more/less productive" are predicated on the idea that developer "productivity" can be usefully captured in a single objectively measurable number.

Just one problem with that...

narush|7 months ago

Thanks for the feedback! I strongly agree this is not the only measure of developer productivity -- but it's certainly one of them. I think this measure as speaks very directly to how _many_ developers (myself included) understand the impact of AI tools on their own work currently (e.g. just speeding up implementation speed).

(The SPACE [1] framework is a pretty overview of considerations here; I agree with a lot of it, although I'll note that METR [2] has different motivations for studying developer productivity than Microsoft does.)

[1] https://dl.acm.org/doi/10.1145/3454122.3454124

[2] https://metr.org/about

charcircuit|7 months ago

As long as the true productivity is correlated with that number it should be fine.

hakfoo|7 months ago

My two cents:

My experience with AI is that it's workable for "approximate" things, but it's frustratingly difficult to use as a precision tool.

It works great for the trivial demos, where you say "here's an API, build a client" without significant constraints, because that use case is a pretty wide, vague goal. I wasn't going to hold it accountable for matching existing corporate branding, code style, or how to use storage efficiently, so it can work fine.

But most of the real work is in the "precision tool" space. You aren't building that many blank-slate API clients, many of the actual tickets are "flip bit 29 of data structure XQ33 when it's a married taxpayer filing singly and huckleberries are in season". The actual change is 3 lines of code, and the effort is in thinking and understanding the problem (and the hundreds of lines of misdocumented code surrounding the problem).

I've had Claude decide it wanted to refactor a bunch of unrelated code after asking for a minor, specific change. Or the classic "here's 2000 lines of code that solve the problem in a highly Enterprise way, when the real developer would look at the problem and spit up 150 lines of actual functionality". You can either spend 30 minutes writing the prompt to do the specific precision thing you want and only that, or you can just write the fix directly.

joshmarlow|7 months ago

I've gotten some pretty cool things working with LLMs doing most of the heavy lifting using the following approaches:

* spec out project goals and relevant context in a README and spec out all components; have the AI build out each component and compose them. I understand the high-level but don't necessarily know all of the low-level details. This is particularly helpful when I'm not deeply familiar with some of the underlying technologies/libraries. * having an AI write tests for code that I've verified is working. As we all know, testing is tedious - so of course I want to automate it. And we written tests (for well written code) can be pretty easy to review.

stevekrouse|7 months ago

Such a great essay! Peter Naur's thesis is also the central point in my talk about vibe coding from last month: https://www.youtube.com/watch?v=1WC8dxMC4Xw

I'm spending an inordinate amount of time turning that video into an essay, but I feel like I'm being scooped already, so here's my current draft in case anyone wants to get a sneak preview: https://valdottown--89ed76076a6544019f981f7d4397d736.web.val...

Feedback appreciated :)

unknown|7 months ago

[deleted]

sltr|7 months ago

A couple of months ago I put forth Naur's program theory as an argument why LLM's can't replace human developers:

> LLMs as they currently exist cannot master a theory, design, or mental construct because they don't remember beyond their context window. Only humans can can gain and retain program theory.

https://news.ycombinator.com/item?id=44114631

cadamsdotcom|7 months ago

The study doesn’t provide data that can be extrapolated in any way to any statement about anybody slowing down or speeding up.

1. They used Cursor, which makes you spend your whole day saying “yes” to code changes. Cursor is the windows vista “cancel or allow” of agentic coding.

2. Every user was a Cursor beginner.

The study measured developers’ effectiveness at writing code during the ramp up period while they were learning to use a bad tool.

Kim_Bruning|7 months ago

I think different people use these tools differently. I've got mine set up to start in "rubber duck" mode, where I do rubber duck programming, before asking the AI to help me with certain tasks (if at all). Low impact utility scripts? The AI gets let off the leash. Critical core logic? I might do most of the work myself (though having a rubber duck can still be good!)

kazinator|7 months ago

> Interestingly the developers predict that AI will make them faster, and continue to believe that it did make them faster, even after completing the task slower than they otherwise would!

It's like a form of gambling in which you don't have a simple indicator that you're going broke. The addiction is the same though.

i_love_retros|7 months ago

The AI hype will die off just like block chain and web3. LLMs are a solution in search of a problem.

All the VCs are gonna lose a ton of money! OpenAI will be NopenAI, relegated to the dustbin of history.

We never asked for this, nobody wants it.

Companies using AI and promoting it in their products will be seen as tacky and cheap. Just like developers and artists that use it.

cratermoon|7 months ago

dissected https://www.fightforthehuman.com/are-developers-slowed-down-...

ringeryless|7 months ago

not to mention the annoyance of AI assisted issues being opened, many times incorrectly due to hallucinations. these tickets hammer human teams with nonsense and suck resources away from real issues.

remorses|7 months ago

Using AI agents productively requires setting up a repository for collaboration, it means writing docs and making the build process easy and fast.

As any other tool AI is slow to adopt but has huge gains later on

hartator|7 months ago

I am not super sure how to quickly writing benchmark scripts that are one-shot used slows anyone down, but okay.

diamond559|7 months ago

Measure twice cut once. Not cut 100 times and hope it does it right once.

afro88|7 months ago

I said this when the linked paper was shared and got downvotes: it's based on early 2025 data. My point isn't that it should be completely up to date, but that how we need to consider it in that context. This is pre Claude 4, Claude Code. Pre Gemini 2.5 even. These models are such a big step up from what came previously.

Just like we put a (2023) on articles here so they are considered in the right context, so too this paper should be. Blanket "AI tools slow sown development" statements with a "look this rigorous paper says so!" is ignoring a key variable: the rate of effectiveness improvement. If said paper evaluated with the current models, the picture would be different. Also in 3 months time. AI tools aren't a static thing that either works or don't indefinitely.

tonyedgecombe|7 months ago

>This is pre Claude 4, Claude Code. Pre Gemini 2.5 even.

The most interesting point from the article wasn't about how well the AI's worked, rather it was the gap between peoples perception and their actual results.

methuselah_in|7 months ago

Those of current generation students who have access to ai might become slow over time. Because when things are not readily available then they have to struggle and work harder in that process, at that time I thing human a lot of secondary things ! Now when everything is easily available especially knowledge without knowing how to struggle with basics. It will eventually make kids dumb. But can be opposite also. Eventually even I become slow even I keep on using chat gpt or gemini.

alganet|7 months ago

This idea that some developers have some "mental model" and others not is an extraordinary claim, and I don't see extraordinary evidence.

It sounds like a good thing, right? "Wow, mental model. I want that, I want to be good and have big brain", which encourages you to believe the bullshit.

The truth is, this paper is irrelevant and a waste of time. It only serves the purpose of creating discussion around the subject. It's not science, it's a cupholder for marketing.

imiric|7 months ago

You couldn't be more wrong. If you've ever programmed, or worked with programmers, that is not an extraordinary claim at all, but a widely accepted fact.

A mental model of the software is what allows a programmer to intuitively know why the software is behaving a certain way, or what the most optimal design for a feature would be. In the vast majority of cases these intuitions are correct, and other programmers should pay attention to them. This ability is what separates those with a mental model and those without.

On the other hand, LLMs are unable to do this, and are usually not used in ways that help build a mental model. At best, they can summarize the design of a system or answer questions about its behavior, which can be helpful, but a mental model is an abstract model of the software, not a textual summary of its design or behavior. Those neural pathways can only be activated by natural learning and manual programming.

bunderbunder|7 months ago

> It's a really fabulous study...

Ehhhh... not so much. It had serious design flaws in both the protocol and the analysis. This blog post is a fairly approachable explanation of what's wrong with it: https://www.argmin.net/p/are-developers-finally-out-of-a-job

narush|7 months ago

Hey, thanks for linking this! I'm a study author, and I greatly appreciate that this author dug into the appendix and provided feedback so that other folks can read it as well.

A few notes if it's helpful:

1. This post is primarily worried about ordering considerations -- I think this is a valid concern. We explicitly call this out in the paper [1] as a factor we can't rule out -- see "Bias from issue completion order (C.2.4)". We have no evidence this occurred, but we also don't have evidence it didn't.

2. "I mean, rather than boring us with these robustness checks, METR could just release a CSV with three columns (developer ID, task condition, time)." Seconded :) We're planning on open-sourcing pretty much this data (and some core analysis code) later this week here: https://github.com/METR/Measuring-Early-2025-AI-on-Exp-OSS-D... - star if you want to dig in when it comes out.

3. As I said in my comment on the post, the takeaway at the end of the post is that "What we can glean from this study is that even expert developers aren’t great at predicting how long tasks will take. And despite the new coding tools being incredibly useful, people are certainly far too optimistic about the dramatic gains in productivity they will bring." I think this is a reasonable takeaway from the study overall. As we say in the "We do not provide evidence that:" section of the paper (Page 17), we don't provide evidence across all developers (or even most developers) -- and ofc, this is just a point-in-time measurement that could totally be different by now (from tooling and model improvements in the past month alone).

Thanks again for linking, and to the original author for their detailed review. It's greatly appreciated!

[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

d00mB0t|7 months ago

Blasphemy! How dare you say our Emperor has no clothes! AI is becoming a cult and I'm not here for it.

whatever1|7 months ago

They didn’t use the latest model that was released yesterday night. Follow my paid course to learn how to vibe code/s

gr8beehive|7 months ago

Mirror neurons got people drinking the same stupid kool aid without realizing it.

rosspackard|7 months ago

One mediocre paper/study (it should not even be called that with all the bias and sample size issues) and now we have to put up with stories re-hashing and dissecting it. I really hope these don't get upvoted more in the future.

16 devs. And they weren't allowed to pick which tasks they used the AI on. Ridiculous. Also using it on "old and >1 million line" codebases and then extrapolating that to software engineering in general.

Writers like this then theorize why AI isn't helpful, then those "theories" get repeated until it feels less like a theory and more like a fact and it all proliferates into an echo chamber of AI isn't a useful tool. There have been too many anecdotes and my own personal experience to ignore that it isn't useful.

It is a tool and you have to learn it to be successful with it.

sumeno|7 months ago

> And they weren't allowed to pick which tasks they used the AI on.

They were allowed to pick whether or not to use AI on a subset of tasks. They weren't forced to use AI on tasks that don't make sense for AI

RamblingCTO|7 months ago

It's just the same with all the anecdotal evidence of some hype guys on twitter claiming 10x performance on coding ... Same same but different

steveklabnik|7 months ago

> and then extrapolating that to software engineering in general.

To the credit of the paper authors, they were very clear that they were not making a claim against software engineering in general. But everyone wants to reinforce their biases, so...

jplusequalt|7 months ago

>One mediocre paper/study (it should not even be called that with all the bias and sample size issues)

Can you bring up any specific issues with the metr study? Alternatively, can you site a journal that critiques it?

mkagenius|7 months ago

AI tends to slow us down because we don't really know what it's good at. Can it write a proper Nginx config? I don't know—let's try. And then we end up wasting 30 minutes on it.

Fully autonomous coding tools like v0, a0, or Aider work well as long as the context is small. But once the context grows—usually due to mistakes made in earlier steps—they just can’t keep up. There's no real benefit of "try again" loop yet.

For now, I think simple VSCode extensions are the most useful. You get focused assistance on small files or snippets you’re working on, and that’s usually all you need.

ethan_smith|7 months ago

The context switching cost between coding and AI interaction is substantial and rarely measured in these studies. Each prompt/review cycle breaks flow state, which is particularly damaging for complex programming tasks where deep concentration yields the greatest productivity.

211 comments