How much are LLMs boosting real-world programmer productivity?

[+] jit_hacker|1 year ago|reply

I work at a popular Seattle tech company. and AI is being shoved down our throats by leadership. to the point it was made known they're tracking how much devs use AI and I've even been asked when I'm personally not using it more. and I've long been a believer in using the right tool for the right job. And sometimes it's AI, but not super often

I spent a lot of time trying to think about how we arrived here. where I work there are a lot of Senior Directors and SVPs who used to write code 10+ years ago. Who if you would ask them to build a little hack project they would have no idea where to start. And AI has given them back something they've lost because they can build something simple super quickly. But they fail to see that just because it accelerates their hack project, it won't accelerate someone who's an expert. i.e. AI might help a hobbyist plant a garden, but it wouldn't help a farmer squeeze out more yield.

[+] lolinder|1 year ago|reply

> just because it accelerates their hack project, it won't accelerate someone who's an expert.

I would say that this is the wrong distinction. I'm an expert who's still in the code every day, and AI still accelerates my hack projects that I do in my spare time, but only to a point. When I hit 10k lines of code then code generation with chat models becomes substantially less useful (though autocomplete/Cursor-style advanced autocomplete retains its value).

I think the distinction that matters is the type of project being worked on. Greenfield stuff—whether a hobby project or a business project—can see real benefits from AI. But eventually the process of working on the code becomes far more about understanding the complex interactions between the dozens to hundreds of components that are already written than it is about getting a fresh chunk of code onto the screen. And AI models—even embedded in fancy tools like Cursor—are still objectively terrible at understanding the kinds of complex interactions between systems and subsystems that professional developers deal with day in and day out.

[+] SkyPuncher|1 year ago|reply

I’m using it to ship real projects with real customers in a real code base at 2x to 5x the rate I was last year.

It’s just like having a junior dev. Without enough guidance and guardrails, they spin their wheels and do dumb things.

Instead of focusing of lines of code, I’m now focusing on describing overall tasks, breaking them down, and guiding an LLM to a solution.

[+] thegrim33|1 year ago|reply

The point is that leadership gets to write on their own promo document / resume about how they "boosted developer productivity" by leading the charge on introducing AI dev processes to the company. Then they'll be long gone onto the next job before anybody actually knows what the result of it was, whether it actually boosted productivity or not, whether there were negative side-effects, etc.

[+] lumost|1 year ago|reply

Aye - this is a limitation of the current tech. For any project greater than 1k lines where the model was not pretrained on the code base…. AI is simply not useful beyond documentation search.

It’s easy to see this effect in any new project you start with AI, the first few pieces of functionality are easy to implement. Boilerplate gets written effortlessly. Then the ai can’t reason about the code and makes dumb mistakes.

[+] mikeocool|1 year ago|reply

What’s that old (and in my experience pretty accurate) adage? The last 10% of a software project takes 90% of the time?

In my experience, AI is helpful for that first 90% — when the codebase is pretty simple, and all of the weird business logic edge cases haven’t crept in. In the last 10%(as well as most “legacy” codebases), it seems to have a lot trouble understanding enough to generate helpful output at more than a basic level.

Furthermore, if you’re not deliberate with your AI usage, it really gets you into “this code is too complicated for the AI to be much help with” territory a lot faster.

I’d imagine this is part of why we’re not seeing an explosion of software productivity.

[+] hn_throwaway_99|1 year ago|reply

This is my experience as well. There are a couple things I love using AI for, like learning new programming languages or technologies (I consider myself an expert in Java and NodeJS, and proficient in Python, but I recently took a job where I had to program in an unfamiliar language), and it's been great for programming up short little "apps" for me for things I want - I've built a slew of browser apps for myself that just save stuff to local storage so that I can easily put it up on GitHub pages (and then I create import and export functions if I switch browsers - export just opens a mailto link where the body just contains a link with the state as a param, so then I just save that email, open it up on a different device and click on the link).

But I've found that there are a lot of places where it kind of falls over. I recently had Cursor do a large "refactoring" for me, and I was impressed with the process it went through, but at the end of the day I still had to review it all, it missed a couple places, and worse, it undid a bug fix that I put in (the bug was previously created when I had AI write a short function for me).

The other thing the makes me really worried is that AI makes it easy to be lazy and add tons of boilerplate code, where in the old world if I had to do it all manually I would definitely have DRY-ed stuff up. So it makes my life immediately easier, but the next guy now is going to have a shit ton more code to look at when they try to understand the project in the first place. AI definitely can help with that understanding/summarization, but a lot of times I feel like code maintenance is a lot of finding that "needle in a haystack", and AI makes it easy to add a shit ton of hay without a second thought.

[+] javajosh|1 year ago|reply

Yeah, I've been disappointed in a lot of code generation within my field of expertise. However, if I need to whip up some bash scripts, AI works very well. But if I want those bash scripts to be actually good, AI just can't get there. It certainly cannot "think outside the box" and deliver anything close to novel or even elegant (although it may give some tactical help writing boilerplate lightly adapted to your codebase). The analogy I use is that LLM AIs are like a new car mechanic tool that can generate any nut, bolt or gasket, for free and instantly (just add electricity!). It's great addition to the toolset for a seasoned mechanic, distracting for a junior, and is not even in the same universe required to fix an entire car, let alone design one.

[+] photonthug|1 year ago|reply

Like it says in tfa, it’s frustrating how we can never seem to move past anecdotes and “but did you try <insert flavor of the week>” and if you’re lucky, benchmarks that may or may not be scams.

10x, 20x etc productivity boosts really should be easy to see. My favorite example of this is the idea of porting popular things like media wiki/wordpress to popular things like Django/rails. Charitable challenge right, since there’s lots of history / examples, and it’s more translation than invention. What about porting large well known code bases from c to rust, etc. Clearly people are interested in such things.

There would be a really really obvious uptick in interesting examples like this if impossible dreams were now suddenly weekend projects.

If you don’t have an example like this.. well another vibes coding anecdote about another CRUD app or a bash script with tricky awk is just not really what TFA is asking about. That is just evidence that LLMs have finally fixed search, which is great, but not the subject that we’re all the most curious about.

[+] crabsand|1 year ago|reply

Recently I tried to translate a few 100-200 line scripts from zsh to nushell with Claude 3.5 Sonnet and it sucked. This is now my go to experiment for new LLMs, translating code between two programming languages must be easier than translating natural language to a programming language, yet we don't see any such results, even for popular languages.

[+] player1234|1 year ago|reply

So you are saying that one-off scripts and hobby projects is not a trillion dollar industry? Blasphemy!

The emergent crew and I are going down going down to the shrine of Sam Altman later today to sacrifice a goat, maybe you would like to come and learn a thing or two?

[+] KaiserPro|1 year ago|reply

Disclaimer: I work at a FAANG with exceptionally good integration of LLM into my IDE.

For me its been a everso slight net positive.

In terms of in-IDE productivity it has improved a little bit. Stuff that is mostly repetitive can be autocompleted by the LLM. It can, in some cases provide function names from other files that traditional intelliCode can't do because of codebase size.

However it also hallucinates plausible shit, which significantly undermines the productivity gains above.

I suspect that if I ask it directly to create a function to do X, it might work better. rather than expecting it to work like autocomplete (even though I comment my code much more than my peers)

over all rating: for our code base, its not as good as c# intelliCode/VS code.

Where it is good is asking how I do some basic thing in $language that I have forgotten. Anything harder and it start going into bullshit land.

I think if you have more comprehensive tests it works better.

I have not had much success with agentic workflow, mainly because I've not been using the larger models. (Our internal agentic workflow is limited access)

[+] IshKebab|1 year ago|reply

That's sort of my experience too. It's really good at some auto-complete (especially copying patterns already in your code). For example if you write some cross product function it will easily auto-complete the equations after seeing one. It's obviously not as good as intellisense where that works.

And it's really good for basic stuff in things you don't want to have to look up. E.g. "write me JavaScript to delete all DOM nodes with class 'foo'".

I reckon you're underestimating how much time that saves though. The auto-complete saves me a few seconds many times a day. Maybe 2-3 minutes a day. A whole day in one year.

The "write me some shitty script" stuff saves much more time. Sometimes maybe an hour. That's rarer but even one a month that's still a whole day in a year.

Maybe 2 days in a year doesn't seem significant but given the low cost of these AI tools I think any company is self-harming if they don't pay for them for everyone.

(Also I think the hallucination isn't that bad once you get used to it.)

[+] technofiend|1 year ago|reply

That's been my experience: it's good at basic tasks. It can be prompted to write idiomatic code in very small amounts. Anything beyond that and it's just as likely to either write non-optimal code or silently delete code while trying to satisfy your ask.

[+] itsoktocry|1 year ago|reply

>I suspect that if I ask it directly to create a function to do X

This is how LLMs are most effective, and also the reason why I don't believe non programmers will be crushing code. You do actually need to know how to program.

[+] techpineapple|1 year ago|reply

The one example I can think of of real world developer getting seemingly 10x improvement is Pieter levels coding a 3D multiplayer flight sim in a few days vibe coding. I tried vibe coding with cursor and mostly ran into simple roadblock after simple roadblock, I’m curious to watch some unedited videos of people working this way.

[+] colecut|1 year ago|reply

That has been a pretty wild development to watch..

I've been following him for a while, it's interesting what a polarizing figure he is..

a recent comment on his X feed resonated with quite a few people

"I’ve never seen an account that feels inspirational and makes me want to kill myself simultaneously"

He is definitely an entrepreneur/business man first and a developer second.. He has taught himself the skills to minimally get the job done functionally, has decent eye for design, and knows how to market... He makes it blatantly obvious, moreso than usual, that sales and marketing are way more important to making money than doing things technically "the right way".. At least on these smaller scale things because he only knows how to work by himself.

People hate that he's figured out how to make tens of thousands of dollars from a game that is arguably worse than hundreds of games that make nothing... I see that as just another skill that can be learned. And it is cool how transparent he is about his operations.

But yes, even with his blueprints it is hard to replicate his successes, my attempt of "one shotting" a video game was not very impressive..

[+] rlupi|1 year ago|reply

Do you have a link?

[+] EliRivers|1 year ago|reply

Every so often, it saves me a few hours on a task that's not very difficult, but that I just don't know how to do already. Generally something generic a lot of people have already done.

An example from today was using XAudio2 on windows to output sound, where that sound was already being fetched as interleaved data from a network source. I could have read the docs, found some example code, and bashed it together in a few hours; but I asked one of the LLMs and it gave me some example code tuned to my request, giving me a head start on that.

I had to already know a lot of context to be able to ask it the right questions, I suspect, and to thence tune it with a few follow up questions.

[+] rectang|1 year ago|reply

The biggest timesaver for me so far is composing complex SQL queries with elements of SQL I don't use very often. In such cases I know what I want, but the specific syntax eludes me. Previously solving that has required poring over documentation and QA sites, but finding the right documentation and gradually debugging is tedious. An LLM gets me farther along.

[+] theshrike79|1 year ago|reply

Same here. I had to write a DynamicObject for a DSL-like system in C# to make it behave like Python dicts.

With some LLM help I was done before lunch. After lunch I wrote some additional unit tests and improved on the solution - again with LLM help (the object type changes in unit tests vs integration tests, one is the actual type, one is a JsonDocument).

I could've definitely done all that by myself, but when the LLM wrote the boilerplate crap that someone had definitely written before (but in a way I couldn't find with a search engine) I could focus on testing and optimising the solution instead of figuring out C# DynamicObject quirks.

[+] cmrx64|1 year ago|reply

https://chatgpt.com/share/67cca4ac-2a38-8000-9901-9f56219c06...

… was curious what Gpt-4.5 would do with an absolute paucity of context :)

[+] avastmick|1 year ago|reply

I’m a solo founder/developer (https://kayshun.co) my relationship/usage of LLMs for codegen has been complicated.

At first I was all in with Copilot and various similar plugins for neovim. It helped me get going but did produce the worst code in the application. Also I found (personal preference) that the autocomplete function actually slowed me down; it made me pause or even prevented me from seeing what I was doing rather than just typing out what I needed to. I stopped using any codegen for about four months at the end of 2024; I felt it was not making me more productive.

This year it’s back on the table with avante[0] and cursor (the latter back off the table due to the huge memory requirements). Then recently Claude Code dropped and I am currently feeling like I have productivity super powers. I’ve set it up in a pair programming style (old XP coder) where I write careful specs (prompts) and tests (which I code); it writes code; I review run the tests and commit. I work with it. I do not just let it just run as I have found I waste more time unwinding its output than watching each step.

From being pretty disillusioned six months ago I can now see it as a powerful tool.

Can it replace devs? In my opinion, some. Like all things it’s garbage in garbage out. So the idea a non-technical product manager can produce quality outputs seems unlikely to me.

0: https://github.com/yetone/avante.nvim

[+] simonbarker87|1 year ago|reply

Similar to me with CoPilot, I found it made it harder for me to spin up my brain to full power to tackle a genuinely tricky problem because I was letting it solve the simple ones for me. I stopped using it after about 3 months and had a total pause from CodeGen tools. Now I use Claude like a very documentation-knowledgeable junior developer who can write code very quickly. I guide it on the architecture and approach and sanity check what it does but let it save me a tonne of typing. I don’t use it for everything but as a CTO for an early stage startup that needs to turn somethings around quickly it’s incredibly useful.

[+] summarity|1 year ago|reply

One of my teams at GitHub develops Copilot Autofix, which suggests fixes based on CodeQL alerts (another of my teams’ projects). Based on data of actual devs interacting with alerts and fixes, we see an average 3x speed up in time to fix over no Autofix, and up to 12x for some bug types. There’s more were doing but the theme I’m seeing is that lots of the friction points along the SDLC get accelerated.

[+] malux85|1 year ago|reply

One of the interesting things about LLM coding assistants is that the quality of the answer is significantly influenced by the communication skill of the programmer.

Some of the juniors I mentor cannot formulate their questions clearly and as a result, get a poor answer. They don’t understand that an LLM will answer the question you ask, which might not be the global best solution, it’s just answering your question - and if you ask the question poorly (or worse - the wrong question) you’re going to get bad results.

I have seen significant jumps in senior programmers capabilities, in some cases 20x, and when I see a junior or intermediate complaining about how useless LLM coding assistants are it always makes me very suspicious about the person, in that I think the problem is almost certainly their poor communication skills causing them to ask the wrong things.

[+] LouisSayers|1 year ago|reply

I agree, if you can communicate effectively AI is a huge productivity jump.

Another thing I've found is to actually engineer a solution - all the basic coding principles come into play, keeping code clean, cohesive and designing it to make testing easy. The human part of this is essential as AI has no barometer on when it's gone too far with abstraction or not far enough.

When code is broken up into good abstractions then the AI part of filling in the gaps is where you see a lot of the productivity increases. It's like filling water into an ice tray.

[+] hirsin|1 year ago|reply

> I don't see 5-10x more useful features in the software I use, or 5-10x more software that's useful to me, or that the software I'm using is suddenly working 5-10x better, etc.

This has bad assumptions about what higher productivity looks like.

Other alternatives include:

1. Companies require fewer engineers, so there are layoffs. Software products are cheaper than before because the cost to build and maintain them is reduced.

2. Companies require fewer engineers so they lay them off and retain the spend, using it as stock buybacks or exec comp.

And certainly it feels like we've seen #2 out in the wild.

Assuming that the number of people working on software you use remains constant is not a good assumption.

(Personally this has been my finding. I'm able to get a bit more done in my day by eg writing a quick script to do something tedious. But not 5x more)

[+] barnabee|1 year ago|reply

It’s extremely circumstantial for me.

Sometimes they give me maybe a 5–10% improvement (i.e. nice but not world changing). Usually that’s when they’re working as an alternative to docs, solving the odd bug, helping write tests or occasional glue code, etc. for a bigger or more complex/inportant solution.

In other cases I’ve literally built a small functioning app/tool in 6–12 hours of elapsed time, where most of that is spent waiting (all but unattended, so I guess this counts as “vibe coding”) while the LLM does its thing. It’s probably required less than an hour of my time in those cases and would easily have taken at least 1–2 days, if not more for me. So I’d say it’s at least sometimes comfortably 10x.

More to the point, in those cases I simply wouldn’t have tried to create the tool, knowing how long it’d take. It’s unclear what the cumulative incremental value of all these new tools and possibilities will be, but that’s also non-zero.

[+] throwawa14223|1 year ago|reply

I've had terrible luck getting LLMs to make me feel more productive.

Copilot is very good at breaking my flow and all of the agent based systems I have tried have been disappointing at following incredibly simple instructions.

Coding is much easier and faster than writing instructions in English so it is hard to justify anything i have seen so far as a time saver.

[+] philjohn|1 year ago|reply

The biggest boon I've found is writing tests - especially when you've got lots of mocks to setup, takes away that boilerplate overhead and lets you focus on the meat of the test.

And when you name your test cases in a common pattern such as "MethodName_ExpectedBehavior_StateUnderTest" the LLM is able to figure it out about 80% of the time.

Then the other 20% of the time I'll make a couple of corrections, but it's definitely sped me up by a low double digit percentage ... when writing tests.

When writing code, it seems to get in the way more often than not, so I mostly don't use it - but then again, a lot of what I'm doing isn't boilerplate CRUD code.

[+] havaloc|1 year ago|reply

I write plain jane PHP/MySQL crud apps that people love for work, including a fitness center membership system.

Writing a new view used to take 5-10 minutes but now I can do it in 30 seconds. Since it's the most basic PHP/MySql imaginable it works very well, none of those frameworks to confuse the LLM or suck up the context window.

The point is I guess that I can do it the old fashioned way because I know how, but I don't have to, I can tell ChatGPT exactly what I want, and how I want it.

[+] e12e|1 year ago|reply

I'd be very interested to see a transcript or two of such a 30 second interaction, if you have any you could share?

[+] lfsh|1 year ago|reply

As search engine LLMs are nice. But for code generation they are not. Everytime it generates code there are small bugs that I don't notice directly but will bite me later.

For example a peace of code with a foreach loop that uses the collection name inside the loop instead of the item name.

Or a very nice looking peace of code but with a method call that does not exist in the used library.

I think the weakness of AI/LMMs is that it outputs probabilities. If the code you request is very common than it will probably generate good code. But that's about it. It can not reason about code (it maybe can 'reason' about the probability of the generated answer).

[+] unknown|1 year ago|reply

[deleted]

[+] dvh|1 year ago|reply

I stopped using stack overflow altogether. It require to write very careful question not to get removed, with LLM I write 1 sentence, then few to narrow it down as needed. It could easily be 1 minute llm vs. writing SO post for 20 minutes and waiting 30 minutes for response. It also saves googling time because googling query often must be more generic to be effective so I then have to spend more time adjusting found solution, llm often gives specific answer.

The moment I realized llm are better was when I needed to do something with screen coordinates of point clouds in three.js and my searches lead nowhere, doing it myself would take me 1 or 2 hours, the llm got correct working code on first try.

[+] rectang|1 year ago|reply

I have not stopped referencing Stack Overflow (though I have almost never asked questions there). I still find it more helpful for exploring approaches and ideas than an LLM. An LLM is most helpful when I definitely know what I want, I just don't know the specific incantation.

[+] floppiplopp|1 year ago|reply

I use the jetbrains product line for my professional work. They now come with a AI code completion assistant, which will 50-80% of the time, depending on the type of project, suggest something wrong, which I either have to spend energy to evaluate and then ignore. The rare cases where it does suggest something useful don't make up for the time an energy wasted having to deal with the completion. AI in this case is detrimental to productivity and attention to the code. It's more useless than useful.

[+] patrick451|1 year ago|reply

I never use them to generate code for languages that I know well. Google has started trying to answer my programming related searches with LLM results. Half the time it points me a useful direction, the other half it's just dead wrong.

I have found them pretty helpful writing sql. But I don't really know sql very well and I'd imagine that somebody who does could write what I need in far less time that it takes me with the LLM. While the LLM helps finish my sql task faster, the downside is that I'm not really learning it in the same way I would if I had to actually bang my head against the wall and understand the docs. In the long run, I'd be better off without it.

[+] arjie|1 year ago|reply

Well, the most concrete example of this is Pieter Levels and his new flying around in cyberspace video game where he's making $60k MRR on. It's a concrete thing that he wouldn't have been able to build otherwise on that cadence.

97 comments