I think the dichotomy you see in how positive people are about AI has almost entirely to do with the kind of questions they ask.
That seems obvious, but a consequence is that people who are sceptical of AI (like me) only use it when they've exhausted other resources (like Google). You ask very specific questions where not a lot of documentation is available, and inevitably even o3 ends up being pretty useless.
Conversely, there are people who love AI and use it for everything, and since the majority of the stuff they ask about is fairly simple and well documented (e.g. "Write me some typescript"), they rarely have a negative experience.
- Some people simply ask a lot more questions than others (independent of whether they like or dislike AI); i.e. some people prefer to find things out by themselves, and so also treat resources like Google or Stack Overflow as a last resort. Their questions to an AI will likely be more complicated, because they already figured out the easy parts on their own.
- If I have to make the effort to explain to the AI, in a sufficiently exhaustive way, what I need (which I often have to do), I expect its answers to be really good. If they aren't, explaining my problem to the AI was simply a waste of time.
I don't think that dichotomy is true at all, at least not with experienced software people.
Many folks I know are skeptical of the hype, or maybe fully anti/distrustful, for reasons I think are valid. But many of those same people have tried LLM tools, maybe ChatGPT or Copilot or Cursor, and recognize the value even with huge misgivings. Some have gone further with tools like Claude Code and seen the real potential there, quite a step beyond fancy auto-complete or just-in-time agents... but even there you can end up down rabbit holes, drowning in horrible design.
On your incredibly reductive scale, I'm closer to 'love' than 'skeptical', but I'm often a bit of both. Still, I'd never write a prompt like 'write me some typescript' for any real work, or honestly anything close to that, unless it's just for memes or demonstrations.
No one who programs for a living uses prompts like that, at least not for real work. That is just silly talk.
I think you touched on an important aspect, but did not explore it further.
If we accept that AI is a tool, then the problem is the nature of the tool, which varies heavily from individual to individual. This partially accounts for the ridiculous differences in the self-reported accounts of people who use it on a regular basis.
And then there is the possibility that my questions are not that unusual and/or are well documented (quite possible), so my perception of the usefulness of those answers is skewed.
My recent interaction with 4o was pretty decent on a very new (by industry standards) development, and while documentation for it exists, it is a swirling vortex of insanity from where I sit. I was actually amazed to see how easily 4o spotted some of those discrepancies and listed them to me along with the likely pitfalls that may come with them. We will be able to find out whether that prediction holds very soon. What I am saying is that it has its uses.
Well, I use it before Google, since it generally summarizes webpages and removes the ads. Quite handy.
It's also very useful for checking whether you understand something correctly. And for programming specifically, I've found it really useful for help naming things (which tends to be hard, not least because it's subjective).
> You ask very specific questions where not a lot of documentation is available and inevitably even o3 ends up being pretty useless.
Do you have any example questions where o3 failed to be helpful?
I use it pretty similarly to you, only resorting to it to unblock myself; otherwise I'm mostly the one doing the actual work, with LLMs helping on specific functions, specific blockers, or exploring new "spaces". But almost every time I've gotten stuck, o3 (and o3-pro mode) has managed to unstick me once I figured out the right way to ask the question, even when my own searching and reading didn't help.
It's kind of true. I only use it for simple stuff that I don't have time for, for example, how to write a simple diagram in TikZ. The AI does the simple busywork of providing a good-enough approximation, which I can tweak to get what I want.
For hard questions, I prefer to use my own skills, because AI often regurgitates what I'm already aware of. I still ask AI on the off chance it comes up with something cool, but most often I have to do it myself.
What bothers me more than any of this particular discussion is that we seem to have been incapable of measuring programmer productivity in a meaningful way since my debut as a programmer 40 years ago.
Like others have probably experienced, I can only add that I am now doing coding I would have kicked down the road if I did not have LLM assistance.
Example: using LeafletJS — not hard, but I didn't want to have to search all over to figure out how to use it.
Example: other web page development requiring dropping image files, complicated scrolling, split-views, etc.
In short, there are projects I have put off in the past but eagerly begin now that LLMs are there to guide me. It's difficult to compare times and productivity in cases like that.
This is pretty similar to my own experience using LLMs as a tool.
When I'm working with platforms/languages/frameworks I'm already deeply familiar with, I don't think they save me much time at all. When I've tried to use them in this context, they seem to save me a bunch of time in some situations but cost me a bunch of time in others, resulting in basically a wash as far as time saved goes.
And for me a wash isn't worth the long-term cost of losing touch with the code by not being the one to have crafted it.
But when it comes to environments I'm not intimately familiar with they can provide a very easy on-ramp that is a much more pleasant experience than trying to figure things out through often iffy technical documentation or code samples.
The Leaflet doc is a single-page document with examples you can copy-paste. There is page navigation at the top. Also, ctrl/cmd+F and a keyword seems quicker than writing the prompt.
> To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work. Then, we randomly assign each issue to either allow or disallow use of AI while working on the issue. When AI is allowed, developers can use any tools they choose (primarily Cursor Pro with Claude 3.5/3.7 Sonnet—frontier models at the time of the study); when disallowed, they work without generative AI assistance. Developers complete these tasks (which average two hours each) while recording their screens, then self-report the total implementation time they needed. We pay developers $150/hr as compensation for their participation in the study.
So it's a small sample size of 16 developers. And it sounds like different tasks were (randomly) assigned to the no-AI and with-AI conditions, so the control group doesn't have the same tasks as the experimental group. I think this could lead to some pretty noisy data.
Interestingly, small sample size isn't in the list of objections that the author includes under "Addressing Every Objection You Thought Of, And Some You Didn’t".
I do think it's an interesting study, but I would want to see the results reproduced before reading too much into them.
> It's nice that people have done studies and have opinions, but for me, it's 10x to 20x better.
I think the productivity gains most people rave about are for cases like: I wanted to do X, which isn't hard if you're experienced with library Y, library Y is pretty popular, and the LLM did it perfectly on the first try!
I think that's where you get 10-20x. When you're working on niche stuff it's either not gonna work or work poorly.
For example, right now I need to figure out why an ffmpeg filter doesn't do X smoothly, even though the C code for the filter is tiny and self-contained. Gemini refuses to add comments to the code. It just apologizes for not being able to add comments to 150 lines of code, lol.
However, for building an ffmpeg pipeline in Python, I was dumbfounded by how fast I was prototyping and building fairly complex filter chains. If I'd had to do it by hand, just by reading the docs, it would've taken a whole lot more time, effort, and frustration, but it was a joy to figure out with Gemini.
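As a rough sketch of what that kind of Python prototyping can look like, here's a minimal filter-chain builder that assembles an ffmpeg command line (the filenames and the specific filters are hypothetical illustrations, not from the comment above):

```python
import subprocess

def filter_chain(*filters: str) -> str:
    """Join individual ffmpeg filters into a comma-separated chain."""
    return ",".join(filters)

# Hypothetical pipeline: resize, boost contrast, then sharpen.
chain = filter_chain(
    "scale=1280:-2",      # resize to 1280 wide, keep aspect (even height)
    "eq=contrast=1.2",    # mild contrast boost
    "unsharp=5:5:1.0",    # light sharpening
)

cmd = ["ffmpeg", "-i", "input.mp4", "-vf", chain, "output.mp4"]
# subprocess.run(cmd, check=True)  # uncomment to actually invoke ffmpeg
```

Building the chain as data like this makes it easy to iterate on filters one at a time, which is presumably where the prototyping speed comes from.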
So going back to the study: IMO it's flawed, because almost by definition, working on new features for mature open-source projects isn't the bread and butter of LLMs. But most people aren't working on stuff like that; they're rewriting the same code that 10,000 other people have written, with their own tiny little twist.
I think this is the most worrying part for me: "You can see that for AI Allowed tasks, developers spent less time researching and writing code".
My analogy is watching people spend their time figuring out how to change colors and draw shapes in PowerPoint rather than focusing on the content of the presentation. Here, we have developers focusing their efforts on correcting AI output rather than doing the research and improving their ability to deliver code in the future.
I find I’m most likely to use an LLM to generate code in certain specific scenarios: (i) times I’m suffering from “writer’s block” or “having trouble getting started”; (ii) a language or framework I don’t normally use; (iii) feeling tired/burnt out/demotivated
When I’m in the “zone” I wouldn’t go near an LLM, but when I’ve fallen out of the “zone” they can be useful tools in getting me back into it, or just finishing that one extra thing before signing off for the day
I think the right answer to “does LLM use help or hinder developer productivity” is “it depends on how you use them”
It can get you over some mental blocks; having some code to look at can start the idea process even if it's wrong (just like with writing). I don't think it's bad, just as I don't think writing throwaway prototype code is a bad way to start a project you aren't sure how to tackle. Waterfall (lots of research and design up front) is still not going to work even if you forgo AI.
One thing I find frustrating with these conversations is the _strict_ focus on single-task productivity.
Arguably, on a single coding task, I don't really move that much faster. However, I have much, much more brain capacity left both while coding and when I'm done coding.
This has two knock-on effects:
1. Most simply, I'm productive for longer. Since LLMs are doing a lot of the heavy lifting, my brain doesn't have to constantly think. This is especially important in time periods where I'd previously have too little mental energy to think deeply about code.
2. I can do other things while coding. Well, right now, Cursor is cycling on a simple task while I type this. Most days, though, I'm responding to customers, working on documentation/planning, or doing some other non-coding task that's critical to my workflows. This is actually where I find my biggest productivity gains. Instead of coding THEN X, I can now do coding WITH X.
Context-shifting while trying to code seems like a bad idea to me.
Maybe you're some incredible multitasking genius able to change tasks rapidly without losing any of the details, but I suspect that if most people tried this workflow, they would produce worse code, and whatever their other output is would be lower quality too.
The article brushed aside devs being terrible at estimates, but I dunno.
I'm a frontend guy, been using Claude Code for a couple of weeks now. It's been able to speed up some boilerplate, it's sped up a lot of "naming is hard" conversations I like to have (but my coworkers probably don't, lol), it's enabled me to do a lot more stuff in my most recent project.
But for a task or two, I suspect it has slowed me down. If you can't articulate the problem well enough and the problem is hard enough, you can go in circles for a while. And the feeling that "the right answer is just around the corner" makes it hard to timebox, or to find the specific point where you say "yup, time to ditch this and do it the old-fashioned way". There is a bit of a slot-machine effect here.
> But for a task or two I suspect that it has slowed me down
Likely more, since it takes longer to engage your brain when your first thought is to ask an LLM rather than solve the problem yourself. It's like reaching for a calculator to do 4+5; that doesn't make you faster or more accurate.
LLMs make me 10-20x more productive in frontend work which I barely do.
But when it comes to low-level stuff (C/C++), I personally don't find them too useful; they just replace my need to search Stack Overflow.
edit: I should have mentioned that the low-level stuff I work on is mature code, and a lot of the time novel.
This is good if frontend is something you just need to get through. It's terrible if your work is moving toward involving a lot of frontend - you'll never pick up the skills yourself.
As the fullstacker with a roughly 65/35 split BE/FE on the team who has to review this kinda stuff on the daily, there's nothing I dread more than a backender writing FE tickets and vice versa.
Just last week I had to review some monstrosity of a FE ticket written by one of our backenders, with the comment of "it's 90% there, should be good to takeover". I had to throw out pretty much everything and rewrite it from scratch. My solution was like 150 lines modified, whereas the monstrous output of the AI was non-functional, ugly, a performance nightmare and around 800 lines, with extremely unhelpful and generic commit messages to the tune of "Made things great!!1!1!!".
I can't even really blame them, the C-level craze and zeal for the AI shit is such that if you're not doing crap like this you get scrutinized and PIP'd.
At least frontenders usually have some humility and will tell you they have no clue if it's a good solution or not, while BEnders are always for some reason extremely dismissive of FE work (as can be seen in this very thread). It's truly baffling to me
They averaged 47% more code produced on the AI tasks but took only 20% more time. The report passes over these considerations, but I'm left wondering: was the extra code superfluous, or did it produce better structure and manage debt better? If that extra 47% of code translates to lower debt and more consistent throughput over the long term, I might take it, given how crushed projects get by debt. Anyway, it's all speculation, because there are massive statistical differences in the outcomes but no measures of what they mean; yet I'm sure they have meaning, and that meaning matters a ton.
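A quick back-of-the-envelope on those two figures (my own arithmetic, not from the report): 47% more code in 20% more time implies code was produced at a higher rate per hour on the AI tasks.

```python
# Relative code-output rate on AI-allowed tasks vs. no-AI tasks.
code_ratio = 1.47   # 47% more code produced
time_ratio = 1.20   # 20% more time taken
rate_ratio = code_ratio / time_ratio
# roughly 1.225, i.e. about 22.5% more code written per unit time
```

Whether that higher rate reflects useful structure or just verbosity is exactly the open question.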
> They averaged 47% more code produced on the AI tasks but took only 20% more time. The report passes over these considerations, but I'm left wondering: was the extra code superfluous, or did it produce better structure and manage debt better? If that extra 47% of code translates to lower debt and more consistent throughput over the long term, I might take it, given how crushed projects get by debt.
Wouldn't it be the opposite? I'd expect the code would be 47% longer because it's worse and heavier in tech debt (e.g. code repeated in multiple places instead of being factored out into a function).
Honestly my experience from using AI to code (primarily claude sonnet) is that that "extra 47%" is probably itself mostly tech debt. Places where the AI repeated itself instead of using a loop. Places where the AI wrote tests that don't actually test anything. Places where the AI failed to produce a simple abstraction and instead just kept doing the same thing by hand. Etc.
AI isn't very good at being concise, in my experience, to the point of producing worse code. Which is a strange inversion from humans, who if anything tend to err on the side of being too concise, though not to the same degree.
All source code is technical debt. If you increase the amount of code, you increase the amount of debt. It's impossible to reduce debt with more code. The only way to reduce debt is by reducing code.
(and note that I'm not measuring code in bytes here; switching to single-character variable names would not reduce debt. I'm measuring it in statements, expressions, instructions; reducing those without reducing functionality decreases debt)
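That notion of size can be made concrete with a rough proxy using Python's stdlib `ast` module (counting statement and expression nodes is my own choice of measure, not the commenter's):

```python
import ast

def debt_proxy(source: str) -> int:
    """Count statement and expression nodes as a rough size measure.

    Renaming variables doesn't change the count, but adding
    statements or expressions does.
    """
    tree = ast.parse(source)
    return sum(isinstance(node, (ast.stmt, ast.expr))
               for node in ast.walk(tree))

short = "total = sum(xs)"
long = """
total = 0
for x in xs:
    total = total + x
"""
# The hand-rolled loop has more nodes, hence more "debt" by this measure.
```

By this metric, shortening identifiers changes nothing, while removing redundant statements (without losing functionality) genuinely reduces the count.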
Now do a study that specifically gauges how useful an LLM (including smart tab completion) is for a frontend dev working in react/next/tailwind on everyday Jira tickets.
These were maintainers of large open-source projects. It's all relative: it's clearly providing massive gains for some and not as much for others. It should follow that its benefit to you depends on who you are and what you are working on.
It's a very well controlled study of... what the study claims to study. Yes, they didn't study a different thing, for _many_ reasons. Yes, we shouldn't haphazardly extrapolate to other parts of engineering. But it looks like a good study nonetheless.
There are some very good findings, though, like the fact that the devs thought they were sped up when they were actually slowed down.
React and Tailwind already made a lot of tradeoffs to be more ergonomic for developers. One would expect LLMs to unlock a leaner and faster stack instead.
As a backend dev who owns a few internal crappy frontends, LLMs have been the best thing ever. Code quality isn't the top priority, I just need to plumb some data to an internal page at BigCorp.
Perhaps it is difficult to measure personal productivity in programming, but we can certainly measure that we run more slowly with 10 kg in our backpacks. I propose this procedure: the SWE selects 10 tasks and guesses some measure of their complexity (time to finish them), then randomly selects 5 to be done with AI and the rest without. He performs them and finally calculates a deviation D = D_0 - D_1, where D_i = sum(real_time/guessed_time - 1), D_0 is summed over the tasks done with AI, and D_1 over those done without. The sign and magnitude of D measure, respectively, whether the use of AI is beneficial or detrimental and the size of the impact. Clipping individual addends to the interval [-0.5, 0.5] should stop one bad guess from dominating the estimate. Sorry if this is a trivial idea, but it is feasible, and it should intuitively provide useful information if the tasks are chosen among those where the initial guesses have small deviation. A filter should also exclude tasks in which AI scaffolding exceeds a certain relative threshold, in case we want to generalize the results to tasks where scaffolding does not dominate the time.
It could happen that the impact of using AI depends on the task at hand, the capability of the SWE to pair-program with it, and the LLM used, to such an extent that those factors are bigger than the average effect over a bag of tasks; in that case, the large deviation from the mean makes any single-parameter estimate void of useful information.
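The proposed deviation metric is easy to compute. A minimal sketch in Python, with hypothetical (guessed, real) task times in hours and addends clipped to [-0.5, 0.5] as suggested:

```python
def deviation(tasks):
    """Sum of clipped (real/guessed - 1) terms over a list of tasks.

    Each task is a (guessed_hours, real_hours) pair; each addend is
    clipped to [-0.5, 0.5] so one bad guess can't dominate.
    """
    return sum(max(-0.5, min(0.5, real / guessed - 1))
               for guessed, real in tasks)

# Hypothetical estimates: 5 tasks done with AI, 5 without.
with_ai = [(2.0, 2.5), (1.0, 1.2), (3.0, 4.5), (2.0, 1.8), (1.0, 1.1)]
without_ai = [(2.0, 2.2), (1.0, 1.0), (3.0, 3.3), (2.0, 2.1), (1.0, 1.2)]

# D > 0 means the AI tasks overran their estimates more than the
# non-AI tasks did, i.e. AI was detrimental by this measure.
D = deviation(with_ai) - deviation(without_ai)
```

Note the (3.0, 4.5) task would contribute 0.5 rather than its raw overrun, which is exactly the clipping the proposal calls for.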
Really a great piece of work, the opposite of the usual studies posted here.
From the headline, you'd guess the study must be flawed: the sample not representative, the developers not expert enough with the AI, and so on. But then they give a very well done list of all the valid objections, with arguments for why they don't think those contradict the study.
In the end, that answered all the questions I could have had.
What if this is true? And then we as a developer community are focused on the wrong thing to increase productivity?
Like, what if by focusing on LLMs for productivity we just reinforce old bad habits and get stuck in a local maximum? And even worse, what if being stuck with the current so-so patterns, languages, etc. means we don't innovate in language design, tooling, or other areas that might actually be productivity wins?
Imagine having interstate highways built in one night: you wake up, and you have all these highways and roads, and everyone is confused about what they are and how to use them. Using LLMs is the opposite of boiling frogs, because you're not the one leading the writing, you're just suggesting... I just realized I might not know what I'm talking about.
We were stuck near local maxima since before LLMs came on the scene. I figure the same concentration of innovators are gonna innovate, now LLM-assisted, and the same concentration of best-practice folk are gonna best-practice, now LLM-assisted. Local maxima might get stickier, but greener pastures will be found more quickly than ever.
AI could make me more productive, I know that for a fact. But, I don't want to be more productive because the tasks that could be automated with AI are those I find enjoyable. Not always in an intellectual sense, but in a meditative sense. And if I automated those away, I think I would become less human.
I find LLMs are decent at regurgitating boilerplate, basically the same kind of stuff you could Google and then copy-paste. AI chatbots, now that they have web access, are also good at going over documentation and save you a little time searching through the docs yourself.
They're not great at business logic, though, especially if you're doing anything remotely novel. Which is the difficult part of programming anyway.
But yeah, for the average corporate programmer who needs to recreate the same internal business tool that every other company has anyway, it probably saves a lot of time.
I have never found a measure of programmer productivity that makes sense to me, but I can say that LLM coding tools are way more distracting to me than they are worth. They constantly guess at what I may type next, are often wrong, and pop in with suggestions breaking my mental flow and making me switch from the mindset of coding to the mindset of reviewing code.
The more I used it, the easier it became to skip over things I should have thought through myself. But looking back, the results weren’t always faster or better.
Now I prefer to treat AI as a kind of challenger. It helps reveal the parts I haven't truly understood, rather than just speeding things up.
[+] [-] Fraterkes|8 months ago|reply
That seems obvious, but a consequence of that is that people who are sceptical of ai (like me) only use it when they've exhausted other resources (like google). You ask very specific questions where not a lot of documentation is available and inevetably even o3 ends up being pretty useless.
Conversely there's people who love ai and use it for everything, and since the majority of the stuff they ask about is fairly simple and well documented (eg "Write me some typescript"), they rarely have a negative experience.
[+] [-] aleph_minus_one|8 months ago|reply
- Some people simply ask a lot more questions than others (this ignores whether they like or dislike AI), i.e. some people simply prefer to find things out more by themselves, and thus also use other resources like Google or Stack Overflow as a last resort. So their questions to an AI will likely be more complicated, because they already found out the easy parts by themselves.
- If I have to make the effort to explain to the AI in a sufficiently exhaustive way what I need (which I often have to do), I expect the answers of the AI to be really good. If it isn't, having explained my problem to the AI was simply a waste of time.
[+] [-] rsanheim|8 months ago|reply
Many folks I know are skeptical of the hype, or maybe full on anti/distrustful, due to reasons I think are valid. But many of those same people have tried llm tools, maybe chatgpt or copilot or cursor, and recognize the value even w/ huge misgivings. Some of have gone further with tools like claude code and seen the real potential there, quite a step beyond fancy auto-complete or just-in-time agents...but even there you can end up in rabbit-holes and drowning in horrible design.
In your incredibly reductive scale, I'm closer to 'love' than 'skeptical', but I'm often much of both sides. But I'd never write a prompt like 'write me some typescript' for any real work, or honestly anything close to that, unless its just for memes or demonstrations.
But no-one who programs for a living uses prompts like that, at least not for real work. That is just silly talk.
[+] [-] A4ET8a8uTh0_v2|8 months ago|reply
If we accept that AI is a tool, then then problem is the nature of the tool as it will vary heavily from individual to individual. This partially accounts for the ridiculous differences from self reported accounts of people, who use it on a regular basis.
And then, there is a possibility that my questions are not that unusual and/or are well documented ( quite possible ) so my perception of the usefulness of those answers is skewed.
My recent interaction with o4 was pretty decent on a very new ( by industry standards ) development and while documentation for it exists, it is a swirling vortex of insanity from where I sit. I was actually amazed to see how easily 4o saw some of those discrepancies and listed those to me along with likely pitfalls that may come with it. We will be able to find if that prediction holds v.soon.
What I am saying is that it has its uses.
[+] [-] marhee|8 months ago|reply
[+] [-] diggan|8 months ago|reply
You have any example questions where o3 failed to be helpful?
I use it pretty similarly to you, only resorting to it to unblock myself basically, otherwise I'm mostly the one doing the actual work, LLMs help with specific functions or specific blockers, or exploring new "spaces". But almost all the times I've gotten stuck, o3 (and o3-pro mode) managed to unstuck me, once I've figured out the right way to ask the question, even when my own searching and reading didn't help.
[+] [-] whatagreatboy|8 months ago|reply
For hard questions, I prefer to use my own skills, because AI often regurgitates what I'm already aware. I still ask AI in the off-chance it comes up with something cool, but most often, I have to do it myself.
[+] [-] tomcam|8 months ago|reply
[+] [-] calrain|8 months ago|reply
How I measure performance is how many features I can implement in a given period of time.
It's nice that people have done studies and have opinions, but for me, it's 10x to 20x better.
[+] [-] JKCalhoun|8 months ago|reply
Example: using LeafletJS — not hard, but I didn't want to have to search all over to figure out how to use it.
Example: other web page development requiring dropping image files, complicated scrolling, split-views, etc.
In short, there are projects I have put off in the past but eagerly begin now that LLMs are there to guide me. It's difficult to compare times and productivity in cases like that.
[+] [-] georgemcbay|8 months ago|reply
When I'm working with platforms/languages/frameworks I am already deeply familiar with I don't think they save me much time at all. When I've tried to use them in this context they seem to save me a bunch of time in some situations, but also cost me a bunch of time in others resulting in basically a wash as far as time saved goes.
And for me a wash isn't worth the long-term cost of losing touch with the code by not being the one to have crafted it.
But when it comes to environments I'm not intimately familiar with they can provide a very easy on-ramp that is a much more pleasant experience than trying to figure things out through often iffy technical documentation or code samples.
[+] [-] timeon|8 months ago|reply
Leaflet doc is single page document with examples you can copy-paste. There is page navogation at the top. Also ctrl/cmd+f and keyword seems quicker than writing the prompt.
[+] [-] freetime2|8 months ago|reply
> To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work. Then, we randomly assign each issue to either allow or disallow use of AI while working on the issue. When AI is allowed, developers can use any tools they choose (primarily Cursor Pro with Claude 3.5/3.7 Sonnet—frontier models at the time of the study); when disallowed, they work without generative AI assistance. Developers complete these tasks (which average two hours each) while recording their screens, then self-report the total implementation time they needed. We pay developers $150/hr as compensation for their participation in the study.
So it's a small sample size of 16 developers. And it sounds like different tasks were (randomly) assigned to the no-AI and with-AI groups - so the control group doesn't have the same tasks as the experimental group. I think this could lead to some pretty noisy data.
Interestingly - small sample size isn't in the list of objections that the auther includes under "Addressing Every Objection You Thought Of, And Some You Didn’t".
I do think it's an interesting study. But would want to see if the results could be reproduced before reading into it too much.
[+] [-] jack_pp|8 months ago|reply
I think that's where you get 10-20x. When you're working on niche stuff it's either not gonna work or work poorly.
For example right now I need to figure out why an ffmpeg filter doesn't do X thing smoothly, even though the C code is tiny for the filter and it's self contained.. Gemini refuses to add comments to the code. It just apologizes for not being able to add comments to 150 lines of code lol.
However for building an ffmpeg pipeline in python I was dumbfounded how fast I was prototyping stuff and building fairly complex filter chains which if I had to do by hand just by reading the docs it would've taken me a whole lot more time, effort and frustration but was a joy to figure out with Gemini.
So going back to the study, IMO it's flawed because by definition working on new features for open source projects wouldn't be the bread and butter of LLMs however most people aren't working on stuff like this, they're rewriting the same code that 10000 other people have written but with their own tiny little twist or whatever.
[+] [-] Tainnor|8 months ago|reply
[+] [-] xarope|8 months ago|reply
My analogy to this is seeing people spend time trying to figure out how to change colors, draw shapes in powerpoint, rather than focus on the content and presentation. So here, we have developers now focusing their efforts on correcting the AI output, rather than doing the research and improving their ability to deliver code in the future.
Hmm...
[+] [-] skissane|8 months ago|reply
When I’m in the “zone” I wouldn’t go near an LLM, but when I’ve fallen out of the “zone” they can be useful tools in getting me back into it, or just finishing that one extra thing before signing off for the day
I think the right answer to “does LLM use help or hinder developer productivity” is “it depends on how you use them”
[+] [-] seanmcdirmid|8 months ago|reply
[+] [-] hammyhavoc|8 months ago|reply
[+] [-] SkyPuncher|8 months ago|reply
Arguably, on a single coding task, I don't really move that much faster. However, I have much, much more brain capacity left both while coding and when I'm done coding.
This has two knock on effects:
1. Most simply, I'm productive for longer. Since LLMs are doing a lot of the heavy lifting, my brain doesn't have to constantly think. This is especially important in time periods where I'd previously have too little mental energy to think deeply about code.
2. I can do other things while coding. Well, right now, Cursor is cycling on a simple task while I type this. Most days, though, I'm responding to customers, working on documentation/planning, or doing some other non-coding task that's critical to my workflows. This is actually where I find my biggest productivity gains. Instead of coding THEN X, I can now do coding WITH X.
[+] [-] bluefirebrand|8 months ago|reply
Context shifting while trying to code seems like a bad idea to me
Maybe you're some incredible multi-tasking genius who can change tasks rapidly without losing any of the details, but I suspect that if most people tried this workflow, they would produce worse code, and whatever their other output is would be low quality too.
[+] [-] Brendinooo|8 months ago|reply
I'm a frontend guy, been using Claude Code for a couple of weeks now. It's been able to speed up some boilerplate, it's sped up a lot of "naming is hard" conversations I like to have (but my coworkers probably don't, lol), it's enabled me to do a lot more stuff in my most recent project.
But for a task or two I suspect that it has slowed me down. If you're unable to articulate the problem well enough and the problem is hard enough, you can go in circles for a while. And the feeling that "the right answer is just around the corner" makes it hard to timebox, or to find a specific point where you say "yup, time to ditch this and do it the old-fashioned way". There is a bit of a slot-machine effect here.
[+] [-] Jensson|8 months ago|reply
Likely more, as it takes longer for you to activate your brain when your first thought is to ask an LLM rather than solve it yourself. It's like people reaching for a calculator to do 4+5; that doesn't make you faster or more accurate.
[+] [-] latenightcoding|8 months ago|reply
edit: should have mentioned that the low-level stuff I work on is mature code and, a lot of the time, novel.
[+] [-] AstroBen|8 months ago|reply
[+] [-] sensanaty|8 months ago|reply
Just last week I had to review some monstrosity of a FE ticket written by one of our backenders, with the comment "it's 90% there, should be good to take over". I had to throw out pretty much everything and rewrite it from scratch. My solution was like 150 lines modified, whereas the monstrous AI output was non-functional, ugly, a performance nightmare, and around 800 lines, with extremely unhelpful and generic commit messages to the tune of "Made things great!!1!1!!".
I can't even really blame them, the C-level craze and zeal for the AI shit is such that if you're not doing crap like this you get scrutinized and PIP'd.
At least frontenders usually have some humility and will tell you they have no clue whether it's a good solution or not, while backenders are always, for some reason, extremely dismissive of FE work (as can be seen in this very thread). It's truly baffling to me.
[+] [-] raggi|8 months ago|reply
[+] [-] lmm|8 months ago|reply
Wouldn't it be the opposite? I'd expect the code would be 47% longer because it's worse and heavier in tech debt (e.g. code repeated in multiple places instead of being factored out into a function).
[+] [-] gpm|8 months ago|reply
AI isn't very good at being concise, in my experience, to the point of producing worse code. That's a strange inversion from humans, who if anything tend to err on the side of being too concise, though not by the same degree.
[+] [-] philbo|8 months ago|reply
All source code is technical debt. If you increase the amount of code, you increase the amount of debt. It's impossible to reduce debt with more code. The only way to reduce debt is by reducing code.
(and note that I'm not measuring code in bytes here; switching to single-character variable names would not reduce debt. I'm measuring it in statements, expressions, instructions; reducing those without reducing functionality decreases debt)
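A toy sketch of that last point (hypothetical names, not from any real codebase): factoring a repeated check into one helper reduces the statement count, and hence the debt by this measure, without changing behavior.

```typescript
// Before: the empty-name check is duplicated at every call site
// (more statements, identical behavior).
function saveUserDuplicated(name: string): string {
  if (name.trim() === "") throw new Error("empty name");
  return `saved:${name}`;
}
function renameUserDuplicated(name: string): string {
  if (name.trim() === "") throw new Error("empty name");
  return `renamed:${name}`;
}

// After: one shared helper; fewer total statements, same functionality,
// and the validation rule now lives in exactly one place.
function requireNonEmpty(name: string): string {
  if (name.trim() === "") throw new Error("empty name");
  return name;
}
const saveUser = (name: string): string => `saved:${requireNonEmpty(name)}`;
const renameUser = (name: string): string => `renamed:${requireNonEmpty(name)}`;
```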
[+] [-] kylecazar|8 months ago|reply
These were maintainers of large open source projects. It's all relative. It's clearly providing massive gains for some and not as much for others. It should follow that its benefit to you depends on who you are and what you're working on.
It isn't black and white.
[+] [-] franciscop|8 months ago|reply
There are some very good findings though, like how the devs thought they were sped up but they were actually slowed down.
[+] [-] timeon|8 months ago|reply
[+] [-] cheeze|8 months ago|reply
[+] [-] sheepfacts|8 months ago|reply
It could be that the impact of using AI depends on the task at hand, on how well the SWE can pair-program with it, and on the LLM used, to such an extent that those factors outweigh the average effect over a bag of tasks. In that case, the large deviation from the mean makes any single-parameter estimate void of useful information.
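A quick numeric illustration of that point (entirely made-up speedup numbers): when per-task effects vary this wildly, the standard error of the mean over a small bag of tasks is large enough that the confidence interval straddles "no effect", so the point estimate alone tells you little.

```typescript
// Hypothetical per-task speedup factors (>1 = faster with AI, <1 = slower).
const speedups = [2.5, 0.4, 1.8, 0.6, 3.0, 0.5, 0.9, 1.2];

const mean = (xs: number[]): number =>
  xs.reduce((a, b) => a + b, 0) / xs.length;

// Sample standard deviation (n - 1 denominator).
const stddev = (xs: number[]): number => {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1));
};

// Standard error of the mean.
const sem = stddev(speedups) / Math.sqrt(speedups.length);

// Mean is ~1.36, but mean ± 2·SEM spans roughly [0.67, 2.05],
// which straddles 1.0 ("no effect") on both sides.
console.log(mean(speedups).toFixed(2), sem.toFixed(2));
```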
[+] [-] didibus|8 months ago|reply
[+] [-] budududuroiu|8 months ago|reply
I thought it was the model, but then I realised, v0 is carried by the shadcn UI library, not the intelligence of the model
[+] [-] greatgib|8 months ago|reply
From the headline alone, one could easily guess that the study must be flawed: the sample unrepresentative, the developers not expert enough with the AI, and so on. And then they give a very well done list of exactly those valid objections, with arguments for why they don't think each one contradicts the study.
That answered all the questions I could have had, in the end.
[+] [-] softwaredoug|8 months ago|reply
Like, what if by focusing on LLMs for productivity we just reinforce old bad habits and get stuck in a local maximum... And even worse, what if being stuck with current so-so patterns, languages, etc. means we don't innovate in language design, tooling, or other areas that might be actual productivity wins?
[+] [-] journal|8 months ago|reply
[+] [-] __MatrixMan__|8 months ago|reply
I expect it'll balance.
[+] [-] vouaobrasil|8 months ago|reply
[+] [-] dismalaf|8 months ago|reply
They're not great at business logic though, especially if you're doing anything remotely novel. Which is the difficult part of programming anyway.
But yeah, for the average corporate programmer who needs to recreate the same internal business tool that every other company already has, it probably saves a lot of time.
[+] [-] bhaktatejas922|8 months ago|reply
The bar to beat is the time it takes for human intelligence + Cursor Tab to complete a task, but in practice the bar becomes "feeling" like it's faster.
[+] [-] _heimdall|8 months ago|reply
[+] [-] Amaury-El|8 months ago|reply