wiremine|6 months ago
> For me it’s meant a huge increase in productivity, at least 3X.
How do we reconcile these two comments? I think that's a core question of the industry right now.
My take, as a CTO, is this: we're giving people new tools, and very little training on the techniques that make those tools effective.
It's sort of like we're dropping trucks and airplanes on a generation that only knows walking and bicycles.
If you've never driven a truck before, you're going to crash a few times. Then it's easy to say "See, I told you, this newfangled truck is rubbish."
Those who practice with the truck are going to get the hang of it, and figure out two things:
1. How to drive the truck effectively, and
2. When NOT to use the truck... when walking or the bike is actually the better way to go.
We need to shift the conversation to techniques, and away from the tools. Until we do that, we're going to be forever comparing apples to oranges and talking around each other.
weego|6 months ago
My biggest take so far: if you're a disciplined coder who can handle 20% of an entire project's time (project meaning anything from a bug fix through to an entire app) being used on research, planning, and breaking those plans into phases and tasks, then augmenting your workflow with AI appears to have large gains in productivity.
Even then you need to learn a new version of explaining it 'out loud' to get proper results.
If you're more inclined to dive in and plan as you go, and store the scope of the plan in your head because "it's easier that way" then AI 'help' will just fundamentally end up in a mess of frustration.
t0mas88|6 months ago
One end is larger complex new features where I spend a few days thinking about how to approach it. Usually most thought goes into how to do something complex with good performance that spans a few apps/services. I write a half page high level plan description, a set of bullets for gotchas and how to deal with them and list normal requirements. Then let Claude Code run with that. If the input is good you'll get a 90% version and then you can refactor some things or give it feedback on how to do some things more cleanly.
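As a condensed illustration (the commenter's actual feature and stack aren't given, so every detail below is invented), such a plan might look like:

```
Goal: add per-key rate limiting across the gateway and worker services

High-level approach:
- Token bucket per API key, with shared state in Redis
- Gateway checks the budget before enqueueing; worker refunds on failure

Gotchas:
- Clock skew between services: use Redis server time, not local clocks
- Bursty clients: allow short bursts up to 2x the sustained rate

Requirements:
- p99 added latency under 2 ms
- No behavior change for keys under the limit
- Rolled out behind a feature flag
```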
The other end of the spectrum is "build this simple screen using this API, like these 5 other examples". It does those well because it's almost advanced autocomplete mimicking your other code.
Where it doesn't do well for me is in the middle between those two. Some complexity, not a big plan, and not simple enough to just repeat something existing. For those things it makes a mess, or you end up writing so many instructions/prompts that you could have just done it yourself.
cmdli|6 months ago
On the other hand, I’ve found success when I have no idea how to do something and tell the AI to do it. In that case, the AI usually does the wrong thing but it can oftentimes reveal to me the methods used in the rest of the codebase.
gwd|6 months ago
The question is, for those people who feel like things are going faster, what's the actual velocity?
A month ago I showed it a basic query of one resource I'd rewritten to use a "query builder" API. Then I showed it the "legacy" query of another resource, and asked it to do something similar. It managed to get very close on the first try, and with only a few more hours of tweaking and testing managed to get a reasonably thorough test suite to pass. I'm sure that took half the time it would have taken me to do it by hand.
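The commenter doesn't say which language or ORM is involved, so here is a generic, invented sketch of the kind of fluent "query builder" rewrite being described:

```python
class Query:
    """Minimal fluent query builder that renders to SQL text."""

    def __init__(self, table):
        self.table = table
        self.conditions = []
        self.order = None

    def where(self, clause):
        self.conditions.append(clause)
        return self  # returning self enables chaining

    def order_by(self, column):
        self.order = column
        return self

    def to_sql(self):
        sql = f"SELECT * FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        if self.order:
            sql += f" ORDER BY {self.order}"
        return sql


# A hand-written "legacy" query...
legacy = "SELECT * FROM users WHERE active = 1 ORDER BY created_at"
# ...and the builder equivalent the model was shown an example of and
# asked to reproduce for another resource:
built = Query("users").where("active = 1").order_by("created_at").to_sql()
assert built == legacy
```

With one worked example of the legacy-to-builder translation in context, the model's job reduces to pattern application, which matches the commenter's experience of a near-correct first try.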
Fast forward to this week, when I ran across some strange bugs, and had to spend a day or two digging into the code again, and do some major revision. Pretty sure those bugs wouldn't have happened if I'd written the code myself; but even though I reviewed the code, they went under the radar, because I hadn't really understood the code as well as I thought I had.
So was I faster overall? Or did I just offload some of the work to myself at an unpredictable point in the future? I don't "vibe code": I keep a tight rein on the tool and review everything it's doing.
Gigachad|6 months ago
If programmers really did get 3x faster, why has software not improved any faster than it always has?
delegate|6 months ago
Or lose control of the codebase, which you no longer understand after weeks of vibing (since we can only think and accumulate knowledge at 1x).
Sometimes the easy way out is throwing a week of generated code away and starting over.
So that 3x doesn't come for free at all, besides API costs, there's the cost of quickly accumulating tech debt which you have to pay if this is a long term project.
For prototypes, it's still amazing.
brulard|6 months ago
I agree the vibe-coding mentality is going to be a major problem. But aren't all tools used well and used badly?
Aeolun|6 months ago
I recognize this, but at the same time, I'm still better at remembering the scope of the codebase than Claude is.
If Claude gets a 1M context window, we can start sticking a general overview of the codebase in every single prompt.
wiremine|6 months ago
100%. Again, if we only focus on things like context windows, we're missing the important details.
jeremy_k|6 months ago
Overall, for React / TypeScript I heavily let Claude write the code.
The flip side of this is my server code is Ruby on Rails. Claude helps me a lot less here because this is my primary coding background. I also have a certain way I like to write Ruby. In these scenarios I'm usually asking Claude to generate tests for code I've already written and supplying lots of examples in context so the coding style matches. If I ask Claude to write something novel in Ruby I tend to use it as more of a jumping off point. It generates, I read, I refactor to my liking. Claude is still very helpful, but I tend to do more of the code writing for Ruby.
Overall, helpful for Ruby, I still write most of the code.
These are the nuances I've come to find and what works best for my coding patterns. But to your point, if you tell someone "go use Claude" and they have a preference in how to write Ruby, and they see Claude generate a bunch of Ruby they don't like, they'll likely dismiss it: "This isn't useful. It took me longer to rewrite everything than just doing it myself." Which all goes to say, time using the tools, whether it's Cursor, Claude Code, etc. (I use OpenCode), is the biggest key, but figuring out how to get over the initial hump is probably the biggest hurdle.
jorvi|6 months ago
I put myself somewhere in the middle in terms of how great I think LLMs are for coding, but anyone that has worked with a colleague that loves LLM coding knows how horrid it is that the team has to comb through and doublecheck their commits.
In that sense it would be equally nuanced to call AI-assisted development something like "pipe bomb coding". You toss out your code into the branch, and your non-AI'd colleagues have to quickly check if your code is a harmless tube of code or yet another contraption that quickly needs defusing before it blows up in everyone's face.
Of course that is not nuanced either, but you get the point :)
troupo|6 months ago
We don't. Because there's no hard data: https://dmitriid.com/everything-around-llms-is-still-magical...
And when hard data of any kind does start appearing, it may actually point in a different direction: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
> We need to shift the conversation to techniques, and away from the tools.
No, you're asking to shift the conversation to magical incantations which experts claim work.
What we need to do is shift the conversation to measurements.
unoti|6 months ago
Every success story with AI coding involves giving the agent enough context to succeed on a task that it can see a path to success on. And every story where it fails is a situation where it had not enough context to see a path to success on. Think about what happens with a junior software engineer: you give them a task and they either succeed or fail. If they succeed wildly, you give them a more challenging task. If they fail, you give them more guidance, more coaching, and less challenging tasks with more personal intervention from you to break it down into achievable steps.
As models and tooling becomes more advanced, the place where that balance lies shifts. The trick is to ride that sweet spot of task breakdown and guidance and supervision.
hirako2000|6 months ago
From my experience, even the top models continue to fail to deliver correctness on many tasks, even with all the details and no ambiguity in the input.
In particular when details are provided, in fact.
I find that with solutions likely to be well oiled in the training data, a well formulated set of *basic* requirements often leads, zero-shot, to "a" perfectly valid solution. I say "a" solution because there is still the probability (seed factor) that it will not honour part of the demands.
E.g, build a to-do list app for the browser, persist entries into a hashmap, no duplicate, can edit and delete, responsive design.
I never recall seeing an LLM kick off C++ code out of that. But I also don't recall any LLM succeeding in all these requirements, even though there aren't that many.
It may use a hash set, or even a plain set, for persistence, because that avoids duplicates out of the box. Or it may use a hashmap just to show it used a hashmap, but only as an intermediary data structure. It would be responsive, but the edit/delete buttons may not show, or may not be functional. Saving the edits may look like it worked, but it did not.
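As an invented sketch (in Python rather than browser-side code, and skipping the responsive UI entirely), the data layer that would actually honour those requirements is only a few lines:

```python
class TodoStore:
    """To-do entries persisted in a hashmap, with no duplicates."""

    def __init__(self):
        self.items: dict[int, str] = {}  # the "hashmap" persistence
        self.next_id = 1

    def add(self, text: str):
        if text in self.items.values():
            return None                  # enforce the no-duplicates rule
        item_id = self.next_id
        self.next_id += 1
        self.items[item_id] = text
        return item_id

    def edit(self, item_id: int, new_text: str) -> bool:
        # reject unknown ids, and edits that would create a duplicate
        if item_id not in self.items or new_text in self.items.values():
            return False
        self.items[item_id] = new_text
        return True

    def delete(self, item_id: int) -> bool:
        return self.items.pop(item_id, None) is not None
```

The point is that the contract is trivially checkable, which is exactly why a model silently dropping one requirement (the duplicate check, a non-functional edit) is so easy to miss in a quick demo.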
The comparison with junior developers is a pale one. Even a mediocre developer can test their work and won't pretend that it works if it doesn't even execute. A developer who lies too many times loses trust. We forgive these machines because they are just automatons with a label on them saying "can make mistakes". We have no recourse to make them speak the truth; they lie by design.
troupo|6 months ago
And you know that because people are actively sharing the projects, code bases, programming languages and approaches they used? Or because your gut feeling is telling you that?
For me, agents have failed with enough context and with not enough context, and succeeded with enough context and with not enough, and succeeded and failed with and without "guidance and coaching".
worldsayshi|6 months ago
If a solution can subtly fail and it is critical that it doesn't, an LLM is a net negative.
If a solution is easy to verify, or if it is enough that it walks like a duck and quacks like one, an LLM can be very useful.
I've had examples of both lately. I'm very much both bullish and bearish atm.
abc_lisper|6 months ago
The real dichotomy is this: if you are aware of the tools/APIs and the domain, you are better off writing the code on your own, except maybe for shallow changes like refactorings. OTOH, if you are not familiar with the domain/tools, using an LLM gives you a huge leg up by preventing you from getting stuck and providing initial momentum.
jama211|6 months ago
At no point when it was getting stuck initially did it suggest another approach, or complain that the task was outside its context window, even though it was.
This is a perfect example of “knowing how to use an LLM” taking it from useless to useful.
badlucklottery|6 months ago
LLMs currently produce pretty mediocre code. A lot of that is a "garbage in, garbage out" issue, but it's just the current state of things.
If the alternative is noob code or just not doing a task at all, then mediocre is great.
But 90% of the time I'm working in a familiar language/domain so I can grind out better code relatively quickly and do so in a way that's cohesive with nearby code in the codebase. The main use-case I have for AI in that case is writing the trivial unit tests for me.
So it's another "No Silver Bullet" technology where the problem it's fixing isn't the essential problem software engineers are facing.
sixothree|6 months ago
Also, giving it tools to ensure success is just as important. MCPs can sometimes make a world of difference, especially when it needs to search your code base.
dennisy|6 months ago
This makes sense, as the models are an average of the code out there and some of us are above and below that average.
Sorry btw I do not want to offend anyone who feels they do garner a benefit from LLMs, just wanted to drop in this idea!
smokel|6 months ago
Experienced developers know when the LLM goes off the rails, and are typically better at finding useful applications. Junior developers on the other hand, can let horrible solutions pass through unchecked.
Then again, LLMs are improving so quickly, that the most recent ones help juniors to learn and understand things better.
parpfish|6 months ago
It'll build something that fails a test, but I know how to fix the problem. I can't jump in and manually fix it or tell it what to do. I just have to watch it churn through the problem and eventually give up, throwing away a 90% good solution that I knew how to fix.
ath3nd|6 months ago
In my experience, juniors struggle most with:
- syntax
- iterating on an idea
- breaking down the task and verifying each step
Working with a tool like Claude that gets them started quickly and iterates on the solution together with them helps them tremendously and educates them on best practices in the field.
Contrast that with a seasoned developer with domain experience, good command of the programming language, knowledge of best practices, and a clear vision of how things can be implemented. They hardly need any help on those steps where the junior struggled and where LLMs shine, maybe some quick check on the API, but that's mostly it. That's consistent with the finding of the study https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... that experienced developers' performance suffered when using an LLM.
What I used as a metaphor before to describe this phenomenon is training wheels: kids learning how to ride a bike can get the basics with the help and safety of the wheels, but adults who can already ride a bike have no use for training wheels, and can often find themselves restricted by them.
jdgoesmarching|6 months ago
Makes me wonder if people spoke this way about “using computers” or “using the internet” in the olden days.
We don’t even fully agree on the best practices for writing code without AI.
mh-|6 months ago
Older person here: they absolutely did, all over the place in the early 90s. I remember people decrying projects that moved them to computers everywhere I went. Doctors offices, auto mechanics, etc.
Then later, people did the same thing about the Internet (written as a single word with a capital I by 2000, having previously been written as two separate words).
https://i.imgur.com/vApWP6l.png
moregrist|6 months ago
There were gobs of terrible road metaphors that spun out of calling the Internet the “Information Superhighway.”
Gobs and gobs of them. All self-parody to anyone who knew anything.
I hesitate to relate this to anything in the current AI era, but maybe the closest (and in a gallows humor/doomer kind of way) is the amount of exec speak on how many jobs will be replaced.
nhaehnle|6 months ago
(And I believe I'm fairly typical for my team. While there are more junior folks, it's not that I'm just stuck with powerpoint or something all day. Writing code is rarely the bottleneck.)
So... either their job is really just churning out code (where do these jobs exist, and are there any jobs like this at all that still care about quality?) or the most generous explanation that I can think of is that people are really, really bad at self-evaluations of productivity.
bloomca|6 months ago
Some people write racing car code, where a truck just doesn't bring much value. Some people go into more uncharted territories, where there are no roads (so the truck will not only slow you down, it will bring a bunch of dead weight).
If the road is straight, AI is wildly good. In fact, it is probably _too_ good; but it can easily miss a turn and it will take a minute to get it on track.
I am curious if we'll be able to fine-tune LLMs to assist with less known paths.
pesfandiar|6 months ago
chasd00|6 months ago
I think this is a very insightful comment with respect to working with LLMs. If you've ever ridden a horse, you don't really tell it to walk, run, turn left, turn right, etc.; you have to convince it to do those things and not be too aggravating while you're at it. With a truck, simple cause and effect applies, but with a horse it's a negotiation. I feel like working with LLMs is like a negotiation: you have to coax out of it what you're after.
jf22|6 months ago
I'm six months in using LLMs to generate 90% of my code and finally understanding the techniques and limitations.
Ianjit|6 months ago
There is no correlation between developers' self-assessment of their productivity and their actual productivity.
https://www.youtube.com/watch?v=tbDDYKRFjhk
ath3nd|6 months ago
The most recent study focusing on experienced developers showed a net negative effect on productivity when an LLM solution was used in their flow:
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
My conclusion on this, as an ex-VP of Engineering, is that good senior developers find little utility in LLMs and may even find them a nuisance/detriment, while for juniors they can be a godsend, as they help them with syntax and coax the solution out of them.
It's like training wheels to a bike. A toddler might find 3x utility, while a person who actually can ride a bike well will find themselves restricted by training wheels.
jg0r3|6 months ago
1. LLMs seem to benefit 'hacker-type' programmers from my experience. People who tend to approach coding problems in a very "kick the TV from different angles and see if it works" strategy.
2. There seems to be two overgeneralized types of devs in the market right now: Devs who make niche software and devs who make web apps, data pipelines, and other standard industry tools. LLMs are much better at helping with the established tool development at the moment.
3. LLMs are absolute savants at making clean-ish looking surface level tech demos in ~5 minutes, they are masters of selling "themselves" to executives. Moving a demo to a production stack? Eh, results may vary to say the least.
I use LLMs extensively when they make sense for me.
One fascinating thing for me is how different everyone's experience with LLMs is. Obviously there's a lot of noise out there, with AI haters and AI tech bros kind of muddying the waters with extremist takes.
pletnes|6 months ago
What «programming» actually entails differs enormously; so does AI's relevance.
nabla9|6 months ago
I experience a productivity boost, and I believe it's because I prevent LLMs from making design choices or handling creative tasks. They're best used as a "code monkey", filling in function bodies once I've defined them. I design the data structures, functions, and classes myself. LLMs also help with learning new libraries by providing examples, and they can even write unit tests that I manually check. Importantly, no code I haven't read and accepted ever gets committed.
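A minimal sketch of that division of labour (function name and contract invented for illustration): the human fixes the signature, types, and docstring, and the model is asked only to fill in the body.

```python
def dedupe_preserving_order(items: list[str]) -> list[str]:
    """Return items with duplicates removed, keeping first occurrences.

    The signature and contract above are human-designed; the body below
    is the kind of small, easily reviewed unit an LLM is asked to fill in.
    """
    seen: set[str] = set()
    out: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out
```

Because the contract is pinned down before the model writes anything, reviewing the output reduces to checking one small body against one docstring.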
Then I see people doing things like "write an app for ....", run, hey it works! WTF?
Gigachad|6 months ago
AI coding tools seem to excel at demos and flop on the field so the expectation disconnect between managers and actual workers is massive.