The current state of LLM-driven development

[+] tptacek|7 months ago|reply

Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

I have never heard anybody successfully using LLMs say this before. Most of what I've learned from talking to people about their workflows is counterintuitive and subtle.

It's a really weird way to open up an article concluding that LLMs make one a worse programmer: "I definitely know how to use this tool optimally, and I conclude the tool sucks". Ok then. Also: the piano is a terrible, awful instrument; what a racket it makes.

[+] credit_guy|7 months ago|reply

Fully agree. It takes months to learn how to use LLMs properly. There is an initial honeymoon where the LLMs blow your mind out. Then you get some disappointments. But then you start realizing that there are some things that LLMs are good at and some that they are bad at. You start creating a feel for what you can expect them to do. And more importantly, you get into the habit of splitting problems into smaller problems that the LLMs are more likely to solve. You keep learning how to best describe the problem, and you keep adjusting your prompts. It takes time.

[+] SkyPuncher|7 months ago|reply

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

That's a wild statement. I'm now extremely productive with LLMs in my core codebases, but it took a lot of practice to get it right and repeatable. There's a lot of little contextual details you need to learn how to control so the LLM makes the right choices.

Whenever I start working in a new code base, it takes a a non-trivial amount of time to ramp back up to full LLM productivity.

[+] troupo|7 months ago|reply

> I have never heard anybody successfully using LLMs say this before. Most of what I've learned from talking to people about their workflows is counterintuitive and subtle.

Because for all our posturing about being skeptical and data driven we all believe in magic.

Those "counterintuitive non-trivial workflows"? They work about as well as just prompting "implement X" with no rules, agents.md, careful lists etc.

Because 1) literally no one actually measures whether magical incarnations work and 2) it's impossible to make such measurements due to non-determinism

[+] prerok|7 months ago|reply

I agree with your assessment about this statement. I actually had to reread it a few times to actually understand it.

He is actually recommending Copilot for price/performance reasons and his closing statement is "Don’t fall for the hype, but also, they are genuinely powerful tools sometimes."

So, it just seems like he never really gave a try at how to engineer better prompts that these more advanced models can use.

[+] rocqua|7 months ago|reply

The OPs point seems to be: it's very quick for LLMs to be a net benefit to your skills, if it is a benefit at all. That is, he's only speaking of the very beginning part of the learning curve.

[+] edfletcher_t137|7 months ago|reply

The first two points directly contradict each other, too. Learning a tool should have the outcome that one is productive with it. If getting to "productive" is non-trivial, then learning the tool is non-trivial.

[+] enraged_camel|7 months ago|reply

Agreed. This is an astonishingly bad article. It's clear that the only reason it made it to the front page is because people who view AI with disdain or hatred upvoted it. Because as you say: how can anyone make authoritative claims about a set of tools not just without taking the time to learn to use them properly, but also believing that they don't even need to bother?

[+] hislaziness|7 months ago|reply

Would it be more appropriate to compare LLMs to Autotunes rather than pianos?

[+] lordnacho|7 months ago|reply

I've said it before, I feel like I'm some sort of lottery winner when it comes to LLM usage.

I've tried a few things that have mostly been positive. Starting with copilot in-line "predictive text on steroids" which works really well. It's definitely faster and more accurate than me typing on a traditional intellisense IDE. For me, this level of AI is cant-lose: it's very easy to see if a few lines of prediction is what you want.

I then did Cursor for a while, and that did what I wanted as well. Multi-file edits can be a real pain. Sometimes, it does some really odd things, but most of the time, I know what I want, I just don't want to find the files, make the edits on all of them, see if it compiles, and so on. It's a loop that you have to do as a junior dev, or you'll never understand how to code. But now I don't feel I learn anything from it, I just want the tool to magically transform the code for me, and it does that.

Now I'm on Claude. Somehow, I get a lot fewer excursions from what I wanted. I can do much more complex code edits, and I barely have to type anything. I sort of tell it what I would tell a junior dev. "Hey let's make a bunch of connections and just use whichever one receives the message first, discarding any subsequent copies". If I was talking to a real junior, I might answer a few questions during the day, but he would do this task with a fair bit of mess. It's a fiddly task, and there are assumptions to make about what the task actually is.

Somehow, Claude makes the right assumptions. Yes, indeed I do want a test that can output how often each of the incoming connections "wins". Correct, we need to send the subscriptions down all the connections. The kinds of assumptions a junior would understand and come up with himself.

I spend a lot of time with the LLM critiquing, rather than editing. "This thing could be abstracted, couldn't it?" and then it looks through the code and says "yeah I could generalize this like so..." and it means instead of spending my attention on finding things in files, I look at overall structure. This also means I don't need my highest level of attention, so I can do this sort of thing when I'm not even really able to concentrate, eg late at night or while I'm out with the kids somewhere.

So yeah, I might also say there's very little learning curve. It's not like I opened a manual or tutorial before using Claude. I just started talking to it in natural language about what it should do, and it's doing what I want. Unlike seemingly everyone else.

[+] bgwalter|7 months ago|reply

Pianists' results are well known to be proportional to their talent/effort. In open source hardly anyone is even using LLMs and the ones that do have barely any output, In many cases less output than they had before using LLMs.

The blogging output on the other hand ...

[+] stillpointlab|7 months ago|reply

I agree with you and I have seen this take a few times now in articles on HN, which amounts to the classic: "We've tried nothing and we're all out of ideas" Simpson's joke.

I read these articles and I feel like I am taking crazy pills sometimes. The person, enticed by the hype, makes a transparently half-hearted effort for just long enough to confirm their blatantly obvious bias. They then act like the now have ultimate authority on the subject to proclaim their pre-conceived notions were definitely true beyond any doubt.

Not all problems yield well to LLM coding agents. Not all people will be able or willing to use them effectively.

But I guess "I gave it a try and it is not for me" is a much less interesting article compared to "I gave it a try and I have proved it is as terrible as you fear".

[+] throwawaybob420|7 months ago|reply

Judging from all the comments here, it’s going to be amazing seeing the fallout of all the LLM generated code in a year or so. The amount of people who seemingly relish the ability to stop thinking and let the model generate giant chunks of their code base, is uh, something else lol.

[+] thefourthchime|7 months ago|reply

It entirely depends on the exposure and reliability the code needs. Some code is just a one-off to show a customer what something might look like. I don't care at all how well the code works or what it looks like for something like that. Rapid prototyping is a valid use case for that.

I have also written a C++ code that has to have a runtime of years, meaning there can be absolutely no memory leaks or bugs whatsoever, or TV stops working. I wouldn't have a language model write any of that, at least not without testing the hell out of it and making sure it makes sense to myself.

It's not all or nothing here. These things are tools and should be used as such.

[+] memorylane|7 months ago|reply

Dunno about you, but I find thinking hard… when I offload boilerplate code to Claude, I have more cycles left over to hold the problem in my head and effectively direct the agent in detail.

[+] candiddevmike|7 months ago|reply

Software "engineering" at it's finest

[+] dogcomplex|7 months ago|reply

lol yep we've never had codebases hacked together by juniors before running major companies in production - nope, never

[+] varispeed|7 months ago|reply

I think you are over estimating the quality of code humans generate. I take LLM over any output of junior - to mid level developer (if they were given the same prompt / ask)

[+] ebiester|7 months ago|reply

I disagree from almost the first sentence:

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

Learning how to use LLMs in a coding workflow is trivial to start, but you find you get a bad taste early if you don't learn how to adapt both your workflow and its workflow. It is easy to get a trivially good result and then be disappointed in the followup. It is easy to try to start on something it's not good at and think it's worthless.

The pure dismissal of cursor, for example, means that the author didn't learn how to work with it. Now, it's certainly limited and some people just prefer Claude code. I'm not saying that's unfair. However, it requires a process adaptation.

[+] donperignon|7 months ago|reply

LLM’s are basically glorified slot machines. Some people try very hard to come up with techniques or theories about when the slot machine is hot, it’s only an illusion, let me tell you, it’s random and arbitrary, maybe today is your lucky day maybe not. Same with AI, learning the “skill” is as difficult as learning how to google or how to check stackoverflow, trivial. All the rest is luck and how many coins do you have in your pocket.

[+] mikeshi42|7 months ago|reply

There's plenty of evidence that good prompts (prompt engineering, tuning) can result in better outputs.

Improving LLM output through better inputs is neither an illusion, nor as easy as learning how to google (entire companies are being built around improving llm outputs and measuring that improvement)

[+] gloomyday|7 months ago|reply

This is not a good analogy. The parameters of slot machines can be changed to make the casino lose money. Just because something is random, doesn't mean it is useless. If you get 7 good outputs out of 10 from an LLM, you can still use it for your benefit. The frequency of good outputs and how much babysitting it requires determine whether it is worth using or not. Humans make mistakes too, although way less often.

[+] simonw|7 months ago|reply

Learning how to Google is not trivial.

[+] jstummbillig|7 months ago|reply

We know what random* looks like: a coin toss, the roll of a die. Token generation is neither.

[+] unknown|7 months ago|reply

[deleted]

[+] simonw|7 months ago|reply

Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. [...]

LLMs will always suck at writing code that has not be written millions of times before. As soon as you venture slightly offroad, they falter.

That right there is your learning curve! Getting LLMs to write code that's not heavily represented in their training data takes experience and skill and isn't obvious to learn.

[+] kodisha|7 months ago|reply

LLM driven coding can yield awesome results, but you will be typing a lot and, as article states, requires already well structured codebase.

I recently started with fresh project, and until I got to the desired structure I only used AI to ask questions or suggestions. I organized and written most of the code.

Once it started to get into the shape that felt semi-permanent to me, I started a lot of queries like:

```

- Look at existing service X at folder services/x

- see how I deploy the service using k8s/services/x

- see how the docker file for service X looks like at services/x/Dockerfile

- now, I started service Y that does [this and that]

- create all that is needed for service Y to be skaffolded and deployed, follow the same pattern as service X

```

And it would go, read existing stuff for X, then generate all of the deployment/monitoring/readme/docker/k8s/helm/skaffold for Y

With zero to none mistakes. Both claude and gemini are more than capable to do such task. I had both of them generate 10-15 files with no errors, with code being able to be deployed right after (of course service will just answer and not do much more than that)

Then, I will take over again for a bit, do some business logic specific to Y, then again leverage AI to fill in missing bits, review, suggest stuff etc.

It might look slow, but it actually cuts most boring and most error prone steps when developing medium to large k8s backed project.

[+] randfish|7 months ago|reply

Deeply curious to know if this is an outlier opinion, a mainstream but pessimistic one, or the general consensus. My LinkedIn feed and personal network certainly suggests that it's an outlier, but I wonder if the people around me are overly optimistic or out of synch with what the HN community is experiencing more broadly.

[+] Palmik|7 months ago|reply

People that comment on and get defensive about this bit:

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

How much of your workflow or intuition from 6 months ago is still relevant today? How long would it take to learn the relevant bits today?

Keep in mind that Claude Code was released less than 6 months ago.

[+] pyb|7 months ago|reply

A fraction of the LLM maximalists are being defensive, because they don't want to consider that they've maybe invested too much time in those tools ; considering what said tools are currently genuinely good at.

[+] simonw|7 months ago|reply

Pretty much all of the intuition I've picked up about getting good results from LLMs has stayed relevant.

If I was starting from fresh today I expect it would take me months of experimentation to get back to where I am now.

Working thoughtfully with LLMs has also helped me avoid a lot of the junk tips ("Always start with 'you are the greatest world expert in X', offer to tip it, ...") that are floating around out there.

[+] AndyNemmity|7 months ago|reply

Hell, my workflow isn't the same two weeks ago when subagents were released.

[+] jamboca|7 months ago|reply

Have built many pipelines integrating LLMs to drive real $ results. I think this article boils it down too simply. But i always remember, if the LLM is the most interesting part of your work, something is severely wrong and you probably aren’t adding much value. Context management based on some aspects of your input is where LLMs get good, but you need to do lots of experimentation to tune something. Most cases i have seen are about developing one pipeline to fit 100s of extremely different cases; LLM does not solve this problem but basically serves as an approximator for you to discretize previously large problems in to some information sub space where you can treat the infinite set of inputs as something you know. LLMs are like a lasso (and a better/worse one than traditional lassos depending on use case) but once you get your catch you still need to process it, deal with it progammatically to solve some greater problem. I hate how so many LLM related articles/comments say “ai is useless throw it away dont use it” or “ai is the future if we dont do it now we’re doomed lets integrate it everywhere it can solve all our problems” like can anyone pick a happy medium? Maybe thats what being in a bubble looks like

[+] spenrose|7 months ago|reply

So many articles should prepend “My experience with ...” to their title. Here is OP's first sentence: “I spent the past ~4 weeks trying out all the new and fancy AI tools for software development.” Dude, you have had some experiences and they are worth writing up and sharing. But your experiences are not a stand-in for "the current state." This point applies to a significant fraction of HN articles, to the point that I wish the headlines were flagged “blog”.

[+] mettamage|7 months ago|reply

Clickbait gets more reach. It's an unfortunate thing. I remember Veritasium in a video even saying something along the lines of him feeling forced to do clickbaity YouTube because it works so well.

The reach is big enough to not care about our feelings. I wish it wasn't this way.

[+] hiAndrewQuinn|7 months ago|reply

>I made a CLI logs viewers and querier for my job, which is very useful but would have taken me a few days to write (~3k LoC)

I recall The Mythical Man-Month stating a rough calculation that the average software developer writes about 10 net lines of new, production-ready code per day. For a tool like this going up an order of magnitude to about 100 lines of pretty good internal tooling seems reasonable.

OP sounds a few cuts above the 'average' software developer in terms of skill level. But here we also need to point out a CLI log viewer and querier is not the kind of thing you actually needed to be a top tier developer to crank out even in the pre-LLM era, unless you were going for lnav [1] levels of polish.

[1]: https://lnav.org/

[+] nvbalaji|7 months ago|reply

>>You can safely ignore them if they don’t fit your workflows at the moment

I would rather qualify this statement a bit more - I would say "you can safely ignore if you are not building anything green field or build tools for self". In my experiments in the last one month or so, it is very efficient for building new components (small & medium). Making it efficient for the existing code base is a bit more tricky - you need to make sure it adheres to the way things are coded already, not to leak .env contents to LLMs, building a context from the existing components so that it does not read code every time (leading to cost and time escalations) and so on.

My main issue so far has been understanding the code that is generated. As of now that is the biggest bottleneck in increasing the productivity - i.e it takes a long time to review the code and push. In usual workflow of building, normally by the time the code complexity has increased in the system I would have sufficient mental construction to handle that complexity. I would know the inner workings of code. However if AI generates large piece of code getting into that code is taking a long time

[+] dezmou|7 months ago|reply

OP did miss the vscode extension for claude code, it is still terminal based but: - it show you the diff of the incoming changes in vscode ( like git ) - it know the line you selected in the editor for context

[+] mark_l_watson|7 months ago|reply

Interesting read, but strange to totally ignore the macOS ChatGPT app that optionally integrates with a terminal session, the currently opened VSCode editor tab, XCode. etc. I use this combination at least 2 or 3 times a month, and even if my monthly use is less that 40 minutes total, it is a really good tool to have in your toolbelt.

The other thing I disagree with is the coverage of gemnini-cli: if you use gemini-cli for a single long work session, then you must set your Google API key as an environment variable when starting gemini-cli, otherwise you end up after a short while using Gemini-2.5-flash, and that leads to unhappy results. So, use gemini-cli for free for short and focused 3 or 4 minute work sessions and you are good, or pay for longer work sessions, and you are good.

I do have a random off topic comment: I just don’t get it: why do people live all day in an LLM-infused coding environment? LLM based tooling is great, but I view it as something I reach for a few times a day for coding and that feels just right. Separately, for non-coding tasks, reaching for LLM chat environments for research and brainstorming is helpful, but who really needs to do that more than once or twice a day?

[+] itsalotoffun|7 months ago|reply

I think we're still in the gray zone of the "Incessant Obsolescence Postulate" (the Wait Calculation). Are you better off "skilling up" on the tech as it is today, or waiting for it to just "get better" so by the time you kick off, you benefit from the solved-problems X years from now. I also think this calculation differs by domain, skill level, and your "soft skill" abilities to communicate, explain and teach. In some domains, if you're not already on this train, you won't even get hired anymore.

The current state of LLM-driven development is already several steps down the path of an end-game where the overwhelming majority of code is written by the machine; our entire HCI for "building" is going to be so far different to how we do it now that we'll look back at the "hand-rolling code era" in a similar way to how we view programming by punch-cards today. The failure modes, the "but it SUCKS for my domain", the "it's a slot machine" etc etc are not-even-wrong. They're intermediate states except where they're not.

The exceptions to this end-game will be legion and exist only to prove the end-game rule.

[+] fnordsensei|7 months ago|reply

> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.

Do they? I’ve found Clojure-MCP[1] to be very useful. OTOH, I’m not attempting to replace myself, only augment myself.

1: https://github.com/bhauman/clojure-mcp

[+] mark_l_watson|7 months ago|reply

Thanks for the link! I used to use Clojure a lot professionally, but now just for fun projects, and to occasionally update my old Clojure book. I had bookmarked Clojure-MCP a while ago, but never got back to it but I will give it a try.

I like your phrasing of “OTOH, I’m not attempting to replace myself, only augment myself.” because that is my personal philosophy also.

[+] eric-burel|7 months ago|reply

Good read. I just want to pinpoint that LLMs seems to write better React code, but as an experienced frontend developers my opinion is that it's also bad at React. Its approach is outdated as it doesn't follow the latest guidelines. It writes React as I would have written it in 2020. So as usual, you need to feed the right context to get proper results.

[+] OldfieldFund|7 months ago|reply

I don't agree. Cursor is mind-blowingly good with the new agentic updates.

[+] stephc_int13|7 months ago|reply

I have not tried every IDE/CLI or models, only a few, mostly Claude and Qwen.

I work mostly in C/C++.

The most valuable improvement of using this kind of tools, for me, is to easily find help when I have to work on boring/tedious tasks or when I want to have a Socratic conversation about a design idea with a not-so-smart but extremely knowledgeable colleague.

But for anything requiring a brain, it is almost useless.

[+] softwaredoug|7 months ago|reply

I find all AI coding goes something like this algorithm

* I let the AI do something

* I find bad bug or horrifying code

* I realize I have it too much slack

* hand code for a while

* go back to narrow prompts

* get lazy, review code a bit less add more complexity

* GOTO 1, hopefully with a better instinct for where/how to trust this model

Then over time you hone your instinct on what to delegate and what to handle yourself. And how deeply to pay attention.

[+] d_silin|7 months ago|reply

Relying on LLM for any skill, especially programming, is like cutting your own healthy legs and buying crutches to walk. Plus you now have to pay $49/month for basic walking ability and $99/month for "Walk+" plan, where you can also (clumsily) jog.

[+] aeonik|7 months ago|reply

It's more like strapping on a exoskeleton suit with a jetpack.

It makes your existing strength and mobility greater, but don't be surprised if you fly into space that you will suffocate,

or if you fly over an ocean and run out gas, that you'll sink to the bottom,

or if you fly the suit in your fine glassware shop with patrons in the store, that your going to break and burn everything/everyone in there.

[+] derektank|7 months ago|reply

There are a lot of skills which I haven't developed because I rely on external machines to handle it for me; memorization, fire-starting, navigation. On net, my life is better for it. LLMs may or may not be as effective at replacing code development as books have been at replacing memorization and GPS has been at replacing navigation, but eventually some tool will be and I don't think I'll be worse off for developing other skills.

[+] candiddevmike|7 months ago|reply

Why would I pay you to walk with crutches when I can just get crutches and walk myself?

233 comments