thor-rodrigues|4 months ago
The simplest explanation would be “You’re using it wrong…”, but I have the impression that this is not the primary reason. (Although, as an AI systems developer myself, I can say you would be surprised by the number of users who simply write “fix this” or “generate the report” and then expect an LLM to correctly produce the complex thing they have in mind.)
It is true that there is an “upper management” hype of trying to push AI into everything as a magic solution for all problems. There is certainly an economic incentive from a business valuation or stock price perspective to do so, and I would say that the general, non-developer public is mostly convinced that AI is actually artificial intelligence, rather than a very sophisticated next-word predictor.
While the claim that an LLM cannot follow a simple instruction sounds, at best, very unlikely, it remains true that these models cannot reliably deliver complex work.
WA|4 months ago
Some developers will either retrospectively change the spec in their head or be basically fine with the slight deviation. Other developers will be disappointed, because the LLM didn't deliver on the spec they clearly hold in their head.
It's a bit like a psychological false-memory effect where you misremember, and/or some people are more flexible in their expectations and accept "close enough" while others won't.
At least, I noticed both behaviors in myself.
benashford|4 months ago
Both situations need an iterative process to fix and polish before the task is done.
The notable thing for me was that we crossed a line about six months ago where I spend less time polishing the LLM output than I used to spend working with junior developers. (Disclaimer: at my current place-of-work we don't have any junior developers, so I'm not comparing like-with-like on the same task, and may have some false memories there too.)
But I think this is why some developers have good experiences with LLM-based tools. They're not asking "can this replace me?"; they're asking "can this replace those other people?"
scuff3d|4 months ago
Mitchell Hashimoto just did a write up about his process for shipping a new feature for Ghostty using AI. He clearly knows what he's doing and follows all the AI "best practices" as far as I could tell. And while he very clearly enjoyed the process and thinks it made him more productive, the post is also a laundry list of this thing just shitting the bed. It gets confused, can't complete tasks, and architects the code in ways that don't make sense. He clearly had to watch it closely, step in regularly, and in some cases throw the code out entirely and write it himself.
The amount of work I've seen people describe to get "decent" results is absurd, and a lot of people just aren't going to do that. For my money it's far better as a research assistant and something to bounce ideas off of. Or if it is going to write something it needs to be highly structured input with highly structured output and a very narrow scope.
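To make "highly structured input, highly structured output, narrow scope" concrete, here's a minimal hypothetical sketch (the llm callable is a stand-in for whichever client you use):

    import json

    PROMPT = (
        "Given this JSON list of column names, return a JSON object "
        "mapping each name to its snake_case form. Return JSON only."
    )

    def normalize_columns(llm, columns):
        # Narrow scope: one transformation, machine-checkable output.
        raw = llm(PROMPT + "\n" + json.dumps(columns))
        result = json.loads(raw)  # fails loudly if the model returns prose
        if set(result) != set(columns):  # reject dropped or invented keys
            raise ValueError("model dropped or invented columns")
        return result

The point is that the output is mechanically verifiable, so a wrong answer gets caught instead of slipping through.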
sussmannbaka|4 months ago
This is just what I observe on HN; I don't doubt there are actual devs (rather than the larping evangelist AI maxis) out there who actually get use out of these things, but they are pretty much invisible. If you are enthusiastic about your AI use, please share how the sausage gets made!
structural|4 months ago
A huge amount of effort goes into just searching for what relevant APIs are meant to be used without reinventing things that already exist in other parts of the codebase. I can send ten different instantiations of an agent off to go find me patterns already in use in code that should be applied to this spot but aren't yet. It can also search through a bug database quite well and look for the exact kinds of mistakes that the last ten years of people just like me made solving problems just like the one I'm currently working on. And it finds a lot.
Is this better than having the engineer who wrote the code and knows it very well? Hell no. But you don't always have that. And at the largest scale you really can't, because it's too large to fit in any one person's memory. So it certainly does devolve to searching and reading and summarizing for a lot of the time.
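As a rough sketch of the fan-out (the "agent" CLI here is a made-up stand-in for whatever tool you use; the point is the parallel read-only research, not the specific command):

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    QUERIES = [
        "find existing retry/backoff helpers already used in this repo",
        "find how other call sites wrap the billing API",
        "search past bug fixes for pagination off-by-one mistakes",
    ]

    def run_agent(query):
        # One research question per agent instance; we only read the summary.
        result = subprocess.run(["agent", query], capture_output=True, text=True)
        return result.stdout

    with ThreadPoolExecutor(max_workers=10) as pool:
        for summary in pool.map(run_agent, QUERIES):
            print(summary)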
ehnto|4 months ago
But the other thing is that your expectations normalise, and you will hit its limits more often if you rely on it more. You will inevitably be unimpressed by it, the longer you use it.
If I use it here and there, I am usually impressed. If I try to use it for my whole day, I am thoroughly unimpressed by the end, having had to re-do countless things it "should" have been capable of based on my own past experience with it.
mexicocitinluez|4 months ago
Absolutely nuts I had to scroll down this far to find the answer. Totally agree.
Maybe it's the fact that every software development job has different priorities, stakeholders, features, time constraints, programming models, languages, etc. Just a guess lol
lm28469|4 months ago
The simplest explanation is that most of us are code monkeys reinventing the same CRUD wheel over and over again, gluing things together until they kind of work and calling it a day.
"developers" is such a broad term that it basically is meaningless in this discussion
mexicocitinluez|4 months ago
lol.
another option is trying to convince yourself that you have any idea what the other 2,000,000 software devs are doing and think you can make grand, sweeping statements about it.
there is no stronger mark of a junior than the sentiment you're expressing
yoyohello13|4 months ago
The SVP of IT for my company is 100% in on AI. He talks about how great it is all the time. I just recently worked on a legacy project in PHP he built years ago, and now I know his bar for what quality code looks like is extremely low...
I use LLMs daily to help with my work, but I tweak the output all the time because it doesn't quite get it right.
Bottom line, if your code is below average AI code will look great.
nasmorn|4 months ago
That’s “not being a complete idiot” as a service.
If it were at least “how do I start the decalcification process on this machine”, so the machine actually registers it and turns the service light off.
tovej|4 months ago
That is not reliable, that's the opposite of reliable.
KronisLV|4 months ago
Some possible reasons:
With all of that, my success rate is pretty great and the statement that the tech can "...barely follow a simple instruction" holds untrue. Then again, most of my projects are webdev adjacent in mostly mainstream stacks, YMMV.
surgical_fire|4 months ago
This is probably the most significant part of your answer. You are asking it to do things for which there are a ton of examples in the training data. You described narrowing the scope of your requests too, which tends to be better.
impossiblefork|4 months ago
In the fixed world of mathematics, everything could in principle be great. In software, it can in principle be okay, even though contexts might be longer. But give them a genuinely new context, something like real life but different -- such as a story where nobody can communicate with the main characters because they speak a different language -- and the models simply can't deal with it, always returning to the contexts they're familiar with.
When you give them contexts that are different enough from the kind of texts they've seen, they do indeed fail to follow basic instructions, even though they can follow seemingly much more difficult instructions in other contexts.
lnrd|4 months ago
My hypothesis is that developers work on different things, and while these models might work very well for some domains (react components?) they will fail quickly in others (embedded?). So on one side we have developers working on X (LLM good at it) claiming that it will revolutionize development forever, and on the other side we have developers working on Y (LLM bad at it) claiming that it's just a fad.
wiether|4 months ago
Based on my own personal experience:
- on some topics, I get the x100 productivity that is pushed by some devs; for instance, this Saturday I was able to build two features that I had been rescheduling for years because, for lack of knowledge, they would have taken me many days; a few back-and-forths with an LLM and everything was working as expected. Amazing!
- on other topics, no matter how I present the issue to an LLM, at best it tells me that it's not solvable, at worst it tries to push an answer that doesn't make any sense, then pushes an even worse one when I point that out...
And when people ask me what I think about LLMs, I say: "that's nice and quite impressive, but it still can't be blindly trusted and needs a lot of overhead, so I suggest caution".
I guess it's the classic half empty or half full glass.
logicchains|4 months ago
Two of the key skills needed for effective use of LLMs are writing clear specifications (written communication), and management, skills that vary widely among developers.
smt88|4 months ago
But mostly my experience is that people who regularly get good output from AI coding tools fall into these buckets:
A) Very limited scope (e.g. single, simple method with defined input/output in context)
B) Aren't experienced enough in the target domain to see the problems with the AI's output (let's call this "slop blindness")
C) Use AI to force multiple iterations of the same prompt to "shake out the bugs" automatically instead of using the dev's time (see the sketch below)
I don't see many cases outside of this.
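For bucket C, a hypothetical sketch of the loop (llm, apply_patch, and revert_patch are made-up stand-ins; pytest is just one possible oracle):

    import subprocess

    def iterate(llm, prompt, apply_patch, revert_patch, max_tries=5):
        # Re-run the same prompt until the test suite passes.
        for _ in range(max_tries):
            patch = llm(prompt)  # regenerate from the same spec each time
            apply_patch(patch)
            if subprocess.run(["pytest", "-q"]).returncode == 0:
                return patch  # bugs "shaken out" by brute repetition
            revert_patch()
        raise RuntimeError("no attempt passed the tests")

It trades the dev's attention for compute, which only works when the tests are a trustworthy oracle.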
bsder|4 months ago
Oh, boy, this. For example, I often use whatever AI I have to adjust my Nix files because the documentation for Nix is so horrible. Sure, it's slop, but it gets me working again and back to what I'm supposed to be doing instead of farting around with Nix.
I would also argue:
D) The fact that an AI can do the task indicates that something about the task is broken.
If an AI can do the task well, there is something fundamentally wrong. Either the abstractions are broken, the documentation is horrible, the task is pure boilerplate, etc.
Ianjit|4 months ago
My guess is that for some types of work people don't know what the complex thing they have in mind is ex ante. The idea forms and is clarified through the process of doing the work. For those types of task there is no efficiency gain in using AI to do the work.
netdevphoenix|4 months ago
The question assumes that all developers do the same work. The kind of work done by an embedded dev is very different from the work of a front-end dev which is very different from the kind of work a dev at Jane Street does. And even then, devs work on different types of projects: greenfield, brownfield and legacy. Different kind of setups: monorepo, multiple repos. Language diversity: single language, multiple languages, etc.
Devs are not some kind of monolith army working like robots in a factory.
We need to look at these factors before we even consider any sort of ML.
gf000|4 months ago
> [..] possibly the repo is too far off the data distribution.
(Karpathy's quote)
Zababa|4 months ago
It is getting easier and easier to get good results out of them, partly because the models themselves are improving, partly because of better scaffolding.
> non-developer public is mostly convinced that AI is actually artificial intelligence, rather than a very sophisticated next-word predictor
This is a false dichotomy that assumes we know way more about intelligence than we actually do, and also assumes than what you need to ship lots of high quality software is "intelligence".
> While claiming that an LLM cannot follow a simple instruction sounds, at best, very unlikely, it remains true that these models cannot reliably deliver complex work.
"reliably" is doing a lot of work here. If it means "without human guidance" it is true (for now), if it means "without scaffolding" it is true (also for now), if it means "at all" it is not true, if it means it can't increase dev productivity so that they ship more at the same level of quality, assuming a learning period, it is not true.
I think those conversations would benefit a lot from being more precise and more focused, but I also realize that it's hard to do so because people have vastly different needs, levels of experience, expectations; there are lots of tools, some similar, some completely different, etc.
To answer your question directly, ie “Why do LLM experiences vary so much among developers?”: because "developer" is a very very very wide category already (MISRA C on a car, web frontend, infra automation, medical software, industry automation are all "developers"), with lots of different domains (both "business domains" as in finance, marketing, education and technical domains like networking, web, mobile, databases, etc), filled with people with very different life paths, very different ways of working, very different knowledge of AIs, very different requirements (some employers forbid everything except a few tools), very different tools that have to be used differently.
nunez|4 months ago
For some (me), it's amazing: I use the technology often despite its inaccuracies. Put another way, it's valuable enough to outweigh its flaws.
For many others, it's on a spectrum between "use it sometimes but disengage any time it does something I wouldn't do" and "never use it" depending on how much control they want over their car.
In my case, I'm totally fine handing driving off to AI (more like ML + computer vision) most times but am not okay handing off my brain to AI (LLMs) because it makes too many mistakes and the work I'd need to do to spot-check them is about the same as I'd need to put in to do the thing myself.
theshrike79|4 months ago
I know Car People who refuse to use even lane keeping assist, because it doesn't fit their driving style EXACTLY and it grates on them immensely.
I on the other hand DGAF, I love how I don't need to mess with micro adjustments of the steering wheel on long stretches of the road, the car does that for me. I can spend my brainpower checking if that Gray VW is going to merge without an indicator or not.
Same with LLMs: some people have a very specific style of code they want produced, and anything that's not exactly their style is "wrong" and "useless". Even if it does exactly what it should.
ninetyninenine|4 months ago
Take Joe. Joe sticks with AI and uses it to build an entire project. Hundreds of prompts. Versus your average HNer who thinks he’s the greatest programmer in the company and thinks he doesn’t need AI but tries it anyway. Then the AI fails, confirming his bias, and he never tries it again.