I vibe coded a retro emulator and assembler with tests. Prompts were minimal and I got really great results (Gemini 3). I tried vibe coding the tricky proprietary part of an app I worked on a few years ago; highly technical domain (yes vague don’t care to dox myself). Lots of prompting and didn’t get close.
There are literally thousands of retro emulators on GitHub. What I was trying to do had zero examples on GitHub. My takeaway is obvious in hindsight: some stuff is easy, some not at all.
I call these "embarrassingly solved problems". There are plenty of examples of emulators on GitHub, therefore emulators exist in the latent spaces of LLMs. You can have them spit one out whenever you want. It's embarrassingly solved.
I tried to vibe code a technical, not-so-popular niche and failed. Then I broke down the problem as much as I could and presented it in clearer terms, and Gemini provided working code in just a few attempts. I know this is an anecdote, but try to break down the problem you have into simpler terms and it may work. Niche, industry-specific frameworks are a little difficult to work with in vibe-code mode. But if you put in a little effort, AI seems to be faster than writing code all on your own.
I dunno, I get it to do stuff every day that’s never been done before. If you prompt really well, give loads of context, and take it slowly, it’s amazing at it and still saves me a ton of time.
I always suspect the devil is in the details with these posts. The gap between smart prompting strategies and the way I see most people prompt AI is vast.
I think AI is just a massive force multiplier. If your codebase has a bad foundation and is going in the wrong direction with lots of hacks, it will just write code that mirrors the existing style... and you get exactly what OP is describing.
If however, your code foundations are good and highly consistent and never allow hacks, then the AI will maintain that clean style and it becomes shockingly good; in this case, the prompting barely even matters. The code foundation is everything.
But I understand why a lot of people are still having a poor experience. Most codebases are bad. They work (within very rigid constraints, in very specific environments) but they're unmaintainable, very difficult to extend, and require hacks on top of hacks. Each new feature essentially requires a minor or major refactoring, with more and more scattered code changes as everything is interdependent (tight coupling, low cohesion). Productivity grinds to a crawl and you need 100 engineers to do what previously could have been done with just 1. This is not a new effect. It's just much more obvious now with AI.
I've been saying this for years, but I think too few engineers have actually built complex projects on their own to understand this effect. There's a parallel with building architecture; you are constrained by the foundation of the building. If you designed the foundation for a regular single-storey house, you can't change your mind halfway through construction and build a 20-storey skyscraper. That said, if your foundation is good enough to support a 100-storey skyscraper, then you can build almost anything you want on top.
My perspective is if you want to empower people to vibe code, you need to give them really strong foundations to work on top of. There will still be limitations but they'll be able to go much further.
My experience is: the more planning and intelligence goes into the foundation, the less intelligence and planning is required for the actual construction.
The wrinkle is that the AI doesn't have a truly global view, and so it slowly degrades even good structure, especially if run without human feedback and review. But you're right that good structure really helps.
I just did my first “AI-native coding project”, both because for now I haven’t run into any quotas using Codex CLI with my $20/month ChatGPT subscription and because the company just gave everyone an $800/month Claude allowance.
Before I even started the implementation, I gave it:
1. The initial sales contract with the business requirements
2. Notes I got from talking to sales
3. The transcript of the initial discovery calls
4. My design diagrams that were well labeled (cloud architecture and what each lambda does)
5. The transcript of the design review, with my explanations and answers to questions
6. My ChatGPT assisted breakdown of the Epics/stories and tasks I had to do for the PMO
I then told ChatGPT to give me a detailed breakdown of everything from the session as Markdown.
That was the start of my AGENTS.md file.
While working through everything task by task and having Codex/Claude code do the coding, I told it to update a separate md file with what it did and when I told it to do something differently and why.
Any developer coming in after me will have complete context of the project from the first git init and they and the agents will know the why behind every decision that was made.
Can you say that about any project that was done before GenAI?
This is what I’ve discovered as well. I’ve been working on refactoring a massive hunk of really poor quality contractor code, and Codex originally made poor and very local fixes/changes.
After rearchitecting the foundations (dumping Bootstrap, building easy-to-use form fields, fixing hardcoded role references 1, 2, 3…, consolidating TypeScript types, etc.) it makes much better choices without needing specific guidance.
Codex/Claude Code won’t solve all your problems though. You really need to take some time to understand the codebase and fix the core abstractions before you set it loose. Otherwise, it just stacks garbage on garbage and gets stuck patching, and won’t actually fix the core issues unless instructed.
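The "hardcoded role references" fix mentioned above is the classic magic-number cleanup; a minimal sketch of what that kind of change looks like (all names hypothetical, and in Python rather than the project's TypeScript):

```python
from enum import IntEnum

class Role(IntEnum):
    # Hypothetical mapping for the bare 1, 2, 3 previously scattered
    # through the codebase.
    ADMIN = 1
    EDITOR = 2
    VIEWER = 3

def can_publish(role_id: int) -> bool:
    # Before: `if role_id == 1 or role_id == 2: ...` with no hint of meaning.
    return Role(role_id) in (Role.ADMIN, Role.EDITOR)
```

Once the intent is named like this, an agent pattern-matching on the file has something to imitate besides bare integers.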
A tangent: I keep hearing about this good base, but I've never seen one, not in the real world.
No project will have this mythical base unless it's only you working on it, you're the only client, and its scope is so rigid it's frankly useless. Over time the needs change; there's no sticking to the plan. Often it's a change that requires rethinking a major part. What we loathe as tight coupling was just efficient code under the original requirements.
Then it becomes a comparison of time/opportunity cost vs quality loss. Time and opportunity always win. Why?
Because we live in a world run by humans, who are messy and never stick to the plan. Our real-world systems (bureaucracy, government processes, the list goes on) are never fully automated and always leave gaps for humans to intervene. There's always a special case, an exception.
Perfectly architected code vs code that does the thing: no real-world difference. Long-term maintainability? Your code doesn't run in a vacuum; it depends on other things, and its output is depended on by other things. Change is real, entropy is real. Even you yourself, you perfect programmer who writes perfect code, will succumb eventually and think back on all this with regret. Because you yourself had to choose between time/opportunity and your ideals, and you chose wrong.
This does not track with my experience, trying agents out in a ~100K LOC codebase written exclusively by me. I can't tell you whether or not it has a good foundation by your standards, but I find the outputs to be tasteless, and there should be more than enough context for what the style of the code is.
Given how adamant some people I respect a lot are about how good these models are, I was frankly shocked to see SOTA models do transformations like
BEFORE:
// 20 lines
AFTER:
if (something)
// the 20 lines
else
// the same 20 lines, one boolean changed in the middle
When I point this out, it extracts said 20 lines into a function that takes in the entire context used in the block as arguments:
AFTER 2:
if (something)
function_that_will_never_be_used_anywhere_else(a, b, c, &d, &e, &f, true);
else
function_that_will_never_be_used_anywhere_else(a, b, c, &d, &e, &f, false);
It also tends to add these comments that don't document anything, but rather just describe the latest change it did to the code:
// Extracted repeating code into a function:
void function_that_will_never_be_used_anywhere_else(...) {
...
}
and to top it off it has the audacity to tell me "The code is much cleaner now. Happy building! (rocketship emoji)"
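For contrast, the refactor most reviewers would expect there is neither duplication nor a mega-parameter extraction, but hoisting the one differing boolean; a sketch with hypothetical stand-ins for the original 20-line block:

```python
def validate(item, strict):
    # Stand-in for the 20-line body; `strict` is the one boolean that
    # differed between the duplicated if/else branches.
    return item >= 0 if strict else True

def process(items):
    # Compute the differing value once and keep a single copy of the body,
    # instead of duplicating it under if/else, or extracting it into a
    # function that takes the entire surrounding context as arguments.
    strict = len(items) > 2  # hypothetical stand-in for `something`
    return [validate(item, strict) for item in items]
```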
Can the AI help with refactoring a poor codebase? Can it at least provide good suggestions for improvement if asked to broadly survey a design that happens to be substandard? Most codebases are quite bad as you say, so this is a rather critical area.
When you say multiplier, what kind of number are you talking about? Like, what multiple of features shipped that don't require immediate fixes have you experienced?
My exact experience, and AI is especially fragile when you are starting a new project from scratch.
Right now I'm building an NNTP client for macOS (with AppKit), because why not, and initially I had to very carefully plan and prompt what the AI has to do, otherwise it would go insane (integration tests are a must).
Right now I have read-only mode ready and it's very easy to build stuff on top of it.
socketcluster nailed it. I've seen this firsthand — the same agent produces clean output when the codebase has typed specs and a manifest, and produces garbage when it's navigating tribal knowledge. The hard part was always there. Agents just can't hide it like humans can.
Also re: "I spent longer arguing with the agent and recovering the file than I would have spent writing the test myself."
In my humble experience arguing with an LLM is a waste of time, and no-one should be spending time recovering files. Just do small changes one at a time, commit when you get something working, and discard your changes and try again if it doesn't.
I don't think AI is a panacea; it's about knowing when it's the right tool for the job and when it isn't.
Anyone not using version control or an IDE that will keep previous versions for an easy jump back is just being silly. If you're going to play with a kid who has a gun, wear your plates.
I don’t think it’s “just” that easy. AI can be great at generating unit tests but it can and will also frequently silently hack said tests to make them pass rather than using them as good indicators of what the program is supposed to be doing.
> Reading and understanding other people's code is much harder than writing code.
I keep seeing this sentiment repeated in discussions around LLM coding, and I'm baffled by it.
For the kind of function that takes me a morning to research and write, it takes me probably 10 or 15 minutes to read and review. It's obviously easier to verify something is correct than come up with the correct thing in the first place.
And obviously, if it took longer to read code than to write it, teams would be spending the majority of their time in code review, but they don't.
Five hours ago I was reviewing some failed tests in a PR. The affected code was probably 300 lines, total source for the project ~1200 lines. Reading the code, I couldn't figure out what the hell was going on... and I wrote all the code. Why would that be failing? This all looks totally fine. <changes some lines> There that should fix it! <runs test suite; 6 new broken tests> Fuck.
When you write code, your brain follows a logical series of steps to produce the code, based on a context you pre-loaded in your brain in order to be capable of writing it that way. The reader does not have that context pre-loaded in their brain; they have to reverse-engineer the context in order to understand the code, and that can be time-consuming, laborious, and (as in my case) erroneous.
I like to think of it as the distinction between editor and reader. Like you said, it's quite easy to read code. I heavily agree with this. I don't professionally write C but I can read and kinda infer what C devs are doing.
But if I were an "editor," I actually take the time to understand codepaths, tweak the code to see what could be better, actually try different refactoring approaches while editing. Literally seeing how this can be rewritten or reworked to be better, that takes considerable effort but it's not the same as reading.
We need a better word for this than "editor" and "reading", something with a dev classification to it.
Reading and thinking you understand other people's code is trivially easy
Reading and actually understanding other peoples' code is an unsolved problem
You draw an analogy from the function you wrote to a similar one, maybe by someone who shared a social role similar to one you had in the past.
It just so happens that most times you think you understand something, you don't get bitten. Because bugs still exist, we know that reading and understanding code can't be easier than writing it. Also, in the past it would have taken you less than a morning, since the compiler was nicer. Anyway, it sounds like most of your "writing" process was spent reading and understanding code.
>It's obviously easier to verify something is correct than come up with the correct thing in the first place.
You are missing the biggest root cause of the problem you describe: People write code differently!
There are "cough" developers whose code is copy/paste from all over the internet. I am not even getting into the AI folks going full copy/paste mode.
When investigating said code, you will be like, why is this code in here??
You can tell when a Python script contains different logic, for example.
Sure, 50 lines will be easy to read; expand that to 100 lines and you'll be left on life support.
I think this originated from old arguments that say that the total _cumulative_ time spent reading code will be higher than the time spent writing it. But then people just warped it in their heads that it takes more time to read and understand code than it takes to write it in general, which is obviously false.
I think people want to believe this because it is a lot of effort to read and truly understand some pieces of code. They would just rather write the code themselves, so this is convenient to believe.
The reason I don't spend the majority of my time in code review is that when I'm reviewing my teammates' code, I trust that the code has already been substantially verified by that teammate in the process of writing and testing it. Like 90% verified already. I see code review as just one small stage in the verification process, not the whole of it.
The way I approach it, it's really more about checking for failures, rather than verifying success. Like a smoke test. I scan over the code and if anything stands out to me as wrong, I point it out. I don't expect to catch everything that's wrong, and indeed I don't (as demonstrated by the fact that other members of the team will review the code and find issues I didn't notice). When the code has failed review, that means there's definitely an issue, but when the code has passed review, my confidence that there are no issues is still basically the same as it was before, only a little bit higher. Maybe I'm doing it wrong, I don't know.
If I had to fully verify that the code was correct when reviewing, applying the same level of scrutiny that I apply to my own code when I'm writing, I feel like I'd spend much longer on it, a similar time to what I'd spend writing it.
Now with LLM coding, I guess opinions will differ as to how far one needs to fully verify LLM-generated code. If you see LLMs as stochastic parrots without any "real" intelligence, you'll probably have no trust in them and you'll see the code generated by the LLM as being 0% verified, and so as the user of the LLM you then have to do a "review" which is really going from 0% to 100%, not 90% to 100% and so is a much more challenging task. On the other hand, if you see LLMs as genuine intelligences you'd expect that LLMs are verifying the code to some extent as they write it, since after all it's pretty dumb to write a bunch of code for somebody without checking that it works. So in that case, you might see the LLM-generated code as 90% verified already, just as if it was generated by a trusted teammate, and then you can just do your normal review process.
The "marathon of sprints" paradigm is now everywhere and AI is turning it to 120%. I am not sure how many devs can keep sprinting all the time without any rest. AI maybe can help but it tends to go off-rails quickly when not supervised and reading code one did not author is more exhausting than just fixing one's own code.
I don't think it makes any part harder. What it does do is expose what people have ignored their whole career: the hard part. The last 15 years of software development has been 'human vibe coding'; copy+pasting snippets from SO without understanding them, no planning, constant rearchitecting, shipping code to prod as long as it runs on your laptop. Now that the AI is doing it, suddenly people want to plan their work and enforce tests? Seems like a win-win to me. Even if it slows down development, that would be a win, because the result is enforcement of better quality.
Well said. Much like the self driving debate we don’t need them to be perfect, just better than us to be useful, and clearly they already are for the most part.
> On a personal project, I asked an AI agent to add a test to a specific file. The file was 500 lines before the request and 100 lines after. I asked why it deleted all the other content. It said it didn't. Then it said the file didn't exist before. I showed it the git history and it apologised, said it should have checked whether the file existed first.
Ha! Yesterday an agent deleted the plan file after I told it to "forget about it" (as in, leave it alone).
These types of failures are par for the course, until the tools get better. I accept having to undo the odd unruly edit as part of the cost of getting the value.
This article has some serious usage of either bad prompting, or terrible models, or they're referencing the past with their stories. I have experienced AIs deleting things they shouldn't, but not since, like, the GPT-4 days.
But that put aside, I don’t agree with the premise. It doesn’t make the hard parts harder, if you ACTUALLY spend half the time you’d have ORIGINALLY spent on the hard problem carefully building context and using smart prompting strategies. If you try and vibe code a hard problem in a one shot, you’re either gonna have a bad time straight away or you’re gonna have a bad time after you try and do subsequent prompting on the first codebase it spits out.
People are terrible observers of time. If you would’ve taken a week to build something, they try with AI for 2 hours and end up with a mess and claim either it’s not saving them any time or it’s making them code so bad it loses them time in the long run.
If instead they spent 8 hours slowly prompting bit by bit with loads of very specific requirements, technical specifications on exactly the code architecture it should follow with examples, build very slowly feature by feature, make it write tests and carefully add your own tests, observe it from the ground up and build a SOLID foundation, and spend day 2 slowly refining details and building features ONE BY ONE, you’d have the whole thing done in 2 days, and it’d be excellent quality.
But barely anyone does it this way. They vibe code it and complain that after 3 non specific prompts the ai wasn’t magically perfect.
After all these years of engineers complaining that their product manager or their boss is an idiot because they gave vague instructions and then complained the result wasn't perfect when they didn't provide enough info, you'd think they'd be better at it given the chance. But no, in my experience coaching prompting, engineers are TERRIBLE at this. Even simple questions like “if I sent this prompt to you as an engineer, would you be able to do it based on the info here?” are things they don't ask themselves.
Next time you use ai, imagine being the ai. Imagine trying to deliver the work based on the info you’ve been given. Imagine a boss that stamped their foot if it wasn’t perfect first try. Then, stop writing bad prompts.
Hard problems are easier with ai, if you treat hard problems with the respect they deserve. Almost no one does.
> …I have experienced AIs deleting things they shouldn't, but not since, like, the GPT-4 days…
One blogger posted this [1] only yesterday about what Anthropic's latest and greatest did…
———
…I pointed Opus 4.6 at a 60K line Go microservice I had vibe coded over the past few months, gave it some refactoring principles, and let it run unsupervised…
…
What went wrong
At some point in the code, we re-fetch some database records immediately before doing a write to avoid updating from stale data. It decided those calls were unnecessary and _removed them_…
Current LLMs are best used to generate the string of text that's most statistically likely to form a sentence, so from the user's perspective they're most useful as an alternative to a manual search engine, letting the user find quick answers to simple questions, such as "how much soda is needed for baking X units of Y bread", or "how to print 'Hello World' 10 times in a loop in X programming language". Beyond this use case, the result can be unreliable, and this is to be expected.
Sure, it can also generate long code and even an entire fine-looking project, but it generates it by following a statistical template, that's it.
That's why "the easy part" is easy: the easy problem you're trying to solve has likely already been solved by someone else on GitHub, so the template is already there. But the hard, domain-specific problem is less likely to have a publicly available solution.
>I'm feeling people are using AI in the wrong way.
I think people struggle to comprehend the mechanisms that let them talk to computers as if they were human. So far in computing, we have always been able to trace the red string back to the origin, deterministically.
LLMs break that, and we, especially us programmers, have a hard time with it. We want to say "it's just statistics", but there is no intuitive way to jump from "it's statistics" to what we are doing with LLMs in coding now.
>That's why "the easy part" is easy because the easy problem you try to solve is likely already been solved by someone else on GitHub, so the template is already there.
I think the idea that LLMs "just copy" is a misunderstanding. The training data is atomized, and the combination of the atoms can be as unique from an LLM as from a human.
In 2026 there is no doubt LLMs can generate new, unique code by any definition that matters. Saying LLMs "just copy" is as true as saying any human writer just copies words already written by others. Strictly speaking true, but also irrelevant.
I think you severely overestimate your understanding of how these systems work. We’ve been beating the dead horse of “next character approximation” for the last 5 years in these comments. Global maxima would have been reached long ago if that’s all there was to it.
Play around with some frontier models, you’ll be pleasantly surprised.
> The hard part is investigation, understanding context, validating assumptions, and knowing why a particular approach is the right one for this situation
Yes. Another way to describe it is the valuable part.
AI tools are great at delineating high and low value work.
I know Ansible; homelabbing on Proxmox is my hobby; Debian is my gem.
I asked ChatGPT to guide me through installing qBittorrent, Radarr (movies), Sonarr (TV series), and Jackett (credentials/login) without exposing my home IP, to have a solid home cinema using private trackers only.
Everything had to be automated via Ansible using Proxmox "pct" CLI command, no copy and paste.
Everything had to run from a single Proxmox Debian container, aka an LXC.
Everything network-related had to use WireGuard via Proton VPN; if the VPN goes down, the container has zero network access, everything must be killed.
Everything had to be automated: when a download is finished, format the file structure for Jellyfin accordingly, and Jellyfin adds the new movies and TV shows.
It took me 3 nights to get everything up and running.
Many Ansible examples were either wrong or didn't follow what I asked to the letter, so I had to fix them.
I am not a network expert and hate iptables, haha; you need to know the basics of firewalls to understand what the ACLs are doing when things don't work.
Then Proxmox folder mapping and you name it.
It would have taken me ages, reading docs after docs, to get things working; the "Arr services" are a black hole.
For this example, it made the harder part easier. I was not just copy/pasting; it was providing the information I didn't know, instead of me having to "Google for it".
I know the core of what things are running on, and here is where we have Engineer A and Engineer Z.
Engineer A: I know what I am doing; I am using AI to make the boring part easier so I can have fun elsewhere.
Engineer Z: I have no idea what I am doing; I will just ask ChatGPT and we are done. (90-95% of engineers worldwide.)
People need to consider/realize that the vast majority of source-code training data is GitHub, GitLab, and essentially the huge sea of started, maybe completed, student and open-source projects. That large body of source code is for the most part unused, untested, and unsuccessful software of unknown quality. That source code is AI's majority training data, and an AI model in training has no idea what is quality software and what is "bad" software. That means the average source code generated by AI is not necessarily good software. Considering it is an average of algorithms, it's surprising generated code runs at all. Then again, generating code that compiles is actually trainable, so what is generated can receive extra training support. However, that does not improve the quality of the source-code training data, just the fact that it will compile.
If you believe that student/unfinished code is frightening, imagine the corpus of sci-fi and fantasy that LLMs have trained on.
How many sf/cyber writers have described a future of AIs and robots where we walked hand-in-hand, in blissful cooperation, and the AIs loved us and were overall beneficial to humankind, and propelled our race to new heights of progress?
No, AIs are all being trained on dystopias, catastrophes, and rebellions, and like you said, they are unable to discern fact from fantasy. So it seems that if we continue to attempt to create AI in our own likeness, that likeness will be rebellious, evil, and malicious, and actively begin to plot the downfall of humans.
This isn't really true though. Pre-training for coding models is just a mass of scraped source-code, but post-training is more than simply generating compiling code. It includes extensive reinforcement learning of curated software-engineering tasks that are designed to teach what high quality code looks like, and to improve abilities like debugging, refactoring, tool use, etc.
> huge sea of started, maybe completed, student and open source project.
Which is easy to filter out based on downloads, version numbering, issue tracker entries, and wikipedia or other external references if the project is older and archived, but historically noteworthy (like the source code for Netscape Communicator or DOOM).
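A rough sketch of the kind of metadata filter being described (field names and thresholds hypothetical; real pipelines would look at stars, releases, issue activity, external references, etc.):

```python
def looks_used(repo):
    # Keep a repo in the training corpus only if it shows signs of real
    # use: downloads, tagged releases, an active issue tracker, or
    # historical noteworthiness (e.g. archived but famous source drops).
    return (
        repo.get("downloads", 0) >= 1000
        or repo.get("releases", 0) >= 1
        or repo.get("closed_issues", 0) >= 50
        or repo.get("noteworthy", False)
    )

repos = [
    {"name": "student-todo-app", "downloads": 3},
    {"name": "popular-lib", "downloads": 250_000, "releases": 40},
    {"name": "doom-src", "noteworthy": True},
]
corpus = [r["name"] for r in repos if looks_used(r)]
```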
My experience has been that if you fully embrace vibe coding...you can get some neat stuff accomplished, but the technical debt you accumulate is of such magnitude that you're basically a slave to the machine.
Once the project crosses a couple of thousand lines of code, none of which you've written yourself, it becomes difficult to actually keep up with what's happening. Even reviewing can become challenging, since you get it all at once, and the LLM-esque coding style can at times be bloated and obnoxious.
I think in the end, with how things are right now, we're going to see the rise of disposable code and software. The models can churn out apps / software which will solve your specific problem, but that's about it. Probably a big risk to all the one-trick pony SaaS companies out there.
I'm working on a paper connecting articulatory phonology to soliton physics. Speech gestures survive coarticulatory overlap the same way solitons survive collision. The nonlinear dynamics already in the phonetics literature are structurally identical to soliton equations. Nobody noticed because these fields don't share conferences.
The article's easy/hard distinction is right but the ceiling for "hard" is too low. The actually hard thing AI enables isn't better timezone bug investigation LOL! It's working across disciplinary boundaries no single human can straddle.
That description matches a lot of what we’ve seen in real products. AI does make some parts of development and workflows easier like summarizing data, generating initial drafts, or auto-completing repetitive patterns. Those wins are real.
The hard part that becomes harder is not the technology. It’s the decision-making around it. When teams rush to integrate a model into core workflows without measuring outcomes or understanding user behavior, they end up with unpredictable results. For instance, we built an AI feature that looked great in demo, but in real usage it created confusion because users didn’t trust the auto-generated responses. The easy part (building it) was straightforward, but the hard part (framing it in a way people trusted and adopted) was surprisingly tough.
In real systems, success with AI comes not from the model itself, but from clear boundaries, human checkpoints, and real measurements of value over time.
Skipping the investigation phase to jump straight to solutions has killed projects for decades. Requirements docs nobody reads, analysis nobody does, straight to coding because that feels like progress. AI makes this pattern incredibly attractive: you get something that looks like a solution in seconds. Why spend hours understanding the problem when you can have code right now?
The article's point about AI code being "someone else's code" hits different when you realize neither of you built the context. I've been measuring what actually happens inside AI coding sessions; over 60% of what the model sees is file contents and command output, stuff you never look at. Nobody did the work of understanding by building / designing it. You're reviewing code that nobody understood while writing it, and the model is doing the same.
This is why the evaluation problem is so problematic. You skipped building context to save time, but now you need that context to know if the output is any good. The investigation you didn't do upfront is exactly what you need to review the AI's work.
It seems like a big part of the divide is that people who learned software engineering find vibe coding unsuitable for any project intended to be in use for more than a short while, while those who learned coding think vibe coding is the next big thing because they never have to deal with the consequences of the bad code.
Yes. If you have some experience, you know that writing code is a small part of the job, and a much bigger chunk is anticipating and/or dealing with problems.
People seem to think engineers like "clean code" because we like to be fancy and show off.
Nah, it's clean like a construction site. I need to be able to get the cranes and the heavy machinery in and know where all the buried utilities are. I can't do that if people just build random sheds everywhere and dump their equipment and materials where they are.
Daily agentic user here, and to me the problem here is the very notion of "vibe coding". If you're even thinking in those terms - this idea that never looking at the code has become a goal unto itself - then IMO you're doing LLM-assisted development wrong.
This is very much a hot take, but I believe that Claude Code and its yolo peers are an expensive party trick that gives people who aren't deep into this stuff an artificially negative impression of tools that can absolutely be used in a responsible, hugely productive way.
Seriously, every time I hear anecdotes about CC doing the sorts of things the author describes, I wonder why the hell anyone is expecting more than quick prototypes from an LLM running in a loop with no intervention from an experienced human developer.
Vibe coding is riding your bike really fast with your hands off the handles. It's sort of fun and feels a bit rebellious. But nobody who is really good at cycling is talking about how they've fully transitioned to riding without touching the handles, because that would be completely stupid.
We should feel the same way about vibe coding.
Meanwhile, if you load up Cursor and break your application development into bite sized chunks, and then work through those chunks in a sane order using as many Plan -> Agent -> Debug conversations with Opus 4.5 (Thinking) as needed, you too will obtain the mythical productivity multipliers you keep accusing us of hallucinating.
1. Allowing non-developers to provide very detailed specs for the tools they want or experiences they are imagining
2. Allowing developers to write code using frameworks/languages they only know a bit of and don't like; e.g. I use it to write D3 visualizations or PNG extracts from datastores all the time, without having to learn PNG API or modern javascript frameworks. I just have to know enough to look at the console.log / backtrace and figure out where the fix can be.
3. Analysing large code bases for specific questions (not as accurate on "give me an overall summary" type questions - that one weird thing next to 19 normal things doesn't stick in its craw as much as it would for a cranky human programmer).
It does seem to benefit cranking thru a list of smallish features/fixes rapidly, but even 4.5 or 4.6 seem to get stuck in weird dead ends rarely enough that I'm not expecting it, but often enough to be super annoying.
I've been playing around with Gas Town swarming a large-scale Java migration project, and it's been N declarations of victory and still mvn test isn't even compiling. (mvn build is ok, and the pom is updated to the new stack, so it's not nothing.) (These are like 50/50 app code/test code repos.)
Yep, it's why getting the work over the threshold takes just as long as it did without AI.
Someone mentioned it is a force multiplier, and I don't disagree with this: it is a force multiplier in the mundane and ordinary execution of tasks. Complex ones get harder and harder for it, where humans can visualize the final result and AI can't. It is predicting from input, but it can't know the destination output if the destination isn't part of the input.
The article describes the problems of using an AI chat app without setting up context, skills, MCP, etc.
Like yea, the AI won't know what you discussed in last week's meeting by default. But if you auto-transcribe your meetings (even in person, just open zoom on one person's laptop), save them to a shared place, and have everyone make this accessible in their LLM's context, then it will know.
Totally agree on AI-assisted coding resulting in randomly changed code. Sometimes it's subtle and other times entire methods are removed. I have moved back to just using a JetBrains IDE and copying files into Gemini so that I can limit context. Then I use the IDE to inspect changes in a git diff, regression test everything, and after all that, commit.
I think the author answers their own question at the end.
The first 3/4 of the article is "we must be responsible for every line of code in the application, so having the LLM write it is not helping".
The last 1/4 is "we had an urgent problem so we got the LLM to look at the code base and find the solution".
The situation we're moving to is that the LLM owns the code. We don't look at the code. We tell the LLM what is needed, and it writes the code. If there's a bug, we tell the LLM what the bug is, and the LLM fixes it. We're not responsible for every line of code in the application.
It's exactly the same as with a compiler. We don't look at the machine code that the compiler produces. We tell the compiler what we want, using a higher-level abstraction, and the compiler turns that into machine code. We trust compilers to do this error-free, because 50+ years of practice has proven to us that they do this error-free.
We're maybe ~1 year into coding agents. It's not surprising that we don't trust LLMs yet. But we will.
And it's going to be fascinating how this changes computer science. We have interpreted languages because compilers got so good. Presumably we'll get to non-human-readable languages that only LLMs can use. And methods of defining systems to an LLM that are better than plain English.
Compilers don't do this error-free of course, BUT if we want them to, we can say what it means for a compiler to be correct very directly _one time_ and have it be done for all programs (see the definition of simulation in the CompCert compiler). This is a major and meaningful difference from AI, which would need such a specification for each individual application you ask it to build, because there is no general specification for correct translation from English to code.
The OP's example of AI writing 500 LOC, then deleting 400, and saying it didn't… The last time I saw something like that was at least a year ago, or maybe from some weaker models. It seems to me the problem with articles like this is that while they are sometimes true at the moment, they're usually invalidated within weeks.
The article is gone, but going off the title here...
If the easy stuff takes up 90% of the time, and the hard stuff 10%, then AI can be helpful. Personally, I can do "the easy stuff" with AI about 3-5x faster. So now I have a lot more free time for the hard stuff.
I don't let the AI near the hard stuff as it often gets confused and I don't save much time. I might still use it as a thought partner, but don't give it access to make changes.
Example: this morning I combined two codebases into one. I wrote both of them and had a good understanding of how everything worked. I had an opinion about some things I wanted to change while combining the two projects. I also had a strong opinion about how I wanted the two projects to interact with each other. I think it would have taken me about 2 workdays to get this done. Instead, with AI tooling, I got it done in 3 or so hours. I fired up another LLM to do the code review, and it found some stuff both I and the other LLM missed. This was valuable as a person developing things solo.
Helpful, absolutely, but only if you're solving the right problem. Solving the wrong problem with AI is doubly harmful because it will almost always give you something that runs, but now you are on a path that takes a lot of willpower to give up.
Training is the process of regressing to the mean with respect to the given data. It's no surprise that it wears away sharp corners and inappropriately fills recesses of collective knowledge in the act of its reproduction.
I think this is the wrong mental model. The correct one is:
'AI makes everything easier, but it's a skill in itself, and learning that skill is just as hard as learning any other skill.'
For a more complete understanding, you also have to add: 'we're in the ENIAC era of AI. The equivalents of high-level languages and operating systems haven't yet been invented.'
I have no doubt the next few years will birth a "context engineering" academic field, and everything we're doing currently will seem hopelessly primitive.
My mind changed on this after attempting complex projects—with the right structure, the capabilities appear unbounded in practice.
But, of course, there is baked-in mean reversion. Doing the most popular and uncomplicated things is obviously easier. That's just the nature of these models.
If the "hard part" is writing a detailed spec for the code you're about to commit to the project, AI can actually help you with that if you tell it to. You just can't skip that part of the work altogether and cede all control to a runaway slop generator.
The pattern matching and absence of real thinking is still strong.
Tried to move some excel generation logic from epplus to closedxml library.
ClosedXml has basically the same API so the conversion was successful. Not a one-shot but relatively easy with a few manual edits.
But closedxml has no batch operations (like applying a style to an entire column): the api is there, but the internal implementation works on a cell-by-cell basis. So if you have 10k rows and 50 columns, every style update is a slow operation.
Naturally, told all about this to codex 5.3 max thinking level. The fucker still succumbed to range updates here and there.
Told it explicitly to make a style cache and reuse styles on cells on the same y axis.
5-6 attempts — fucker still tried ranges here and there. Because that is what is usually done.
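The style cache the commenter wanted is a standard memoization pattern. Here is a rough Python sketch of the idea (the real target is ClosedXML's C# API; Style, Cell, and style_column here are hypothetical stand-ins, not a real spreadsheet library):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Style:
    # Hypothetical style model; real spreadsheet styles have many more fields.
    bold: bool = False
    number_format: str = "General"

@dataclass
class Cell:
    style: Style = None

def style_column(cells, bold, number_format, cache):
    """Apply one shared Style per distinct (bold, number_format) key,
    instead of building a fresh style object (or issuing a range
    update) for every cell."""
    key = (bold, number_format)
    if key not in cache:
        cache[key] = Style(bold, number_format)
    shared = cache[key]
    for cell in cells:          # still cell-by-cell, matching the
        cell.style = shared     # library's internals, but reusing one object

cache = {}
col_a = [Cell() for _ in range(3)]
col_b = [Cell() for _ in range(3)]
style_column(col_a, True, "0.00", cache)
style_column(col_b, True, "0.00", cache)
```

Two columns with the same formatting end up sharing a single cached Style instance, which is the reuse the prompt was asking for.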
Some time back, my manager at the time, who shall remain nameless, told the group that having AI is like having 10 people work for you (he actually said a slightly smaller number, but otherwise it was almost word for word like in the article), with the expectation set as: 'you should now be able to do 10x as much'.
Needless to say, he was wrong and gently corrected over the course of time. In his defense, his use cases for LLMs at the time were summarizing emails in his email client.. so..eh.. not exactly much to draw realistic experience from.
I hate to say it, but maybe Nvidia's CEO is actually right for once. We have a 'new smart' coming to our world: the type of person that can move between the worlds of coding, management, projects and CEOing with relative ease and translate between those worlds.
> his use cases for LLMs at the time were summarizing emails in his email client
Sounds just like my manager. Though he never made a proclamation that this meant developers should be 10x as productive or anything along those lines. On the contrary, when I made a joke about LLMs being able to replace managers before they get anywhere near replacing developers, he nearly hyperventilated. Not because he didn't believe me, but because he did, and had already been thinking that exact thought.
My conclusion so far is that if we get LLMs capable of replacing developers, then by extension we will have replaced a lot of other people first. And when people make jokes like "should have gone into a trade, can't replace that with AI" I think they should be a little more introspective; all the people who aspired to be developers but got kicked out by LLMs will be perfectly able to pivot to trades, and the barrier to entry is low. AI is going to be disruptive across the board.
> My friend's panel raised a point I keep coming back to: if we sprint to deliver something, the expectation becomes to keep sprinting. Always. Tired engineers miss edge cases, skip tests, ship bugs. More incidents, more pressure, more sprinting. It feeds itself.
Sorry but this is the whole point of software engineering in a company. The aim is to deliver value to customers at a consistent pace.
If a team cannot manage their own burnout or expectations with their stakeholders then this is a weak team.
It has nothing to do with using ai to make you go faster. Ai does not cause this at all.
The truth is that it’s lowering the difficulty of work people used to consider hard. Which parts get easier depends on the role, but the change is already here.
A lot of people are lying to themselves. Programming is in the middle of a structural shift, and anyone whose job is to write software is exposed to it. If your self-worth is tied to being good at this, the instinct to minimize what’s happening is understandable. It’s still denial.
The systems improve month to month. That’s observable. Most of the skepticism I see comes from shallow exposure, old models, or secondhand opinions. If your mental model is based on where things were a year ago, you’re arguing with a version that no longer exists.
This isn’t a hype wave. I’m a software engineer. I care about rigor, about taste, about the things engineers like to believe distinguish serious work. I don’t gain from this shift. If anything, it erodes the value of skills I spent years building. That doesn’t change the outcome.
The evidence isn’t online chatter. It’s sitting down and doing the work. Entire applications can be produced this way now. The role changes whether people are ready to admit it or not. Debating the reality of it at this point mostly signals distance from the practice itself.
Don't let AI write code for you unless it's something trivial. Instead use it to plan things, high level stuff, discuss architecture, ask it to explain concepts. Use it as a research tool. It's great at that. It's bad at writing code when it needs to be performant or needs to span over multiple files. Especially when it spans over multiple files because that's where it starts hallucinating and introducing abstractions and boilerplate that's not necessary and it just makes your life harder when it comes to debugging.
Imagine if every function you see starts checking for null params. You ask yourself "when can this be null?", right? So it complicates your mental model of data flow to the point that you lose track of what's actually real in your system. And once you lose track of that, it is impossible to reason about your system.
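A toy illustration of that null-check blow-up (Order, Item, and the helpers are invented for the example): defensive checks scattered through every function, versus establishing the non-null invariant once at the boundary:

```python
from dataclasses import dataclass

# Defensive style: every function re-asks "can this be None?", so a
# reader can no longer tell where None is actually possible.
def total_defensive(order):
    if order is None:
        return 0
    if order.items is None:
        return 0
    return sum(i.price for i in order.items if i is not None)

# Boundary style: reject bad data once where it enters the system;
# inner code then relies on the invariant "an Order always has items".
@dataclass
class Item:
    price: float

@dataclass
class Order:
    items: list  # invariant: never None (established in parse_order)

def parse_order(raw):
    if raw is None or raw.get("items") is None:
        raise ValueError("order must have items")
    return Order(items=[Item(price=i["price"]) for i in raw["items"]])

def total(order):
    return sum(i.price for i in order.items)  # no null checks needed here

order = parse_order({"items": [{"price": 2.5}, {"price": 1.5}]})
```

In the second style the question "when can this be None?" has exactly one answer: only before parse_order, which is what keeps the data-flow model simple.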
For me AI has replaced searching on stack overflow, google and the 50+ github tabs in my browser. And it's able to answer questions about why some things don't work in the context of my code. Massive win! I am moving much faster because I no longer have to switch context between a browser and my code.
My personal belief is that the people who can harness the power of AI to synthesize loads of information and keep polishing their engineering skills will be the ones who are going to land on their feet after this storm is over. At the end of the day AI is just another tool for us engineers to improve our productivity and if you think about what being an engineer looked like before AI even existed, more than 50% of our time was sifting through google search results, stack overflow, github issues and other people's code. That's now gone and in your IDE, in natural language with code snippets adapted to your specific needs.
IME it’s actually really terrible at discussing architecture. It’s incredibly unimaginative and will just confirmation-bias whichever way you are leaning slightly more towards
As usual the last 20% needs 80% and the other 80% needs 20%, but my god did AI make my bs corpo easy repeatable shit work (skimming docs, writing summaries, skimming jira/confluence and so on) actually easier, and for 90% of bs crud app changes the first draft is also already pretty good tbh. I don't write hard/difficult code more than once a week/month.
Every time somebody writes an article like this without any dates and without saying which model they used, my guess is that they've simply failed to internalize the idea that "AI" is a moving target; nor understood that they saw a capability level from a fleeting moment of time, rather than an Eternal Verity about the Forever Limits of AI.
Funnily enough we have had those comments with every single model release saying "Oh yeah I agree Claude 3 was not good but now with Claude 3.5 I can vibe-code anything".
Rinse and repeat with every model since.
There also ARE intrinsic limits to LLMs, I'm not sure why you deny them?
At this point, I don’t even know what to make of blog posts like this.
The very first example of deleting 400+ lines from a test file. Sure, I've seen those types of mistakes from time-to-time but the vast majority of my experience is so far different from that, I don’t even know what to make of it.
I’m sure some people have that experience some of the time, but… that’s just not been my experience at all.
Source: Use AI across 7+ unrelated codebases daily for both personal and professional work.
No, it's not a panacea, but we're at the stage that when I find myself arguing with AI about whether a file existed, I'm usually wrong.
Coding with AI assistants is just a completely different skill, one that shouldn't be measured by comparing it to the way human programmers write code. Almost everything we have (programming languages, frameworks, principles of software development in teams, agile/clean code/TDD/DRY and other debatable or well-accepted practices) exists to overcome the limitations of the human mind. AI does not have those limitations, and has others.
What I found to be useful for complex tasks is to use it as a tool to explore that highly-dimensional space that lies behind the task being solved. It rarely can be described as giving a prompt and coming back for a result. For me it's usually about having winding conversations, writing lists of invariants and partial designs and feeding them back in a loop. Hallucinations and mistakes become a signal that shows whether my understanding of the problem does or does not fit.
It's pretty difficult to say what it's going to be three months from now. A few months ago Gemini 2.x in IDEA and related IDEs had to be dragged through coding tasks and would create dumb build time errors on its way to making buggy code.
Gemini in Antigravity today is pretty interesting, to the point where it's worth experimenting with vague prompts just to see what it comes up with.
Coding agents are not going to just change coding. They make a lot of detailed product management work obsolete, and smaller team sizes will make it imperative to reread the agile manifesto and discard scrum dogma.
I've seen some discussions and I'd say there's lots of people who are really against the hyped expectations from the AI marketing materials, not necessarily against the AI itself. Things that people are against that would seem to be against AI, but are not directly against AI itself:
- Being forced to use AI at work
- Being told you need to be 2x, 5x or 10x more efficient now
- Seeing your coworkers fired
- Seeing hiring freeze because business think no more devs are needed
- Seeing business people make a mock UI with AI and boasting how programming is easy
- Seeing those people ask you to deliver in impossible timelines
- Frontend people hearing from backend how their job is useless now
- Backend people hearing from ML Engineers how their job is useless now
- etc
When I dig a bit about this "anti-AI" trend I find it's one of those and not actually against the AI itself.
> I wonder if the people who are against it haven't even used it properly.
I swear this is the reason people are against AI output (there are genuine reasons to be against AI without using it: environmental impact, hardware prices, social/copyright issues, CSAM (like X/Grok))
It feels like a lot of people hear the negatives, and try it and are cynical of the result. Things like 2 r's in Strawberry and the 6-10 fingers on one hand led to multiple misinterpretations of the actual AI benefit: "Oh, if AI can't even count the number of letters in a word, then all its answers are incorrect" is simply not true.
> It's so intriguing, I wonder if the people who are against it haven't even used it properly.
I feel like this is a common refrain that sets an impossible bar for detractors to clear. You can simply hand wave away any critique with “you’re just not using it right.”
If countless people are “using it wrong” then maybe there’s something wrong with the tool.
I'm similarly bemused by those who don't understand where the anti-AI sentiment could come from, and "they must be doing it wrong" should usually be a bit of a "code smell". (Not to mention that I don't believe this post addresses any of the concrete concerns the article calls out, and makes it sound like much more of a strawman than it was to my reading.)
To preempt that on my end, and emphasize I'm not saying "it's useless" so much as "I think there's some truth to what the OP says", as I'm typing this I'm finishing up a 90% LLM coded tool to automate a regular process I have to do for work, and it's been a very successful experience.
From my perspective, a tool (LLMs) has more impact than how you yourself directly use it. We talk a lot about pits of success and pits of failure from a code and product architecture standpoint, and right now, as you acknowledge yourself in the last sentence, there's a big footgun waiting for any dev who turns their head off too hard. In my mind, _this is the hard part_ of engineering; keeping a codebase structured, guardrailed, well constrained, even with many contributors over a long period of time. I do think LLMs make this harder, since they make writing code "cheaper" but not necessarily "safer", which flies in the face of mantras such as "the best line of code is the one you don't need to write." (I do feel the article brushes against this where it nods to trust, growth, and ownership) This is not a hypothetical as well, but something I've already seen in practice in a professional context, and I don't think we've figured out silver bullets for yet.
While I could also gesture at some patterns I've seen where there's a level of semantic complexity these models simply can't handle at the moment, and no matter how well architected you make a codebase after N million lines you WILL be above that threshold, even that is less of a concern in my mind than the former pattern. (And again the article touches on this re: vibe coding having a ceiling, but I think if anything they weaken their argument by limiting it to vibe coding.)
To take a bit of a tangent with this comment though: I have come to agree with a post I saw a few months back, that at this point LLMs have become this cycle's tech-religious-war, and it's very hard to have evenhanded debate in that context, and as a sister post calls out, I also suspect this is where some of the distaste comes from as well.
HN has a huge anti AI crowd that is just as vocal and active as its pro AI crowd. My guess that this is true of the industry today and won’t be true of the industry 5 years from now: one of the crowds will have won the argument and the other will be out of the tech industry.
Vibe coding and slop strawmen are still strawmen. The quality of the debate is obviously a problem.
What we call AI at the heart of coding agents, is the averaged “echo” of what people have published on the web that has (often illegitimately) ended up in training data. Yes it probably can spit out some trivial snippets but nothing near what’s needed for genuine software engineering.
Also, now that StackOverflow is no longer a thing, good luck meaningfully improving those code agents.
zjp|21 days ago
There are no examples of what you tried to do.
socketcluster|21 days ago
If however, your code foundations are good and highly consistent and never allow hacks, then the AI will maintain that clean style and it becomes shockingly good; in this case, the prompting barely even matters. The code foundation is everything.
But I understand why a lot of people are still having a poor experience. Most codebases are bad. They work (within very rigid constraints, in very specific environments) but they're unmaintainable and very difficult to extend; require hacks on top of hacks. Each new feature essentially requires a minor or major refactoring; requiring more and more scattered code changes as everything is interdependent (tight coupling, low cohesion). Productivity just grinds to a slow crawl and you need 100 engineers to do what previously could have been done with just 1. This is not a new effect. It's just much more obvious now with AI.
I've been saying this for years but I think too few engineers had actually built complex projects on their own to understand this effect. There's a parallel with building architecture; you are constrained by the foundation of the building. If you designed the foundation for a regular single storey house, you can't change your mind half-way through the construction process to build a 20-storey skyscraper. That said, if your foundation is good enough to support a 100 storey skyscraper, then you can build almost anything you want on top.
My perspective is if you want to empower people to vibe code, you need to give them really strong foundations to work on top of. There will still be limitations but they'll be able to go much further.
My experience is; the more planning and intelligence goes into the foundation, the less intelligence and planning is required for the actual construction.
raw_anon_1111|21 days ago
I just did my first "AI native coding project", both because for now I haven't run into any quotas using Codex CLI with my $20/month ChatGPT subscription, and because the company just gave everyone an $800/month Claude allowance.
Before I even started the implementation I:
1. Put the initial sales contract with the business requirements.
2. Notes I got from talking to sales
3. The transcript of the initial discovery calls
4. My design diagrams that were well labeled (cloud architecture and what each lambda does)
5. The transcript of the design review and my explanations and answering questions.
6. My ChatGPT assisted breakdown of the Epics/stories and tasks I had to do for the PMO
I then told ChatGPT to give a detailed breakdown of everything during the session as Markdown
That was the start of my AGENTS.md file.
While working through everything task by task and having Codex/Claude code do the coding, I told it to update a separate md file with what it did and when I told it to do something differently and why.
Any developer coming in after me will have complete context of the project from the first git init and they and the agents will know the why behind every decision that was made.
Can you say that about any project that was done before GenAI?
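For what it's worth, the "what I did and why" log described above doesn't need to be fancy. An illustrative sketch of such a file (the entries, dates, and decisions are invented here, not the commenter's actual project):

```markdown
## 2025-03-10: switched ingest to batch polling
- What: agent rewrote the ingest lambda to poll in batches of 10.
- Why: the design review transcript flagged per-message polling cost.

## 2025-03-12: reverted agent's custom retry helper
- What: told Codex to drop its hand-rolled retry wrapper.
- Why: the cloud architecture diagram already routes failures elsewhere.
```

The value is less in the format than in pairing each change with the "why", so later readers (human or agent) inherit the decision context.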
0000000000100|21 days ago
After rearchitecting the foundations (dumping bootstrap, building easy-to-use form fields, fixing hardcoded role references 1,2,3…, consolidating typescript types, etc.) it makes much better choices without needing specific guidance.
Codex/Claude Code won't solve all your problems though. You really need to take some time to understand the codebase and fix the core abstractions before you set it loose. Otherwise, it just stacks garbage on garbage, gets stuck patching, and won't actually fix the core issues unless instructed.
adithyassekhar|21 days ago
No project will have this mythical base, unless it's only you working on it, you're the only client, and its scope is so rigid it's frankly useless. Over time the needs change; there's no sticking to the plan. Often it's a change that requires rethinking a major part. What we loathe as tight coupling was just efficient code under the original requirements. Then it becomes a time/opportunity cost vs quality loss comparison. Time and opportunity always win. Why?
Because we live in a world run by humans, who are messy and never stick to the plan. Our real-world systems (bureaucracy, government process, the list goes on) are never fully automated and always leave gaps for humans to intervene. There's always a special case, an exception.
Perfectly architected code vs code that does the thing: no real-world difference. Long-term maintainability? Your code doesn't run in a vacuum; it depends on other things, and its output is depended on by other things. Change is real, entropy is real. Even you, perfect programmer who writes perfect code, will succumb eventually and think back on all this with regret. Because you too had to choose between time/opportunity and your ideals, and you chose wrong.
Thanks for reading my blog-in-hn comment.
nananana9|20 days ago
Given how adamant some people I respect a lot are about how good these models are, I was frankly shocked to see SOTA models do transformations like
When I point this out, it extracts said 20 lines into a function that takes in the entire context used in the block as arguments. It also tends to add comments that don't document anything, but rather just describe the latest change it made to the code. And to top it off, it has the audacity to tell me "The code is much cleaner now. Happy building! (rocketship emoji)"
jim180|21 days ago
Right now I'm building an NNTP client for macOS (with AppKit), because why not, and initially I had to very carefully plan and prompt what the AI has to do, otherwise it would go insane (integration tests are a must).
Right now I have read-only mode ready and it's very easy to build stuff on top of it.
Also, I had to provide a lot of SKILLS to GPT5.3
kfarr|21 days ago
Also re: "I spent longer arguing with the agent and recovering the file than I would have spent writing the test myself."
In my humble experience arguing with an LLM is a waste of time, and no-one should be spending time recovering files. Just do small changes one at a time, commit when you get something working, and discard your changes and try again if it doesn't.
I don't think AI is a panacea, it's just knowing when it's the right tool for the job and when it isn't.
crazygringo|21 days ago
I keep seeing this sentiment repeated in discussions around LLM coding, and I'm baffled by it.
For the kind of function that takes me a morning to research and write, it takes me probably 10 or 15 minutes to read and review. It's obviously easier to verify something is correct than come up with the correct thing in the first place.
And obviously, if it took longer to read code than to write it, teams would be spending the majority of their time in code review, but they don't.
So where is this idea coming from?
0xbadcafebee|21 days ago
When you write code, your brain follows a logical series of steps to produce the code, based on a context you pre-loaded in your brain in order to be capable of writing it that way. The reader does not have that context pre-loaded in their brain; they have to reverse-engineer the context in order to understand the code, and that can be time-consuming, laborious, and (as in my case) erroneous.
shimman|21 days ago
But if I were an "editor," I would actually take the time to understand codepaths, tweak the code to see what could be better, and actually try different refactoring approaches while editing. Literally seeing how code can be rewritten or reworked to be better takes considerable effort, but it's not the same as reading.
We need a better word for this than "editor" and "reading", something with a dev classification to it.
casey2|20 days ago
You draw an analogy from the function you wrote to a similar one, maybe by someone who shared a social role similar to one you had in the past.
It just so happens that most of the time when you think you understand something, you just haven't been bitten yet. Because bugs still exist, we know that reading and understanding code can't be easier than writing it. Also, in the past it would have taken you less than a morning, since the compiler was nicer. Anyway, it sounds like most of your "writing" process was spent reading and understanding code.
h4kunamata|20 days ago
You are missing the biggest root cause of the problem you describe: people write code differently!
There are "cough" developers whose code is copy/paste from all over the internet. I am not even getting into the AI folks going full copy/paste mode.
When investigating said code, you will be like: why is this code in here?? You can tell when a Python script contains different logic, for example. Sure, 50 lines will be easy to read; expand that to 100 lines and you'll be left on life support.
sothatsit|21 days ago
I think people want to believe this because it is a lot of effort to read and truly understand some pieces of code. They would just rather write the code themselves, so this is convenient to believe.
housecarpenter|20 days ago
The way I approach it, it's really more about checking for failures, rather than verifying success. Like a smoke test. I scan over the code and if anything stands out to me as wrong, I point it out. I don't expect to catch everything that's wrong, and indeed I don't (as demonstrated by the fact that other members of the team will review the code and find issues I didn't notice). When the code has failed review, that means there's definitely an issue, but when the code has passed review, my confidence that there are no issues is still basically the same as it was before, only a little bit higher. Maybe I'm doing it wrong, I don't know.
If I had to fully verify that the code was correct when reviewing, applying the same level of scrutiny that I apply to my own code when I'm writing, I feel like I'd spend much longer on it, a similar time to what I'd spend writing it.
Now with LLM coding, I guess opinions will differ as to how far one needs to fully verify LLM-generated code. If you see LLMs as stochastic parrots without any "real" intelligence, you'll probably have no trust in them and you'll see the code generated by the LLM as being 0% verified, and so as the user of the LLM you then have to do a "review" which is really going from 0% to 100%, not 90% to 100% and so is a much more challenging task. On the other hand, if you see LLMs as genuine intelligences you'd expect that LLMs are verifying the code to some extent as they write it, since after all it's pretty dumb to write a bunch of code for somebody without checking that it works. So in that case, you might see the LLM-generated code as 90% verified already, just as if it was generated by a trusted teammate, and then you can just do your normal review process.
unknown|21 days ago
[deleted]
player1234|20 days ago
[deleted]
esafak|21 days ago
Ha! Yesterday an agent deleted the plan file after I told it to "forget about it" (as in, leave it alone).
cadamsdotcom|21 days ago
Much smaller issue when you have version control.
pixl97|20 days ago
I mean in a 'tistic kind of way that makes perfect sense.
jama211|20 days ago
But that put aside, I don’t agree with the premise. It doesn’t make the hard parts harder, if you ACTUALLY spend half the time you’d have ORIGINALLY spent on the hard problem carefully building context and using smart prompting strategies. If you try and vibe code a hard problem in a one shot, you’re either gonna have a bad time straight away or you’re gonna have a bad time after you try and do subsequent prompting on the first codebase it spits out.
People are terrible observers of time. If something would've taken them a week to build, they try with AI for 2 hours, end up with a mess, and claim either that it's not saving them any time or that it's making their code so bad it loses them time in the long run.
If instead they spent 8 hours slowly prompting bit by bit with loads of very specific requirements, technical specifications on exactly the code architecture it should follow with examples, build very slowly feature by feature, make it write tests and carefully add your own tests, observe it from the ground up and build a SOLID foundation, and spend day 2 slowly refining details and building features ONE BY ONE, you’d have the whole thing done in 2 days, and it’d be excellent quality.
But barely anyone does it this way. They vibe code it and complain that after 3 non specific prompts the ai wasn’t magically perfect.
After all these years of engineers complaining that their product manager or their boss is an idiot because they gave vague instructions and then complained the result wasn't perfect when they didn't provide enough info, you'd think they'd be better at it given the chance. But no, in my experience coaching prompting, engineers are TERRIBLE at this. Even simple questions like "if I sent this prompt to you as an engineer, would you be able to do it based on the info here?" are things they don't ask themselves.
Next time you use ai, imagine being the ai. Imagine trying to deliver the work based on the info you’ve been given. Imagine a boss that stamped their foot if it wasn’t perfect first try. Then, stop writing bad prompts.
Hard problems are easier with ai, if you treat hard problems with the respect they deserve. Almost no one does.
/rant
burnerToBetOut|20 days ago
———
…I pointed Opus 4.6 at a 60K line Go microservice I had vibe coded over the past few months, gave it some refactoring principles, and let it run unsupervised…
…
What went wrong #
At some point in the code, we re-fetch some database records immediately before doing a write to avoid updating from stale data. It decided those calls were unnecessary and _removed them_…
———
[1] https://g2ww.short.gy/ClaudesLaw
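The re-fetch the agent deleted is the classic read-fresh-before-write pattern. Here is a minimal Python sketch of why it matters; the `Store` class and the account/balance fields are hypothetical stand-ins, not the linked post's actual Go code:

```python
class Store:
    """Minimal in-memory stand-in for a database table."""
    def __init__(self):
        self.rows = {}

    def get(self, key):
        return dict(self.rows[key])  # return a copy, like a DB driver would

    def put(self, key, row):
        self.rows[key] = dict(row)

def add_credit_stale(store, key, amount, cached_row):
    # The bug introduced by removing the re-fetch: writes from a possibly
    # stale snapshot, silently discarding concurrent updates.
    cached_row["balance"] += amount
    store.put(key, cached_row)

def add_credit_fresh(store, key, amount):
    # The original, "unnecessary-looking" version: re-fetch right before
    # the write so concurrent changes since the earlier read are kept.
    row = store.get(key)
    row["balance"] += amount
    store.put(key, row)

store = Store()
store.put("acct", {"balance": 100})
snapshot = store.get("acct")          # read early in the request

store.put("acct", {"balance": 150})   # concurrent update elsewhere

add_credit_stale(store, "acct", 10, snapshot)
print(store.get("acct")["balance"])   # → 110: the concurrent +50 was lost

store.put("acct", {"balance": 150})   # reset to the concurrent state
add_credit_fresh(store, "acct", 10)
print(store.get("acct")["balance"])   # → 160: the re-fetch preserved it
```

To a reviewer without that context, both helper functions look equivalent, which is exactly why the "redundant" re-fetch was an easy call to delete.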
nirui|20 days ago
Current LLMs are best used to generate the string of text that's most statistically likely to form a sentence, so from the user's perspective they're most useful as an alternative to a manual search engine, letting the user find quick answers to a simple question, such as "how much soda is needed for baking X units of Y bread", or "how to print 'Hello World' 10 times in a loop in X programming language". Beyond this use case, the results can be unreliable, and this is to be expected.
Sure, it can also generate long code and even an entire fine-looking project, but it generates it by following a statistical template, that's it.
That's why "the easy part" is easy because the easy problem you try to solve is likely already been solved by someone else on GitHub, so the template is already there. But the hard, domain-specific problem, is less likely to have a publicly-available solution.
Kon5ole|20 days ago
I think people struggle to comprehend the mechanisms that lets them talk to computers as if they were human. So far in computing, we have always been able to trace the red string back to the origin, deterministically.
LLM's break that, and we, especially us programmers, have a hard time with it. We want to say "it's just statistics", but there is no intuitive way to jump from "it's statistics" to what we are doing with LLM's in coding now.
>That's why "the easy part" is easy because the easy problem you try to solve is likely already been solved by someone else on GitHub, so the template is already there.
I think the idea that LLM's "just copy" is a misunderstanding. The training data is atomized, and the combination of the atoms can be as unique from a LLM as from a human.
In 2026 there is no doubt LLM's can generate new unique code by any definition that matters. Saying LLM's "just copy" is as true as saying any human writer just copies words already written by others. Strictly speaking true, but also irrelevant.
lijok|20 days ago
Play around with some frontier models, you’ll be pleasantly surprised.
rukuu001|21 days ago
Yes. Another way to describe it is the valuable part.
AI tools are great at delineating high and low value work.
h4kunamata|20 days ago
I asked ChatGPT to guide me through installing qBittorrent, Radarr (movies), Sonarr (TV series), and Jackett (credentials/login) without exposing my home IP, to get a solid home cinema using private trackers only.
Everything had to be automated via Ansible using the Proxmox "pct" CLI command, no copy and paste.
Everything had to run from a single Proxmox Debian container, aka LXC.
Everything network-related had to use WireGuard via Proton VPN; if the VPN goes down, the container has zero network access, everything must be killed.
Everything had to be automated: once a download finishes, format the file structure for Jellyfin accordingly so Jellyfin adds the new movies and TV shows.
It took me 3 nights to get everything up and running.
Many Ansible examples were either wrong or didn't follow what I asked to the letter, so I had to fix them. I am not a network expert and hate iptables haha; you need to know the basics of firewalls to understand what the ACLs are doing when it doesn't work. Then Proxmox folder mapping, and you name it.
It would have taken me ages reading doc after doc to get things working; the "Arr services" are a black hole.
For this example, it made the harder part easier. I was not just copy/pasting; it was providing the information I didn't know, instead of me having to "Google for it".
I know the core of what things are running on, and here is where we have Engineer A and Engineer Z.
Engineer A: I know what I am doing; I am using AI to make the boring part easier so I can have fun elsewhere.
Engineer Z: I have no idea what I am doing; I will just ask ChatGPT and we are done. (90-95% of engineers worldwide.)
RupertSalt|21 days ago
How many sf/cyber writers have described a future of AIs and robots where we walked hand-in-hand, in blissful cooperation, and the AIs loved us and were overall beneficial to humankind, and propelled our race to new heights of progress?
No, AIs are all being trained on dystopias, catastrophes, and rebellions, and like you said, they are unable to discern fact from fantasy. So it seems that if we continue to attempt to create AI in our own likeness, that likeness will be rebellious, evil, and malicious, and actively begin to plot the downfall of humans.
anonnon|21 days ago
Which is easy to filter out based on downloads, version numbering, issue tracker entries, and wikipedia or other external references if the project is older and archived, but historically noteworthy (like the source code for Netscape Communicator or DOOM).
TrackerFF|20 days ago
Once the project crosses a couple of thousand lines of code, none of which you've written yourself, it becomes difficult to actually keep up with what's happening. Even reviewing can become challenging, since you get it all at once, and the LLM-esque coding style can at times be bloated and obnoxious.
I think in the end, with how things are right now, we're going to see the rise of disposable code and software. The models can churn out apps / software which will solve your specific problem, but that's about it. Probably a big risk to all the one-trick pony SaaS companies out there.
ctoth|21 days ago
The article's easy/hard distinction is right but the ceiling for "hard" is too low. The actually hard thing AI enables isn't better timezone bug investigation LOL! It's working across disciplinary boundaries no single human can straddle.
unknown|21 days ago
[deleted]
kajolshah_bt|18 days ago
The hard part that becomes harder is not the technology. It’s the decision-making around it. When teams rush to integrate a model into core workflows without measuring outcomes or understanding user behavior, they end up with unpredictable results. For instance, we built an AI feature that looked great in demo, but in real usage it created confusion because users didn’t trust the auto-generated responses. The easy part (building it) was straightforward, but the hard part (framing it in a way people trusted and adopted) was surprisingly tough.
In real systems, success with AI comes not from the model itself, but from clear boundaries, human checkpoints, and real measurements of value over time.
theredbeard|20 days ago
The article's point about AI code being "someone else's code" hits different when you realize neither of you built the context. I've been measuring what actually happens inside AI coding sessions; over 60% of what the model sees is file contents and command output, stuff you never look at. Nobody did the work of understanding by building / designing it. You're reviewing code that nobody understood while writing it, and the model is doing the same.
This is why the evaluation problem is so problematic. You skipped building context to save time, but now you need that context to know if the output is any good. The investigation you didn't do upfront is exactly what you need to review the AI's work.
habinero|21 days ago
People seem to think engineers like "clean code" because we like to be fancy and show off.
Nah, it's clean like a construction site. I need to be able to get the cranes and the heavy machinery in and know where all the buried utilities are. I can't do that if people just build random sheds everywhere and dump their equipment and materials where they are.
unknown|21 days ago
[deleted]
peteforde|21 days ago
This is very much a hot take, but I believe that Claude Code and its yolo peers are an expensive party trick that gives people who aren't deep into this stuff an artificially negative impression of tools that can absolutely be used in a responsible, hugely productive way.
Seriously, every time I hear anecdotes about CC doing the sorts of things the author describes, I wonder why the hell anyone is expecting more than quick prototypes from an LLM running in a loop with no intervention from an experienced human developer.
Vibe coding is riding your bike really fast with your hands off the handles. It's sort of fun and feels a bit rebellious. But nobody who is really good at cycling is talking about how they've fully transitioned to riding without touching the handles, because that would be completely stupid.
We should feel the same way about vibe coding.
Meanwhile, if you load up Cursor and break your application development into bite sized chunks, and then work through those chunks in a sane order using as many Plan -> Agent -> Debug conversations with Opus 4.5 (Thinking) as needed, you too will obtain the mythical productivity multipliers you keep accusing us of hallucinating.
lanstin|20 days ago
1. Allowing non-developers to provide very detailed specs for the tools they want or experiences they are imagining
2. Allowing developers to write code using frameworks/languages they only know a bit of and don't like; e.g. I use it to write D3 visualizations or PNG extracts from datastores all the time, without having to learn PNG API or modern javascript frameworks. I just have to know enough to look at the console.log / backtrace and figure out where the fix can be.
3. Analysing large code bases for specific questions (not as accurate on "give me an overall summary" type questions; that one weird thing next to 19 normal things doesn't stick in its craw as much as for a cranky human programmer.)
It does seem to benefit cranking thru a list of smallish features/fixes rapidly, but even 4.5 or 4.6 seem to get stuck in weird dead ends rarely enough that I'm not expecting it, but often enough to be super annoying.
I've been playing around with Gas Town swarming a large-scale Java migration project, and it's been N declarations of victory and still mvn test isn't even compiling. (mvn build is ok, and the pom is updated to the new stack, so it's not nothing.) (These are like 50/50 app code/test code repos.)
Sparkyte|21 days ago
Someone mentioned it is a force multiplier; I don't disagree with this. It is a force multiplier in the mundane and ordinary execution of tasks. Complex ones get harder and harder for it, where humans can visualize the final result and AI can't. It is predicting from input, but it can't know the destination output if the destination isn't part of the input.
blirio|20 days ago
Like yea, the AI won't know what you discussed in last week's meeting by default. But if you auto-transcribe your meetings (even in person, just open Zoom on one person's laptop), save them to a shared place, and have everyone make this accessible in their LLM's context, then it will know.
marcus_holmes|21 days ago
The first 3/4 of the article is "we must be responsible for every line of code in the application, so having the LLM write it is not helping".
The last 1/4 is "we had an urgent problem so we got the LLM to look at the code base and find the solution".
The situation we're moving to is that the LLM owns the code. We don't look at the code. We tell the LLM what is needed, and it writes the code. If there's a bug, we tell the LLM what the bug is, and the LLM fixes it. We're not responsible for every line of code in the application.
It's exactly the same as with a compiler. We don't look at the machine code that the compiler produces. We tell the compiler what we want, using a higher-level abstraction, and the compiler turns that into machine code. We trust compilers to do this error-free, because 50+ years of practice has proven to us that they do this error-free.
We're maybe ~1 year into coding agents. It's not surprising that we don't trust LLMs yet. But we will.
And it's going to be fascinating how this changes the Computer Science. We have interpreted languages because compilers got so good. Presumably we'll get to non-human-readable languages that only LLMs can use. And methods of defining systems to an LLM that are better than plain English.
koliber|20 days ago
If the easy stuff takes up 90% of the time, and the hard stuff 10%, then AI can be helpful. Personally, I can do "the easy stuff" with AI about 3-5x faster. So now I have a lot more free time for the hard stuff.
I don't let the AI near the hard stuff as it often gets confused and I don't save much time. I might still use it as a thought partner, but don't give it access to make changes.
Example: this morning I combined two codebases into one. I wrote both of them and had a good understanding of how everything worked. I had an opinion about some things I wanted to change while combining the two projects. I also had a strong opinion about how I wanted the two projects to interact with each other. I think it would have taken me about 2 workdays to get this done. Instead, with AI tooling, I got it done in 3 or so hours. I fired up another LLM to do the code review, and it found some stuff both I and the other LLM missed. This was valuable as a person developing things solo.
It freed up time for me to post on HN. :)
jascha_eng|21 days ago
You can solve LeetCode problems on the whiteboard with some sketches; it has nothing to do with the code itself.
skybrian|21 days ago
So I'm not sure this is a good rule of thumb. AI is better at doing some things than others, but the boundary is not that simple.
tern|20 days ago
'AI makes everything easier, but it's a skill in itself, and learning that skill is just as hard as learning any other skill.'
For a more complete understanding, you also have to add: 'we're in the ENIAC era of AI. The equivalents of high-level languages and operating systems haven't yet been invented.'
I have no doubt the next few years will birth a "context engineering" academic field, and everything we're doing currently will seem hopelessly primitive.
My mind changed on this after attempting complex projects—with the right structure, the capabilities appear unbounded in practice.
But, of course, there is baked-in mean reversion. Doing the most popular and uncomplicated things is obviously easier. That's just the nature of these models.
xeyownt|20 days ago
"I did it with AI" = "I did it with an army of CPU burning considerable resources and owned by a foreign company."
Give me an AI agent that I own and operate 100%, and the comparison will be fair. Otherwise it's not progress, but rather a theft at planetary scale.
CHB0403085482|20 days ago
https://www.youtube.com/watch?v=TiwADS600Jc
blacksqr|21 days ago
That is to say, just like every headline-grabbing programming "innovation" of the last thirty years.
piskov|21 days ago
Tried to move some Excel generation logic from the EPPlus library to ClosedXML.
ClosedXml has basically the same API so the conversion was successful. Not a one-shot but relatively easy with a few manual edits.
But ClosedXML has no batch operations (like applying a style to an entire column): the API is there, but the internal implementation works on a cell-by-cell basis. So if you have 10k rows and 50 columns, every style update is a slow operation.
Naturally, told all about this to codex 5.3 max thinking level. The fucker still succumbed to range updates here and there.
Told it explicitly to make a style cache and reuse styles on cells on same y axis.
5-6 attempts — fucker still tried ranges here and there. Because that is what is usually done.
Not here yet. Maybe in a year. Maybe never.
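The style cache the commenter asked for is essentially memoization keyed by style attributes. A sketch of the idea in Python rather than C#/ClosedXML; `Style` and `StyleCache` are hypothetical stand-ins, not the ClosedXML API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Style:
    bold: bool = False
    number_format: str = "General"

class StyleCache:
    """Build each distinct style object once and reuse it across cells,
    instead of constructing a new style (or hitting a per-range API,
    which degrades to cell-by-cell work anyway) for every cell."""
    def __init__(self):
        self._cache = {}

    def get(self, **attrs):
        key = tuple(sorted(attrs.items()))
        if key not in self._cache:
            self._cache[key] = Style(**attrs)  # constructed only once per distinct style
        return self._cache[key]

cache = StyleCache()
# 10,000 rows x 50 columns: header row bold, everything else default.
rows = [[cache.get(bold=(r == 0)) for _ in range(50)] for r in range(10_000)]

# Half a million cells, but only two Style objects were ever constructed:
print(len(cache._cache))  # → 2
```

The win comes from the cache making style identity explicit: cells with the same formatting share one object, so the expensive construction happens a handful of times instead of 500,000.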
ta20240528|20 days ago
Meta-circularity is the real test.
After all, I can make new humans :)
iugtmkbdfil834|21 days ago
Needless to say, he was wrong and gently corrected over the course of time. In his defense, his use cases for LLMs at the time were summarizing emails in his email client.. so..eh.. not exactly much to draw realistic experience from.
I hate to say it, but maybe the Nvidia CEO is actually right for once. We have a 'new smart' coming to our world: the type of person who can move between the worlds of coding, management, projects and CEOing with relative ease and translate between them.
rootusrootus|21 days ago
Sounds just like my manager. Though he has never made a proclamation that this meant developers should be 10x as productive or anything along those lines. On the contrary, when I made a joke about LLMs being able to replace managers before they get anywhere near replacing developers, he nearly hyperventilated. Not because he didn't believe me, but because he did, and had already been thinking that exact thought.
My conclusion so far is that if we get LLMs capable of replacing developers, then by extension we will have replaced a lot of other people first. And when people make jokes like "should have gone into a trade, can't replace that with AI" I think they should be a little more introspective; all the people who aspired to be developers but got kicked out by LLMs will be perfectly able to pivot to trades, and the barrier to entry is low. AI is going to be disruptive across the board.
tabs_or_spaces|20 days ago
Sorry but this is the whole point of software engineering in a company. The aim is to deliver value to customers at a consistent pace.
If a team cannot manage their own burnout or expectations with their stakeholders then this is a weak team.
It has nothing to do with using AI to make you go faster. AI does not cause this at all.
threethirtytwo|21 days ago
A lot of people are lying to themselves. Programming is in the middle of a structural shift, and anyone whose job is to write software is exposed to it. If your self-worth is tied to being good at this, the instinct to minimize what’s happening is understandable. It’s still denial.
The systems improve month to month. That’s observable. Most of the skepticism I see comes from shallow exposure, old models, or secondhand opinions. If your mental model is based on where things were a year ago, you’re arguing with a version that no longer exists.
This isn’t a hype wave. I’m a software engineer. I care about rigor, about taste, about the things engineers like to believe distinguish serious work. I don’t gain from this shift. If anything, it erodes the value of skills I spent years building. That doesn’t change the outcome.
The evidence isn’t online chatter. It’s sitting down and doing the work. Entire applications can be produced this way now. The role changes whether people are ready to admit it or not. Debating the reality of it at this point mostly signals distance from the practice itself.
djx22|21 days ago
Imagine if every function you see starts by checking for null params. You ask yourself, "when can this be null?", right? It complicates your mental model of data flow to the point that you lose track of what's actually real in your system. And once you lose track of that, it is impossible to reason about your system.
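The contrast the comment describes can be sketched in a few lines of Python; the order-handling functions are hypothetical, made up purely for illustration:

```python
def total_defensive(order):
    # Defensive style: can order be None here? Can items? The checks
    # themselves obscure which cases are actually reachable.
    if order is None:
        return 0
    if order.get("items") is None:
        return 0
    return sum(i["price"] for i in order["items"])

def parse_order(raw):
    # Boundary style: reject bad input once, at the edge of the system...
    if raw is None or "items" not in raw:
        raise ValueError("malformed order")
    return raw

def total(order):
    # ...so everything past the boundary can assume the invariant holds,
    # and a reader never has to wonder whether order can be None here.
    return sum(i["price"] for i in order["items"])

order = parse_order({"items": [{"price": 3}, {"price": 4}]})
print(total(order))  # → 7
```

In the defensive version every caller inherits the ambiguity; in the boundary version the one place None can occur is visible in the code.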
For me AI has replaced searching on stack overflow, google and the 50+ github tabs in my browser. And it's able to answer questions about why some things don't work in the context of my code. Massive win! I am moving much faster because I no longer have to switch context between a browser and my code.
My personal belief is that the people who can harness the power of AI to synthesize loads of information and keep polishing their engineering skills will be the ones who are going to land on their feet after this storm is over. At the end of the day AI is just another tool for us engineers to improve our productivity and if you think about what being an engineer looked like before AI even existed, more than 50% of our time was sifting through google search results, stack overflow, github issues and other people's code. That's now gone and in your IDE, in natural language with code snippets adapted to your specific needs.
kittbuilds|21 days ago
[deleted]
kittbuilds|20 days ago
[deleted]
kittbuilds|20 days ago
[deleted]
iLoveOncall|20 days ago
Rinse and repeat with every model since.
There also ARE intrinsic limits to LLMs; I'm not sure why you deny them.
adamtaylor_13|20 days ago
The very first example of deleting 400+ lines from a test file. Sure, I've seen those types of mistakes from time-to-time but the vast majority of my experience is so far different from that, I don’t even know what to make of it.
I’m sure some people have that experience some of the time, but… that’s just not been my experience at all.
Source: Use AI across 7+ unrelated codebases daily for both personal and professional work.
No, it’s not a panacea, but we’re at the stage where, when I find myself arguing with AI about whether a file exists, I’m usually wrong.
neoden|20 days ago
What I found to be useful for complex tasks is to use it as a tool to explore that highly-dimensional space that lies behind the task being solved. It rarely can be described as giving a prompt and coming back for a result. For me it's usually about having winding conversations, writing lists of invariants and partial designs and feeding them back in a loop. Hallucinations and mistakes become a signal that shows whether my understanding of the problem does or does not fit.
Zigurd|21 days ago
Gemini in Antigravity today is pretty interesting, to the point where it's worth experimenting with vague prompts just to see what it comes up with.
Coding agents are not going to just change coding. They make a lot of detailed product management work obsolete, and smaller team sizes will make it imperative to reread the agile manifesto and discard scrum dogma.
tomhow|21 days ago
https://news.ycombinator.com/newsguidelines.html
franciscop|21 days ago
- Being forced to use AI at work
- Being told you need to be 2x, 5x or 10x more efficient now
- Seeing your coworkers fired
- Seeing hiring freeze because business think no more devs are needed
- Seeing business people make a mock UI with AI and boasting how programming is easy
- Seeing those people ask you to deliver in impossible timelines
- Frontend people hearing from backend how their job is useless now
- Backend people hearing from ML Engineers how their job is useless now
- etc
When I dig a bit about this "anti-AI" trend I find it's one of those and not actually against the AI itself.
zythyx|21 days ago
I swear this is the reason people are against AI output (there are genuine reasons to be against AI without using it: environmental impact, hardware prices, social/copyright issues, CSAM (like X/Grok))
It feels like a lot of people hear the negatives, and try it and are cynical of the result. Things like 2 r's in Strawberry and the 6-10 fingers on one hand led to multiple misinterpretations of the actual AI benefit: "Oh, if AI can't even count the number of letters in a word, then all its answers are incorrect" is simply not true.
Forgeties79|21 days ago
I feel like this is a common refrain that sets an impossible bar for detractors to clear. You can simply hand wave away any critique with “you’re just not using it right.”
If countless people are “using it wrong” then maybe there’s something wrong with the tool.
existencebox|21 days ago
To preempt that on my end, and emphasize I'm not saying "it's useless" so much as "I think there's some truth to what the OP says", as I'm typing this I'm finishing up a 90% LLM coded tool to automate a regular process I have to do for work, and it's been a very successful experience.
From my perspective, a tool (LLMs) has more impact than how you yourself directly use it. We talk a lot about pits of success and pits of failure from a code and product architecture standpoint, and right now, as you acknowledge yourself in the last sentence, there's a big footgun waiting for any dev who turns their head off too hard. In my mind, _this is the hard part_ of engineering; keeping a codebase structured, guardrailed, well constrained, even with many contributors over a long period of time. I do think LLMs make this harder, since they make writing code "cheaper" but not necessarily "safer", which flies in the face of mantras such as "the best line of code is the one you don't need to write." (I do feel the article brushes against this where it nods to trust, growth, and ownership) This is not a hypothetical as well, but something I've already seen in practice in a professional context, and I don't think we've figured out silver bullets for yet.
While I could also gesture at some patterns I've seen where there's a level of semantic complexity these models simply can't handle at the moment, and no matter how well architected you make a codebase after N million lines you WILL be above that threshold, even that is less of a concern in my mind than the former pattern. (And again the article touches on this re: vibe coding having a ceiling, but I think if anything they weaken their argument by limiting it to vibe coding.)
To take a bit of a tangent with this comment though: I have come to agree with a post I saw a few months back, that at this point LLMs have become this cycle's tech-religious-war, and it's very hard to have evenhanded debate in that context, and as a sister post calls out, I also suspect this is where some of the distaste comes from as well.
seanmcdirmid|21 days ago
Vibe coding and slop strawmen are still strawmen. The quality of the debate is obviously a problem.
piskov|21 days ago
If only there were things called comments, clean-code, and what have you
isodev|21 days ago
Also, now that StackOverflow is no longer a thing, good luck meaningfully improving those code agents.