top | item 47022485

jddj | 14 days ago

> Don’t spec the process, spec the outcome.

For this claim, which summarises vibe coding and hence the rest of the article, the models aren't yet good enough for novel applications.

With current models and assuming your engineers are of a reasonable level of experience, for now it seems to result in either greatly reduced velocity and higher costs, or worse outcomes.

One course correction in terms of planned process, because the model missed an obvious implication or statement, can save days of churning.

The math only really has a chance to work if you reduce your spend on in-house talent to compensate, and your product sits on a well-trodden path.

In terms of capability we're still at "could you easily outsource this particular project, low touch, to your typical software farm?"

post_below | 14 days ago

You can expand it beyond novel applications. The models aren't good enough for autonomous coding without a human in the loop, period.

They can one-shot basic changes and refactors, or even many full prototypes, but for pretty much everything else they're going to start making mistakes at some point, usually very quickly. It's just where the technology is right now.

The thing that frustrates me is that this is really easy to demonstrate. Articles like this are essentially hallucinations that many people, mystifyingly, take seriously.

I assume the reason they get any traction is that a lot of people don't have enough experience with LLM agents yet to be confident that their personal experience generalizes. So they think maybe there are magical context tricks to get the current generation of agents to not make the kinds of mistakes they're seeing.

There aren't. It doesn't matter if it's Opus 4.6 in Claude Code or Codex 5.3 xhigh, they still hallucinate, fail to comprehend context and otherwise drift.

Anyone who can read code can fire up an instance and see this for themselves. Or you can prove it for free by looking at the code of any app that the author says was vibecoded without human review. You won't have to look very hard.

Agents can accomplish impressive things but also, often enough, they make incomprehensibly bad decisions or make things up. It's baked into the technology. We might figure out how to solve that problem eventually, but we haven't yet.

You can iterate, add more context to AGENTS.md or CLAUDE.md, add skills, and set up hooks, and no matter how many times you do it the agents will still make mistakes. You can make specialized code review agents and run them in parallel, have competing models do audits, do dozens of passes, and spend all the tokens you want; if it's a non-trivial amount of code doing non-trivial things, and there's no human in the loop, there will still be critical mistakes.
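For anyone unfamiliar with the context files mentioned above: AGENTS.md and CLAUDE.md are just freeform markdown instruction files that agents read at the start of a session, usually checked into the repo root. A minimal sketch of one might look like this (the contents here are purely illustrative, not a recommended recipe):

```markdown
# AGENTS.md (illustrative example)

## Project conventions
- Run the test suite after every change; never commit failing tests.
- All database access goes through src/db/; do not inline SQL elsewhere.

## Known pitfalls
- The payments module has non-obvious side effects; flag any refactor
  touching it for human review instead of proceeding.
```

No matter how detailed a file like this gets, it shapes the agent's behavior rather than guaranteeing it, which is exactly the point: the instructions reduce drift but don't eliminate it.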

No one has demonstrated different behavior; articles and posts claiming otherwise never attempt to prove that what they claim is actually possible, because it isn't.

Just to be clear, I think coding agents are incredibly useful tools and I use them extensively. But you can't currently use them to write production code without a human in the loop. If you're not reading and understanding the code, you're going to be shipping vulnerabilities and tech debt.

Articles like this are just hype. But as long as they keep making front pages they'll keep distorting the conversation. And it's an otherwise interesting conversation! We're living through an unprecedented paradigm shift, the field of possibilities is vast, and there's a lot to figure out. The idea of autonomous coding agents is just a distraction from that, at least for now.