top | item 46993479

Ask HN: Are you using an agent orchestrator to write code?

41 points| gusmally | 17 days ago

In a recent interview with The Pragmatic Engineer, Steve Yegge said he feels "sorry for people" who merely "use Cursor, ask it questions sometimes, review its code really carefully, and then check it in."

Instead, he recommends engineers integrate LLMs into their workflow more and more, until they are managing multiple agents at one time. The final level in his AI Coding chart reads: "Level 8: you build your own orchestrator to coordinate more agents."

At my work, this wouldn't fly-- we're still doing things the sorry way. Are you using orchestrators to manage multiple agents at work? Particularly interested in non-greenfield applications and how that's changed your SDLC.

62 comments

Aurornis|16 days ago

> Steve Yegge said he feels "sorry for people" who merely "use Cursor, ask it questions sometimes, review its code really carefully, and then check it in."

Steve Yegge is building a multi-agent orchestration system. This is him trying to FOMO listeners into using his project.

From what I've observed, the people trying to use herds of agents to work on different things at the same time are just using tokens as fast as possible because they think more tokens means more progress. As you scale up the sub-agents you spend so much time managing the herd and trying to backtrack when things go wrong that you would have been better off handling it serially with yourself in the loop.

If you don't have someone else paying the bill for unlimited token usage it's going to be a very expensive experiment.

matkoniecz|16 days ago

Also, Steve Yegge is a swindler and scammer, who benefited financially from pump and dump scheme advertised by him on his blog.

See https://steve-yegge.medium.com/bags-and-the-creator-economy-...

Note that some disclaimers, warnings were added afterwards.

politelemon|16 days ago

Having gone through his interview just now, his advice and experience seems centered around Vibe coding new applications and not really reflective of the reality of the industry.

> But I feel sorry for people who are good engineers – or who used to be – and they use Cursor, ask it questions sometimes, review its code really carefully, and then check it in. And I’m like: ‘dude, you’re going to get fired [because you are not keeping up with modern tools] and you’re one of the best engineers I know!’”

I would certainly take a careful person over the likes of yegge who seems to be neither pragmatic, nor an engineer.

linkregister|16 days ago

Yegge became famous from his blog recounting his hiring as a software engineer at Google in the early 2010s. He has been an engineer for a long time.

However, the implication that someone failing to use an experimental technology is falling behind is hyperbole.

enraged_camel|16 days ago

>> I would certainly take a careful person over the likes of yegge who seems to be neither pragmatic, nor an engineer.

What utter nonsense. Yegge has been a programmer for longer than some people on this board have been alive, has worked on a lot of interesting and massively challenging projects and generously shared what he has learned with the community. Questioning his engineering chops is both laughable and absurd.

esperent|16 days ago

Steve is basically an Instagram influencer for coders.

He'll say whatever he can to stay in the spotlight, try to make you feel bad, that you're doing things wrong, that he invented things like agent orchestration when in fact he's just a loudmouth.

Ignore him and his stupid gastown and get on with your life.

d4rkp4ttern|16 days ago

Attention is all everyone wants.

avaer|16 days ago

I think people who run 15 agents to write a piece of software could probably use 1 or 2 and a better multi-page prompt and have the same results for a fraction of the cost.

Especially with the latest models which pack quite a long and meaningful horizon into a single session, if you prompt diligently for what exactly you want it to do. Modern agentic coding spins up its own sub-agents when it makes sense to parallelize.

It's just not as sexy as typing a sentence and letting your AI bill go BRR (and then talking about it).

I'd like to see some actual results with a meaningful benchmark of software output that shows that agent orchestrators accomplish any meaningful improvement in the state of the art of software engineering, other than spending more tokens.

Maybe it's time to dredge up the Mythical Man-Month?

kyleee|16 days ago

Do the widely held truths of the mythical man month book hold in a universe of well prompted agents?

jolux|16 days ago

No point. Claude Code with skills and subagents is plenty. If they would stop breaking it constantly it would be fine.

The bottleneck has not been how quickly you can generate reasonable code for a good while now. It’s how quickly you can integrate and deploy it and how much operational toil it causes. On any team > 1, that’s going to rely on getting a lot of people to work together effectively too, and it turns out that’s a completely different problem with different solutions.

fooster|16 days ago

What if you could remove that toil.

blakec|6 days ago

Yeah, but not a framework. I'm using Claude Code's hook system. 84 hooks across 15 event types.

Biggest thing I learned: don't let multiple hooks fire independently on the same event. I had seven on UserPromptSubmit, each reading stdin on their own. Two wrote to the same JSON state file. Concurrent writes = truncated JSON = every downstream hook breaks. One dispatcher per event running them sequentially from cached stdin fixed it. 200ms overhead per prompt, which you never notice.

The "multi-agent is worse than serial" take is true when agents share context. Stops being true when you give planning agents their own session (broad context, lots of file reads) and implementation agents their own (narrow task, full window). I didn't plan that separation. It just turned out that mixing both in one session made both worse.

No framework, no runtime. Just files. You can use one hook or eighty-four.

dolebirchwood|16 days ago

I don't know what kind of work he's doing that doesn't require actually reading the code to ensure it's appropriately maintainable, but more power to him. I actually like knowing what the hell my code is doing and that it conforms to my standards before committing it. I'll accept his condolences.

wasmainiac|16 days ago

Same, seems completely irresponsible.

utopiah|16 days ago

We don't have time for safety, or security, or accuracy, or even understandability anymore. We need to move fast! /s

ryandvm|16 days ago

Yegge is smart guy but, I get the impression that he mostly works real hard to not work too hard. And while I'm sympathetic towards that tendency, I feel like most of the advice in this space is about using multi-agent configurations to effectively convince your management and co-workers that you're massively productive. Whether the mountains of code and architecture your agents are crapping out is actually adding value to the organization is a question that most companies are ill-equipped to evaluate - so these agent swarms are just a way to keep the gravy train rolling.

I think I'd rather hear what somebody who is pathologically productive like John Carmack is doing with multi-agent environments...

lubujackson|16 days ago

I think Yegge needs to keep up with the tech a bit more. Cursor has gotten quite powerful - it's plan mode now seems about on par with Claude Code, producing Mermaid charts and detailed multi-phase plans that pretty much just work. I also noticed their debug mode will now come up with several thesises (thesi?), create some sort of debugging harness and logging system, test each thesis, tear down the debugging logic and present a solution. I have no idea when that happened, but it helped solve a tricky frontend race condition for me a day or two ago.

I still like Claude, but man does it suck down tokens.

tbrownaw|16 days ago

Sometimes I tell the AI to change something, sometimes I just do it myself. Sometimes I start to do it and then the magic tab-complete guesses well enough that I can just tab through the rest of it.

Sometimes the magic tab-complete insists on something silly and repeatedly gets in the way.

Sometimes I tell the AI to do something, and then have to back out the whole thing and do it right myself. Sometimes it's only a little wrong, and I can accept the result and then tweak it a bit. Sometimes it's a little wrong in a way that's easy to tell it to fix.

0xbadcafebee|16 days ago

No I'm not, but not because I don't want to. To safely use an AI agent, it needs a ton of safety guardrails that (afaict) are difficult to set up. A lot of the safety guardrails we need don't even exist yet.

I'm working on all that currently. Trying to set up local systems to do practical and secure orchestrated AI work, without over-reliance on proprietary systems and platforms. Turns out it's a buttload of work. Yegge's own project (Gas Town) is a real world attempt to build just the agent part, and still many more parts are needed. It's so complicated, I don't think any open source solution is going to become dominant, because there's too much to integrate. The company that perfects this is going to be the next GitHub and Heroku rolled into one.

I get why people question all this. It's a completely different way of working that flies in the face of every best practice and common-sense lesson you learn as a software developer. But once you wrap your head around it, it makes total sense. You don't need to read code to know a system works and is reliable. You don't need to manually inspect the quality of things if there's other ways to establish trust. Work gets done a lot faster with automation, ironically with fewer errors. You can use cutting-edge technology to improve safety and performance, and ship faster.

These aren't crazy hypothetical ideals - what I just described is modern auto manufacturing. If it's safe enough for a car, it's safe enough for a web app.

hrishikesh-s|16 days ago

Yes, I'm using an agent orchestrator to write code. In fact, a couple of days before Anthropic introduced agent teams, I built a custom tool for myself inside emacs: https://github.com/hrishikeshs/magnus

I basically cycle through prompts and approve/deny/guide agents while looking at the buffer and thinking traces as text scrolls through. It has changed my life :)

KB|15 days ago

Have you considered integrating with worktrees via Magit to have multiple agents working across different changesets of a repo?

mlaretallack|16 days ago

Not the best way to do it, but I use xfce, multiple workspaces, each with there own version of AWS Kiro, and each kiro has its own project I am working on. This allows me to "switch context" easier between each project to check how the agents are getting on. Kiro also notifies me when an agent wants somthing. Usually I keep it to about 4 projects at a time, just to keep the context switching down.

jovanaccount|15 days ago

Why it works: It validates their pain ("herding cats") and offers a specific, technical fix (concurrency control) rather than a vague "AI solution."

The "managing the herd" overhead is real. I found that 80% of my debugging time wasn't fixing bad code, but fixing race conditions where agents were overwriting each other's context or hallucinating because they didn't have the latest state.

I ended up building a "traffic light" protocol (essentially a semaphore for swarms) just to force serialization on critical tasks. It kills the speed slightly but stops the "death spiral" where one agent's error cascades through the herd.

If you're building your own orchestrator or using something like OpenClaw, I open-sourced the concurrency logic here: https://github.com/jovanSAPFIONEER/Network-AI

softwaredoug|15 days ago

This mode of operating seems fairly alien to me.

It’s like a big waterfall design. It’s rare you have all the requirements of an app known up front. It’s pretty rare that they’re known so well you could heads down code non stop and have some result matching a spec.

Usually the coding is iterative and collaborative with other people. You ship something to a customer/colleague. You discuss “is this right!?” You evolve accordingly. It doesn’t matter if you have a perfect coding agent writing 100% of the code - active discovery of what to build IS the job.

Where fully autonomous coding makes sense is when you don’t care and most defaults are fine. In this case you’re working aggressively top down on a problem. Start with the default rails app version of your app, fine tune in small steps what’s custom.

Or your task is heavily verifiable a priori, like a C compiler. Or translating a parser with great tests from language A to B

kasey_junk|14 days ago

I tend to have between 4-15 agent sessions going at once, but importantly I’m counting agents that are awaiting inputs from me.

My agent orchestration system is a bespoke python program that I vibed just for me. It is one of thousands of systems that combines git worktrees and devcontainers. But I’ve customized it for my quirks and workflows. The big win is I can decide on a repo by repo basis what level of permissions to give an agent from yolo mode to very limited permissions.

In that agent count there are usually 3-5 sessions that are my main tasks mixed between research, planning,coding and code review. The balance of the sessions are sessions for other tasks, improving tests, adding new kinds of guard rails, projects that are ancillary etc.

johnfn|16 days ago

I am unfortunately in level 8. God help me. But honestly building an agent orchestrator is a really fun problem. It's like building an IDE and then using that IDE to build itself. Or building a programming language and then coding in that language! But with an entirely new host of different and interesting problems.

burnerToBetOut|16 days ago

    > …The final level in his AI
    > Coding chart reads: "Level 8:
    > you build your own orchestrator
    > to coordinate more agents"…

Yegge revealed [1] what's necessary to get to that level…

____

…How do you avoid getting tired? Dude, I take naps throughout the day. I'm exhausted…

…

…which is why I mentioned in one of my last blog posts that I'm taking naps all the time…

____

Yegge's productivity sounds impressive. I'll give you that. But it doesn't sound practical or sustainable for the everyday dev.

I doubt that even Google — with all its famous perks — offers employees ad hoc nap times while they're on the clock.

[1] https://g2ww.short.gy/Napster2026

the_harpia_io|12 days ago

the thing nobody seems to talk about with multi-agent orchestration is what happens to code review. like if I'm barely keeping up reviewing what one agent produces - and I'm pretty careful about it - how exactly am I supposed to review output from 3-4 running in parallel?

I've been spending a lot of time lately looking at security issues in AI-generated code specifically and the patterns are wild. the agents don't just make random mistakes, they have consistent blind spots - auth flows, input validation, race conditions in async stuff. and these aren't the kind of bugs that show up in a demo or even in basic tests.

at my work we tried letting two agents work on different parts of the same service for about a week. the code each produced was fine individually but the integration points were a mess - inconsistent error handling, one agent assumed the other's API would validate inputs. classic stuff that a human writing both sides would catch instinctively.

honestly I think the people pushing level 8 orchestration are optimizing for lines of code produced per hour which is maybe the least useful metric in software engineering

tiku|16 days ago

When orchestrating you need to have a damn good plan / requirements. And then I'm typing or thinking a lot beforehand. And at the end it's never 100% what you want.

That is why I'm going back to per function/small scope ai questions.

nprateem|16 days ago

There's important stuff to review, 10-20% (eg overall architecture, use of existing utilities/patterns), and there's the specifics of the client code.

My reviews pick out the first and gloss over the latter. They take a few minutes. So I run multiple distinct tasks across agents in antigravity, so there's less chance of conflict. This is on 500k+ line codebase. I'm amazed by the complexity of changes it can handle.

But I agree with his take. Old fashioned programming is dead. Now I do the work of a team of 3 or 4 people each day: AI speed but also no meetings, no discussions, no friction.

_sinelaw_|16 days ago

I did when just starting on a new project, it was working well when I had many new components to implement. But as the project matured and stabilized every new feature is cross-cutting and it's impossible to parallelize the work without running into conflicts (design conflicts, where two agents would add similar overlapping mechanisms, and also the usual code conflicts, touching the same files). Also, with project maturity I'm much more concerned about keeping it stable and correct, which is hard to do with parallel agents running amok.

johnfn|16 days ago

I find if you just ask the agents to resolve the conflicts they do a pretty great job. It's even better if you can feed them all the context while resolving the conflict.

0xecro1|16 days ago

I use it for both — side projects and my day job in embedded systems.

The key is where the tokens go. More tokens spent on planning, design, spec validation, test generation, and multi-agent review than on writing the actual code. The review pipeline should be heavier than the generation pipeline.

I encourage my team to use it as a plugin too. The "sorry way" is still a fine starting point — but once you see what a structured agent pipeline catches that manual review misses, it's hard to go back.

adakuchi2242|13 days ago

Founder here.

In the last six months I’ve been heads-down building ORA—an autonomous super agent that represents the next step toward AGI. I basically use it for every piece of work right now, including `to write code`.

Demo videos are now up on the @OscerraHQ X account. Your feedback would be invaluable as we work to perfect the product before launch.

petesergeant|16 days ago

"Claude writes, Codex reviews" has shown huge promise as a pattern for me, so I wrote a Dockerfile and some instructions on how to make that happen for agents, and ended up with https://github.com/pjlsergeant/moarcode

I am spending most of my day in this harness. It has rough edges for sure, but it means I trust the code coming out much more than I did just Claude.

joshuaisaact|16 days ago

I don't think you need two separate models for this - I get similarly good results re-prompting with Claude. Well, not re-prompting, I just have a skill that wipes the context then gets Claude to review the current PR and make improvements before I review it.

neumann|16 days ago

I tried to opposite because claude was not coding as well as codex some additional modules for my codebase and codex could. Then I tried to get claude to read and critique and it got so many fundamentals wrong I was wondering if I am using the wrong model.

allinonetools_|14 days ago

We experimented with chaining agents, but honestly the overhead of coordinating them often outweighed the gains. For non-greenfield codebases, tight feedback loops with one strong assistant plus solid review still worked better for us. Orchestration makes sense, but only when the surrounding tooling and tests are mature enough to absorb the output.

writingdna|7 days ago

The Yegge claim about manual code review being outdated conflates two things: reviewing for correctness vs. reviewing for design coherence. Agents are getting decent at the first (does this function do what the spec says?) but remain weak at the second (does this abstraction fit the existing architecture? will this pattern scale when the next feature lands?).

What actually works for me is treating agents less like autonomous developers and more like very fast typists who need clear architectural guardrails. The heavy lifting is writing the context documents -- architecture decision records, module boundary descriptions, naming conventions -- that constrain the generation. Ironically, the better your documentation, the less you need an orchestrator, because a single agent with good context produces coherent code on the first pass.

The git worktree pattern multiple people mention is underrated. Having each agent work on an isolated branch with automated test gates before merge catches the drift problem at the integration point rather than trying to prevent it during generation.

dsifry|16 days ago

I have been helping people get onboarded with Claude Code and the orchestrator I wrote called Metaswarm [1) and the response has been way beyond my expectations.

But don't take my word for it, try it out for yourself, it is MIT licensed, and you can create new projects with it or add it to an existing project.

[1] https://github.com/dsifry/metaswarm

andy_ppp|16 days ago

I think people should figure out what works for them rather than letting people on the internet gate-keep what is good. Everything is about personal choices and refining your own taste. I would not be happy being unable to understand everything deeply so having a million agents all doing stuff would just cause me a load of stress even if I could churn stuff out more quickly.

d4rkp4ttern|16 days ago

I think there’s a level beyond 8: not reviewing AI-generated code.

There’s a lot of discussion about whether to let AI write most of your code (which at least in some circles is largely settled by now), but when I see hype-posts about “AI is writing almost all of our code”, the top question I’m curious about is, how much of the AI-written code are they reviewing ?

Glyptodon|16 days ago

The stumbling block we have is spinning up separate environments for every agent so they have isolation for their branches. I think this is solveable, but we aren't trying to solve it ourselves. In practice it means we aren't doing a lot of agent supervision.

SkyPuncher|16 days ago

Git worktrees essentially solve this. It essentially copies your repo to a new folder

tbrownaw|16 days ago

That sounds like an excellent match for containers.

slopinthebag|16 days ago

No, I don't even use agents to generate code most of the time. I mainly use the inline assistant to modify or fill out blocks of code, and agents sometimes for refactors, asking questions, search, debugging, generating documentation etc.

I feel bad for Yegge.

wasmainiac|16 days ago

> At my work, this wouldn't fly

How does one even review the code from multiple agents. The quality imo is still to low to just let run on its own.

dboreham|16 days ago

People lie. Let's see a video of them doing this, or logs of the sessions, and the generated code, so we can judge for ourselves.

woutr_be|16 days ago

I would love to experience this, but I'm only at the level were I occasionally open ChatGPT or Claude, asked it a question, and then get frustrated because it can't even give me a straight answer, or makes incorrect assumptions.

I can't even imagine having multiple agents write code that somehow works.

freakynit|16 days ago

Same here. I've tried agent integrations in VS Code and also have agentic CLIs installed (Claude Code, Gemini cli). But honestly, I still find it more reliable, and often faster, to just ask focused questions, let it generate a method or two, review the output, and copy-paste it into my project. Rinse and repeat. Kind of like how we used to do in the good old days an year back.

For now at least, the full agent workflows feel less efficient and more headache-inducing than being helpful.

And agentic swarms: that's marketing bs.. at least for now.

lmeyerov|16 days ago

I stopped manually writing code 6-9mo ago, and am generating high-quality code on the dimensions we care about like GPU perf benchmarks, internal & industry conformance standards test suites, evals benchmarks, lint/type checkers, etc. It's not perfect code - there are clear AI slop tell tales that review cycles still let linger - but it's doing more ambitious things than we'd do on most dimensions like capability, quality, and volume. We're solving years-old GPU bugs that we had given up on as mere mortals.

And yes, we build our own orchestrator tech, both as our product (not vibes coding but vibes investigating), and more relevant here, our internal tooling. For example, otel & evals increasingly drive our AI coding loops rather than people. Codex and claude code are great agentic coding harnesses, so our 'custom orchestration' work is more about more intelligently using them in richer pipelines, like the above eval-driven loop. They've been pretty steadily adding features like parallel subagents that work in teams, and hookable enough to do most tricks, that I don't feel the need to use others. We're busy enough adapting on our own!

bitwize|16 days ago

We're not there yet, but it's going to happen. Given the nature of the application I'm working on, I wouldn't be surprised if the entire headcount of the engineering department were reduced to around five or so in a year or two.

pdyc|16 days ago

i tried but it didn't worked for me. Now i use agents as editors for fully formed solution so slightly better editor than typing.

whattheheckheck|16 days ago

Vscode agent mode is pretty slick

gimmeslop|16 days ago

I couldn’t ship 1.5 million lines of code daily without orchestrated agents.

Lapsa|15 days ago

eshaham78|16 days ago

[deleted]

Sea_reafused|14 days ago

[deleted]

leej111|16 days ago

[deleted]