top | item 46747660

(no title)

mafriese | 1 month ago

Ok it might sound crazy, but I actually got the best code quality (completely ignoring that the cost is likely 10x higher) by having a full “project team” in opencode: multiple sub-agents, all managed by a single Opus instance. I gave them the task of porting a legacy Java server to C# on .NET 10: 9 agents, a 7-stage Kanban board, with isolated Git worktrees.

Manager (Claude Opus 4.5): Global event loop that wakes up specific agents based on folder (Kanban) state.

Product Owner (Claude Opus 4.5): Strategy. Cuts scope creep.

Scrum Master (Opus 4.5): Prioritizes backlog and assigns tickets to technical agents.

Architect (Sonnet 4.5): Design only. Writes specs/interfaces, never implementation.

Archaeologist (Grok-Free): Lazy-loaded. Only reads legacy Java decompilation when Architect hits a doc gap.

CAB (Opus 4.5): The Bouncer. Rejects features at Design phase (Gate 1) and Code phase (Gate 2).

Dev Pair (Sonnet 4.5 + Haiku 4.5): AD-TDD loop. Junior (Haiku) writes failing NUnit tests; Senior (Sonnet) fixes them.

Librarian (Gemini 2.5): Maintains "As-Built" docs and triggers sprint retrospectives.

You might ask “isn’t this extremely unnecessary?”, and the answer is most likely _yes_. But I have never had this much fun watching AI agents at work (especially when the CAB rejects implementations). This is an early version of the process the agents follow (I haven’t updated it since it was only for me anyway): https://imgur.com/a/rdEBU5I
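A minimal sketch of what the Manager's "global event loop" could look like, assuming the Kanban board is just folders of ticket files. The stage names, the `wake()` stub, and the ticket-file convention are all invented for illustration; the real setup presumably spawns opencode sessions rather than printing:

```python
from pathlib import Path

# Hypothetical stage-folder -> agent mapping (names are illustrative,
# not the OP's actual board layout).
STAGE_AGENTS = {
    "01_design": "architect",
    "02_gate1": "cab",
    "03_implement": "dev_pair",
    "04_gate2": "cab",
    "05_docs": "librarian",
}

def wake(agent: str, ticket: Path) -> None:
    # Stand-in for launching an agent session on this ticket.
    print(f"waking {agent} for {ticket.name}")

def event_loop_tick(board: Path) -> list[tuple[str, str]]:
    """One manager pass: scan each Kanban stage folder and wake the
    responsible agent for every ticket file found there."""
    woken = []
    for stage, agent in STAGE_AGENTS.items():
        for ticket in sorted((board / stage).glob("*.md")):
            wake(agent, ticket)
            woken.append((agent, ticket.name))
    return woken
```

The nice property of folder-as-state is that a crashed agent loses nothing: the next tick re-discovers the ticket wherever it was left.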

discuss

order

alphazard|1 month ago

Every time I read something like this, it strikes me as an attempt to convince people that various people-management memes will still be relevant moving forward, or even that they currently work when applied to humans. The reality is that these roles don't even work in human organizations today. Classic "job_description == bottom_of_funnel_competency" fallacy.

If they make the LLMs more productive, it is probably explained by a less complicated phenomenon that has nothing to do with the names of the roles, or their descriptions. Adversarial techniques work well for ensuring quality, parallelism is obviously useful, important decisions should be made by stronger models, and using the weakest model for the job helps keep costs down.

rlayton2|1 month ago

My understanding is that the main reason splitting up work is effective is context management.

For instance, if an agent only has to be concerned with one task, its context can be massively reduced. Further, the next agent can just be told the outcome; its context load is also reduced, because it doesn't need the inner workings, just the result.

For instance, a security testing agent just needs to review code against a set of security rules, and then list the problems. The next agent then just gets a list of problems to fix, without needing a full history of working it out.
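That handoff can be made concrete. In this sketch the toy substring checks stand in for an LLM security reviewer, and the rule list and prompt format are invented; the point is the interface: the reviewer sees code plus rules, the fixer sees only the findings list.

```python
# Illustrative rule set the reviewer is configured with.
SECURITY_RULES = ["no hard-coded secrets", "parameterize SQL queries"]

def review_security(code: str) -> list[str]:
    """Review `code` against SECURITY_RULES and return only a list of
    problems (toy checks standing in for a model call)."""
    findings = []
    if "password =" in code:
        findings.append("hard-coded secret assigned to 'password'")
    if '" +' in code and "SELECT" in code:
        findings.append("string-concatenated SQL query")
    return findings

def fixer_prompt(findings: list[str]) -> str:
    # The next agent receives just this prompt -- none of the
    # reviewer's working context or history.
    return "Fix the following issues:\n" + "\n".join(f"- {f}" for f in findings)
```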

simondotau|1 month ago

I suppose it could end up being an LLM variant of Conway’s Law.

“Organizations are constrained to produce designs which are copies of the communication structures of these organizations.”

https://en.wikipedia.org/wiki/Conway%27s_law

miki123211|1 month ago

I think it's just the opposite, as LLMs feed on human language. "You are a scrum master" automatically encodes most of what the LLM needs to know. Trying to describe the same role from scratch in a prompt would be a lot more difficult.

Maybe a different separation of roles would be more efficient in theory, but an LLM understands "you are a scrum master" from the get go, while "you are a zhydgry bhnklorts" needs explanation.

ttoinou|1 month ago

Developers actually do want managers, to simplify their daily lives. Otherwise they would manage themselves better and keep more of the revenue share for themselves.

ljm|1 month ago

It shows me that there doesn’t appear to be an escape from Conway’s Law, even when you replace the people in an organisation with machines. Fundamentally, the problem is still being explored from the perspective of an organisation of people and it follows what we’ve experienced to work well (or as well as we can manage).

generallyjosh|1 month ago

I do think there is some actual value in telling an LLM "you are an expert code reviewer". You really do tend to get better results in the output.

When you think about what an LLM is, it makes sense. The phrase causes strong activation of the neurons related to "code review", and so the model's output sounds more like a code review.

zhenyakovalyov|1 month ago

i guess, as a human it’s easier to reason about a multi-agent system when the roles are split intuitively, as we all have mental models. but i agree - it’s a bit redundant/unnecessary

sathish316|1 month ago

Subagent orchestration without the overhead of frameworks like Gastown is genuinely exciting to see. I’ve recorded several long-running demos of Pied-Piper, which is a Subagents orchestration system for Claude Code and ClaudeCodeRouter+OpenRouter here: https://youtube.com/playlist?list=PLKWJ03cHcPr3OWiSBDghzh62A...

I came across a concept called DreamTeam, where someone was manually coordinating GPT 5.2 Max for planning, Opus 4.5 for coding, and Gemini Pro 3 for security and performance reviews. Interesting approach, but clearly not scalable without orchestration. In parallel, I was trying to do repeatable workflows like API migration, Language migration, Tech stack migration using Coding agents.

Pied-Piper is a subagent orchestration system built to solve these problems and enable repeatable SDLC workflows. It runs from a single Claude Code session, using an orchestrator plus multiple agents that hand off tasks to each other as part of a defined workflow called Playbooks: https://github.com/sathish316/pied-piper

Playbooks allow you to model both standard SDLC pipelines (Plan → Code → Review → Security Review → Merge) and more complex flows like language migration or tech stack migration (Problem Breakdown → Plan → Migrate → Integration Test → Tech Stack Expert Review → Code Review → Merge).
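In spirit, a playbook is just ordered data that an orchestrator walks through. A simplified generic sketch (not the exact playbook format, and the stage/agent names here are illustrative):

```python
# An ordered list of (stage, agent) pairs modeling the standard SDLC
# pipeline described above.
SDLC_PLAYBOOK = [
    ("plan", "planner"),
    ("code", "coder"),
    ("review", "reviewer"),
    ("security_review", "security_reviewer"),
    ("merge", "merger"),
]

def next_stage(playbook, completed):
    """Return the first (stage, agent) pair whose stage is not yet in
    `completed`, or None when the workflow is done."""
    for stage, agent in playbook:
        if stage not in completed:
            return stage, agent
    return None
```

The orchestrator repeatedly asks `next_stage` what to run, hands the task to that agent, and records the stage as completed when the agent finishes.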

Ideally, it will require minimal changes once Claude Swarm and Claude Tasks become mainstream.

vercaemert|1 month ago

Personally, I'm fascinated by the opening for protocol languages to become relevant.

The previous generation of AI (AI in the academic sense), like JASON, combined with a protocol language like BSPL, seems like the easiest way to organize agent armies in ways that "guarantee" specific outcomes.

The example above is very cool, but I'm not sure how flexible it would be (and there's the obvious cost concern). But, then again, I may be going far down the overengineering route.

juanre|1 month ago

I have been using a simpler version of this pattern, with a coordinator and several more or less specialized agents (eg, backend, frontend, db expert). It really works, but I think that the key is the coordinator. It decreases my cognitive load, and generally manages to keep track of what everyone is doing.

big-guy23|1 month ago

Share the code behind this “actual best quality”, or this is just another meaningless and suspicious attempt to get users to put already-expensive AI in a for-loop to make it even more expensive.

kaspermarstal|1 month ago

Can you share technical details please? How is this implemented? Is it pure prompt-based, plugins, or do you have like script that repeatedly calls the agents? Where does the kanban live?

mogili1|1 month ago

Not the OP, but this is how I manage my coding agent loops:

I built a drag and drop UI tool that sets up a sequence of agent steps (Claude code or codex) and have created different workflows based on the task. I'll kick them off and monitor.

Here's the tool I built for myself for this: https://github.com/smogili1/circuit

paulnovacovici|1 month ago

I’ve been messing around with the BMAD process as well, which seems like a simpler workflow than you described. My only concern is that it gets 90% of the way to production-ready code, but starts to fail at the last 10%, when the tech debt gets too large.

Have you been able to build anything productionizable this way, or are you just using this workflow for rapid prototyping?

JasperBekkers|1 month ago

This is genuinely cool, the CAB rejecting implementations must be hilarious to watch in action. The Kanban + Git worktree isolation is smart for keeping agents from stepping on each other.

I've been working on something in this space too. I built https://sonars.dev specifically for orchestrating multiple Claude Code agents working in parallel on the same codebase. Each agent gets its own workspace/worktree and there's a shared context layer so they can ask each other questions about what's happening elsewhere (kind of like your Librarian role but real-time).

The "ask the architect" pattern you described is actually built into our MCP tooling: any agent can query a summary of what other agents have done/learned without needing to parse their full context.

DanOpcode|1 month ago

Very cool! A couple of questions:

1. Are you using a Claude Code subscription? Or are you using the Claude API? I'm a bit scared to use the subscription in OpenCode due to Anthropic's ToS change.

2. How did you choose what models to use in the different agents? Do you believe or know they are better for certain tasks?

porker|1 month ago

> due to Anthropic's ToS change.

Not a change, but enforcing terms that have been there all the time.

ComplexSystems|1 month ago

How much does this setup cost? I don't think a regular Claude Max subscription makes this possible.

amelius|1 month ago

Can't you just use time-sharing and let the entire task run over night?

potamic|1 month ago

Could you share some details? How many lines of code? How much time did it take, and how much did it cost?

karmasimida|1 month ago

You might as well just have a planner and workers; your architecture essentially echoes that structure. It's difficult to discern how the role semantics drive different behavior, or why the planner can't create those prompts ad hoc.

alexwrboulter|1 month ago

This now makes me think that the only way to get AI to work well enough to actually replace programmers will probably be paying so much for compute that it's cheaper to just have a junior dev instead.

RestartKernel|1 month ago

What are the costs looking like to run this? I wonder whether you could use this approach within a mixture-of-experts model trained end-to-end as an ensemble. That might take some of the guesswork out of the roles.

fortedoesnthack|1 month ago

I was getting good results with a similar flow using Claude Max with ChatGPT. Unfortunately, that's no longer an option for me unless either I or my company wants to foot the bill.

ceroxylon|1 month ago

What are you building with the code you are generating?

_alex_|1 month ago

Interesting that your impl agents are not opus. I guess having the more rigorous spec pipeline helps scope it to something sonnet can knock out.

tommica|1 month ago

Is it just multiple opencode instances inside tmux panels or how do you run your setup?

5Qn8mNbc2FNCiVV|1 month ago

Do you mind sharing the prompts? Would be greatly appreciated

ggoo|1 month ago

Is this satire?

mafriese|1 month ago

Nope, it isn’t. I did it as a joke initially (I also had a version where every 2 stories there was a meeting, and if an agent underperformed it would get fired). I think there are multiple reasons why it actually works so well:

- I built a system where context (plus the current state and goal) is properly structured, and coding agents only get the information they actually need and nothing more. You wouldn’t let your product manager develop your backend, and I let the backend dev do only the things it is supposed to do and nothing more. If an agent crashes (or quota limits are reached), the agents can continue exactly where the others left off.

- Agents are ”fighting against” each other to some extent: the Architect tries to design while the CAB tries to reject.

- Granular control. I wouldn’t call “the manager” _a deterministic state machine that is calling probabilistic functions_, but that’s to some extent what it is? The manager has clearly defined tasks (like “if a file is in 01_design → call Architect”).
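A minimal sketch of that last point (folder names are illustrative, not the actual board layout): the file move between folders *is* the deterministic state transition, and only the verdict, here a plain argument standing in for the CAB model's output, is probabilistic.

```python
from pathlib import Path

# Deterministic transition table: verdict -> destination stage folder.
MOVES = {"approve": "02_implement", "reject": "01_design"}

def transition(ticket: Path, verdict: str, board: Path) -> Path:
    """Move a ticket file to the stage folder chosen by `verdict`
    and return its new path."""
    dest = board / MOVES[verdict] / ticket.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    return ticket.rename(dest)
```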

Here’s one example of an agent log after a feature has been implemented from one of the older codebases: https://pastebin.com/7ySJL5Rg

GoatInGrey|1 month ago

It's not satire but I see where you're coming from.

Applying distributed human team concepts to a porting task squeezes extra performance from LLMs much further up the diminishing-returns curve. That matters because porting projects are actually well suited for autonomous agents: existing code provides context, objective criteria catch LLM-grade bugs more readily than in greenfield work, and established unit tests offer clear targets.

I guess what I'm trying to say is that the setup seems absurd because it is. Though it also carries real utility for this specific use case. Apply the same approach to running a startup or writing a paid service from scratch and you'd get very different results.

SkyPuncher|1 month ago

Doubt it. I use a similar setup from time to time.

You need to have different skills at different times. This type of setup helps break those skills out.

hereme888|1 month ago

Why would it be? It's a creative setup.

thaynt|1 month ago

I think many people really like the gamification and complex role playing. That is how GitHub got popular, that is how Rube Goldberg agent/swarm/cult setups get popular.

It attracts the gamers and LARPers. Unfortunately, management is on their side until they find out after four years or so that it is all a scam.

tehlike|1 month ago

You probably implemented gastown.

raffraffraff|1 month ago

The next stage in all of this shit is to turn what you have into a service. What's the phrase? "I don't want to talk to the monkey, I want to talk to the organ grinder." So when you kick things off, it should be a tough interview with the manager and program manager. Once they're on board and know what you want, they start cracking. Then they just call you in for demos and updates. Lol

justmedep|1 month ago

Scrum masters typically do not assign tickets.

heliumtera|1 month ago

Congratulations on coming up with the cringiest thing I have ever seen. Nothing will top this, ever.

Corporate has to die