I am sceptical that these persona-based agents really make that much of a difference; I think they mostly "appear" to make a difference because of their talking style.
Underneath is just a system prompt, or more likely a prompt layered on top: "You are a frontend engineer, competent in react and Next.js, tailwind-css". The stack details, project layout and other key information are already in the CLAUDE.md. For anything more, the model is going to call file-read tools etc.
I think it's more theatre than utility.
What I have taken to doing is having a parent folder and then frontend/ backend/ infra/ etc as children.
The parent/CLAUDE.md provides a high-level view of the stack: "FastAPI backend with postgres, Next.js frontend with tailwind, etc". The parent/CLAUDE.md also points to the children's CLAUDE.md files, which have more granular information.
I then just spawn a claude in the parent folder, set up plan mode, go back and forth on a design and then have it dump out to markdown to RFC/ and after that go to work. I find it does really well then as all changes it makes are made with a context of the other service.
I'm also skeptical partially because I don't like the huge essays generated by any llm. CLAUDE.md/AGENTS.md/README.md that are 5+ pages long are all equally bad imo. I prefer following the idea that if something is too verbose for me to want to get anything useful out of it, then the llm should behave similarly. Even if it's not true, why waste 2 paragraphs explaining something that could be explained in one short sentence?
My CLAUDE.md or AGENTS.md is usually just a bulleted list of reminders with high level information. If the agent needs more steering, I add more reminders. I try not to give it _too_ broad of a task without prior planning or it'll just go off the rails.
Something I haven't really experimented with is having claude generate ADRs [1] like your RFC/ idea. I'll probably try that and see how it goes.
I found a study a while ago that measured the effect of defining personas, and the effect was significant, but not very big. I like defining roles because I think it makes setting boundaries a bit easier. When I assign the role of architect for brainstorming, I expect the model to be a bit less eager to immediately jump into implementation. I'll still tell it explicitly to not do that, so the effect is probably extremely small.
So far, I find it much more important to define task scope and boundaries. If I want to implement a non-trivial feature, I'll have one role for analyzing the problem and coming up with a high-level plan, and then another role for breaking that plan down into very small atomic steps. I'll then pass each step to an implementation role and give it both the high-level plan and the whole list of individual steps as context, while making it clear that the scope is only to implement that one specific step.
I've had very good results with this so far, and once the two main documents are done, I can automate this with a small orchestration script (that does not depend on an LLM and is completely deterministic) going through the list and passing each item to an implementation agent sequentially, even letting the agent create a commit message after every step so I can trace its work afterwards. I've had very clean long-running tasks this way with minimal need for fixing things afterwards. I can go to bed in the evening and launch it and wake up to a long list of commits.
With the new 6 dollar subscription by Z.ai which includes 120 prompts (around 2000 requests) every 5 hours, I can pretty much let this run without having to worry about exceeding my limits.
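A minimal sketch of what such a deterministic orchestration script might look like, in Python. Everything named here is an assumption for illustration: the steps file uses `- [ ]` checklist items, `AGENT_CMD` is a placeholder for whatever non-interactive command your agent CLI offers, and the commit message is generated by the script (the comment above lets the agent write it instead).

```python
import subprocess
from typing import Callable, List

# Hypothetical placeholder: replace with your agent CLI's non-interactive invocation.
AGENT_CMD = ["my-agent", "--prompt"]


def parse_steps(text: str) -> List[str]:
    """Return the text of every unchecked '- [ ] ...' item, in file order."""
    marker = "- [ ] "
    steps = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            steps.append(stripped[len(marker):])
    return steps


def build_prompt(plan: str, steps: List[str], index: int) -> str:
    """Give the agent the whole plan and step list, but scope it to one step."""
    listing = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps))
    return (
        f"High-level plan:\n{plan}\n\n"
        f"All steps:\n{listing}\n\n"
        f"Implement ONLY step {index + 1}: {steps[index]}\n"
        "Do not start any other step."
    )


def run_all(plan: str, steps: List[str],
            run_agent: Callable[[str], None],
            commit: Callable[[str], None]) -> int:
    """Run the steps one after another; an exception from either callable stops the run."""
    done = 0
    for i, step in enumerate(steps):
        run_agent(build_prompt(plan, steps, i))
        commit(f"Step {i + 1}: {step}")
        done += 1
    return done


def shell_agent(prompt: str) -> None:
    """Call the (placeholder) agent command with the prompt as its last argument."""
    subprocess.run(AGENT_CMD + [prompt], check=True)


def git_commit(message: str) -> None:
    """Stage everything and commit; --allow-empty keeps a no-op step from aborting the run."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "--allow-empty", "-m", message], check=True)
```

Wiring it up would be something like `run_all(plan_text, parse_steps(steps_text), shell_agent, git_commit)`. Passing the agent and commit functions in as parameters keeps the loop itself testable without an LLM.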
I too am skeptical about the personas, but I still use them to organize context and instructions for different types of work. I use a top-level .agents dir, with commands, roles, and rules, sub-dirs.
CLAUDE.md is kept somewhat lean, with pointers to individual files in ./docs/ and .claude/commands is a symlink to .agents/commands.
After starting Claude, I use /commands to load a role and context, which pulls in only the necessary docs and avoids, say, loading UI design or test architecture docs, when adding a backend feature.
I don't want to have to do any of this, but it helps me try and keep the agents on the rails and minimize context rot.
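The "load only the docs this role needs" idea could be sketched roughly like this in Python. The role names and file paths in `ROLE_DOCS` are made up for illustration; the actual setup above uses slash commands and markdown files rather than a script.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from role to the docs that role needs.
ROLE_DOCS: Dict[str, List[str]] = {
    "backend": ["docs/architecture.md", "docs/api.md"],
    "frontend": ["docs/architecture.md", "docs/ui-design.md"],
    "qa": ["docs/test-architecture.md"],
}


def load_context(role: str, root: Path) -> str:
    """Concatenate only the docs registered for `role`, skipping missing files."""
    parts = []
    for rel in ROLE_DOCS[role]:
        path = root / rel
        if path.is_file():
            parts.append(f"## {rel}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

With this mapping, a backend task never pulls in `docs/ui-design.md` or the test architecture doc, which is the point of the role split.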
I've been struggling to get frontend/backend-aware Claude Code working in a way I like.
So are you saying you plan it out with the parent one, and then have it output a file in some format that you then pass to the frontend and backend Claudes individually? But those “sub Claudes” don’t actually have the other repo’s context?
As someone who's built a project in this space, this is incredibly unreliable. Subagents don't get a full system prompt (including stuff like CLAUDE.md directions) so they are flying very blind in your projects, and as such will tend to get derailed by their lack of knowledge of a project and veer into mock solutions and "let me just make a simpler solution that demonstrates X."
I advise people to only use subagents for stuff that is very compartmentalized because they're hard to monitor and prone to failure with complex codebases where agents live and die by project knowledge curated in files like CLAUDE.md. If your main Claude instance doesn't give a good handoff to a subagent, or a subagent doesn't give a good handback to the main Claude, shit will go sideways fast.
Also, don't lean on agents for refactoring. Their ability to refactor a codebase goes in the toilet pretty quickly.
> Their ability to refactor a codebase goes in the toilet pretty quickly.
Very much this. I tried to get Claude to move some code from one file to another. Some of the code went missing. Some of it was modified along the way.
Humans have strategies for refactoring, e.g. "I'm going to start from the top of the file and Cut code that needs to be moved and Paste it in the new location". LLMs don't have a clipboard (yet!) so they can't do this.
Claude can only reliably do this refactoring if it can keep the start and end files in context. This was a large file, so it got lost. Even then it needs direct supervision.
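One way to give an agent something like a clipboard is a small deterministic helper it can call instead of retyping the code. This is only a sketch under that assumption: `move_lines` is a hypothetical helper, not an existing Claude Code tool, and it works on lists of lines with 0-based, end-exclusive indices.

```python
from typing import List, Tuple


def move_lines(src: List[str], dst: List[str], start: int, end: int,
               insert_at: int) -> Tuple[List[str], List[str]]:
    """Cut src[start:end] and paste it into dst at insert_at, byte for byte."""
    if not (0 <= start <= end <= len(src)):
        raise ValueError("line range outside source")
    if not (0 <= insert_at <= len(dst)):
        raise ValueError("insert position outside destination")
    block = src[start:end]
    new_src = src[:start] + src[end:]
    new_dst = dst[:insert_at] + block + dst[insert_at:]
    # A cut-and-paste must neither lose nor alter any line.
    assert sorted(new_src + new_dst) == sorted(src + dst)
    return new_src, new_dst
```

The model then only has to pick line numbers; the moved text itself never passes through its output, so it cannot go missing or get modified along the way.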
My experience so far, after trying to keep CC on track with different strategies, is that it will more or less end up in the same ditch sooner or later. Even though I had defined agents, workflows, etc., now I just let it interact with GitHub issues and the quality is pretty much the same.
Totally agreed, tried agents for a lot of stuff (I started creating a team of agents, architect, frontend coder, backend coder and QA). Spent around 50 USD on a failed project, context contaminated and the project eventually had to be re-written.
Then I moved some parts in rules, some parts in slash commands and then I got much better results.
The subagents are like freelance contractors (I know, I have been one very recently): good when they need little handoff (not possible in real time), little oversight, and when their result is advice rather than an action. They don't know what you are doing, and they don't care what you do with the info they produce. They just do the work for you while you do something else, or while you wait for them to produce independent results. They come and go with little knowledge of existing functionality, but are good on their own.
Here are 3 agents I still keep and one I am working on.
1: Scaffolding: I create (and sometimes destroy) a lot of new projects, so I use a scaffolding agent when I am trying something new. It starts from a fresh one-line instruction saying what to scaffold (e.g. a new Docker container with Hono and a Postgres connection, or a new Cloudflare Worker which will connect to R2, D1 and AI Gateway, or an AWS serverless API Gateway with SQS that does this, that and that) and where to deploy. At the end of the day it sets up the project structure, creates a GitHub repo and commits it for me. I take it forward from there.
2: Triage: When I face an issue that is not obvious from reading the code alone, I give the agent the location and some logs, and it uses whatever is available (including the DB data) to make a best guess at why the issue happens. I have often found they work best when they are not biased by recent work.
3: Pre-Release Check QA: This QA agent tests the entire system (essentially running all the integration and end-to-end test suites) to make sure the change doesn't break anything existing. I am now adding functionality to let it see the original business requirement and check whether the code satisfies it. I want this agent to be my advisor and help me decide whether something goes to the release pipeline or not.
4: Web search (experimental): Sometimes a search is too costly for the existing token budget, and we only need the end result, not what was searched and those 10 pages it found...
I often see people making these sub agents modelled on roles like product manager, back end developer, etc.
I spent a few hours trying stuff like this and the results were pretty bad compared to just using CC with no agent specific instructions.
Maybe I needed to push through and find a combination that works but I don't find this article convincing as the author basically says "it works" without showing examples or comparing doing the same project with and without subagents.
Anyone got anything more convincing to suggest it's worth me putting more time into building out flows like this instead of just using a generic agent for everything?
Right - don’t make subagents for the different roles, make them to manage context for token heavy tasks.
A backend developer subagent is going to do the job ok, but then the supervisor agent will be missing useful context about what’s been done and will go off the rails.
The ideal sub agent is one that can take a simple question, use up massive amounts of tokens answering it, and then return a simple answer, dropping all those intermediate tokens as unnecessary.
Documentation Search is a good one - does X library have a Y function - the subagent can search the web, read doc MCPs, and then return a simple answer without the supervisor needing to be polluted with all the context
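The shape of that pattern can be sketched in a few lines of Python. This is a toy model, not how Claude Code implements subagents: `work` stands in for whatever the subagent actually does (web search, doc MCPs), and `scratch` stands in for its private context.

```python
from typing import Callable, List

Worker = Callable[[str, List[str]], str]


def ask_subagent(question: str, work: Worker, max_answer_chars: int = 500) -> str:
    """Run `work` with a private scratch list and return only its short answer."""
    scratch: List[str] = []  # intermediate tokens live here and are dropped on return
    answer = work(question, scratch)
    return answer[:max_answer_chars]


def supervisor_turn(history: List[str], question: str, work: Worker) -> None:
    """Record only the question and the short answer in the supervisor's context."""
    history.append(f"Q: {question}")
    history.append(f"A: {ask_subagent(question, work)}")
```

However many pages the worker reads into `scratch`, the supervisor's `history` grows by exactly two short entries.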
I see lots of people saying you should be doing it, but not actually doing it themselves.
Or at least, not showing full examples of exactly how to handle it when it starts to fail or scale, because obviously when you don't have anything, having a bunch of agents doing any random shit works fine.
I think it's both/and. Role-based works well in some cases, task-based in others. It's a false choice to think you have to pick one or the other all the time.
That sounds crazy to me, Claude Code has so many limitations.
Last week I asked Claude Code to set up a Next.js project with internationalization. It tried to install a third-party library instead of using the internationalization method recommended for the latest version of Next.js (using Next's middleware) and could not produce a functional version of the boilerplate site.
There are some specific cases where agentic AI does help me but I can't picture an agent running unchecked effectively in its current state.
I have seen it doing incredible stuff. One shotted adding a feature that included modifications to a proprietary backoffice system, db schema updates, defining new api models, implementing changes on the backend and then on the frontend.
I've also seen it choking when tasked to add a simple result count on a search.
This is where prompting comes in. You need to remember to tell it about which libs you want or encourage it to web search to find the latest ones, or use something like context7 MCP to get the latest versions.
Claude is always a little behind latest versions because of knowledge cutoff. Also I know the i18n lib you're talking about and it was probably the right call.
This type of post has nothing to do with real-world applications.
With all due respect to the .agents/ markdown files, Claude Code often, like other LLMs, gets fixated on a certain narrative, and no matter what the instructions are, it repeats that wrong choice over and over and over again, while “apologizing”…
Anything short of a close and intimate review of its implementation is doomed to fail.
What made things a bit better recently was having Gemini CLI and Claude Code take turns designing, reviewing, implementing and testing each other's work.
I'm commenting while agents run in project trying to achieve something similar to this.
I feel like "we all" are trying to do something similar, in different ways, and in a fast moving space (i use claude code and didn't even know subagents were a thing).
My gut feeling from past experiences is that we have git, but not git-flow, yet: a standardized approach that is simple to learn and implement across teams.
Once (if?) someone just "gets it right" and has a reliable way to break this down to the point that engineer(s) can efficiently review specs and code against expectations, it'll be the moment where being a coder will have a different meaning, at large.
So far, all the projects I've seen end up building "frameworks" to match each person's internal workflow. That's great and can be very effective for a single person (it is for me), but unless it can be shared across teams, throughput will still be limited (compared to that of a team of engineers with the same tools).
Also, refactoring a project to fully leverage AI workflows might be inefficient, if compared to a rebuild from scratch to implement that from zero, since building docs for context in pair with development cannot be backported: it's likely already lost in time, and accrued as technical debt.
Is it a good idea to generate more code faster to solve problems? Can I solve problems without generating code?
If code is a liability and the best part is no part, what about leveraging Markdown files only?
The last programs I created were just CLI agents with Markdown files and MCP servers(some code here but very little).
The feedback loop is much faster, allowing me to understand what I want after experiencing it, and self-correction is super fast. Plus, you don't get lost in the implementation noise.
Code you didn't write is an even bigger liability, because if the AI gets off track and you can't guide it back, you may have to spend the time to learn its code and fix the bugs.
It's no different to inheriting a legacy application though. As well, from the perspective of a product owner, it's not a new risk.
TBH I think the time it takes the agent to code is best spent thinking about the problem. This is where I see the real value of LLMs. They can free you up to think more about architecture and high level concepts.
Fast decision-making is terrible for software development. You can't make good decisions unless you have a complete understanding of all reasonable alternatives. There's no way that someone who is juggling 4 LLMs at the same time has the capacity to consider all reasonable alternatives when they make technical decisions.
IMO, considering all reasonable alternatives (and especially identifying the optimal approach) is a creative process, not a calculation. Creative processes cannot be rushed. People who rush into technical decisions tend to go for naive solutions; they don't give themselves the space to have real lightbulb moments.
Deep focus is good but great ideas arise out of synthesis. When I feel like I finally understand a problem deeply, I like to sleep on it.
One of my greatest pleasures is going to bed with a problem running through my head and then waking up with a simple, creative solution which saves you a ton of work.
I hate work. Work sucks. I try to minimize the amount of time I spend working; the best way to achieve that is by staring into space.
I've solved complex problems in a few days with a couple of thousand lines of code which took some other developers, more intelligent than myself, months and 20K+ lines of code to solve.
Fun little story I recently had using Subagents in Claude Code:
I was working on a large-ish R analysis. In R, people generally start with loading entire libraries like
library(a)
library(b)
etc., leading to namespace clashes. It's better practice to replace all calls to package-functions with package namespaces, i.e., it's better to do
a::function_a()
b::function_b()
than to load both libraries and then blindly trust that function_a() and function_b() come from a and b.
I asked Claude Code to take a >1000 LOC R script and replace all function calls with their package-namespaced equivalent. It ran one subagent to look for function calls, identified >40 packages, and then started one subagent per package, for >40 subagents. Cost-wise (and speed-wise!) it was mayhem, as every subagent re-read the script. It was far faster and cheaper, but a bit harder to judge, to just copy-paste the R script into regular Claude and ask it to carry out the same action. The lesson is that subagents are often costly overkill.
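For comparison, once you know which package owns which function, the rewrite itself is mechanical enough to script. This Python sketch is not what was done above; it is a deliberately naive regex pass that assumes you supply the function-to-package mapping yourself (`owners`), and it does not understand R strings, comments, or functions passed around by name without a call.

```python
import re
from typing import Dict


def qualify_calls(r_source: str, owners: Dict[str, str]) -> str:
    """Rewrite bare calls like f(...) to pkg::f(...) for every f in `owners`."""
    if not owners:
        return r_source
    # Longest names first so a short name never shadows a longer one.
    names = "|".join(re.escape(n) for n in sorted(owners, key=len, reverse=True))
    # Skip names preceded by a word character, '.', or ':' (already qualified
    # or part of a longer identifier); require an opening parenthesis after.
    pattern = re.compile(rf"(?<![\w.:])({names})(?=\s*\()")
    return pattern.sub(lambda m: f"{owners[m.group(1)]}::{m.group(1)}", r_source)
```

Because already-qualified calls are skipped, running it twice gives the same result as running it once.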
I built this tool https://github.com/btree1970/variant-ui where you can use a sub-agent to spin up multiple branches with different code changes into the UI and compare them side by side in the browser.
How do you not get lost mentally in what is exactly happening at each point in time? Just trusting the system and reviewing the final output? I feel like my cognitive constraints become the limits of this parallelized system. With a single workstream I pollute context, but feel way more secure somehow.
I suppose, gradually and then suddenly?
each "fix" to incorrect reasoning/solution doesn't just solve the current instance, it also ends up in a rule-based system that will be used in future
initially, being in the loop is necessary, once you find yourself "just approving" you can be relaxed and think back
or, more likely, initially you need fine-grained tasks; as reliability grows, tasks can become more complex
"parallelizing" allows single (sub)agents with ad-hoc responsibilities to rely on separate "institutionalized" context/rules, i.e. an architecture agent and a coder agent can talk to each other and resolve a decision conflict based on whether one is making the decision from concrete rules you have added, or hallucinating it
i have seen a friend build a rule based system and have been impressed at how well LLM work within that context
Was going to ask how much all this cost, but this sort of answers it:
> "Managing Cost and Usage Limits: Chaining agents, especially in a loop, will increase your token usage significantly. This means you’ll hit the usage caps on plans like Claude Pro/Max much faster. You need to be cognizant of this and decide if the trade-off—dramatically increased output and velocity at the cost of higher usage—is worth it."
The biggest issue with sub-agents and even the CC Task tool is that they are black boxes, and we can’t see what’s going on inside them and cannot intervene. I’ve instead often found it better to leverage Tmux and have CC send messages to another CLI-agent (could be CC or the other now-surging CC, i.e., Codex-CLI, or of course any other competent CLI-agent) running in another pane. To make this smoother I built this Tmux-cli command that CC can use:
If the first CLI-agent just needs a review or suggestions of approaches, I find it helps to have the first agent ask the other CLI-agent to dump its analysis into a markdown file which it can then look at.
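The underlying mechanism is plain tmux, so a rough Python sketch of the idea is short. This is not the tool mentioned above, just an illustration built on `tmux send-keys` and `tmux capture-pane`; the pane target (e.g. `agents:0.1`) is an assumption about your session layout.

```python
import subprocess
from typing import List


def send_text_argv(pane: str, text: str) -> List[str]:
    """argv that types `text` literally into `pane` (-l disables key-name lookup)."""
    return ["tmux", "send-keys", "-t", pane, "-l", text]


def press_enter_argv(pane: str) -> List[str]:
    """argv that presses Enter in `pane` to submit what was typed."""
    return ["tmux", "send-keys", "-t", pane, "Enter"]


def capture_argv(pane: str) -> List[str]:
    """argv that prints the pane's visible contents to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", pane]


def ask_other_agent(pane: str, message: str) -> None:
    """Type a message into the other agent's pane and submit it."""
    subprocess.run(send_text_argv(pane, message), check=True)
    subprocess.run(press_enter_argv(pane), check=True)


def read_pane(pane: str) -> str:
    """Return whatever is currently visible in the other agent's pane."""
    result = subprocess.run(capture_argv(pane), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Unlike a subagent, the other pane stays visible, so you can watch what the second agent is doing and step in by typing into it yourself.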
> One can hardly control one coding agent for correctness
Why not? I'm assuming we're not talking about "vibe coding" as it's not a serious workflow, it was suggested as a joke basically, and we're talking about working together with LLMs. Why would correctness be any harder to achieve than programming without them?
There were other HN posts suggesting BMAD, ccpm, conductor, etc. I considered giving it a try. They were quite comprehensive, to the point where I was exhausted reading all the documentation they’ve generated before coding - product requirements, epics, user stories/journeys, tasks, analysis, architecture, project plans.
The idea was to encapsulate the context for a subagent to work on in a single GitHub issue/document. I’m yet to see how the development/QA subagents will fare in real-world scenarios by relying on the context in the GitHub issue.
Like many others here, I believe subagents will starve for context. Claude Code Agent is context-rich, while claude subagents are context-poor.
Slightly off topic but I would really like agentic workflow that is embedded in my IDE as well as my code host provider like GitHub for pull requests.
Ideally I would like to spin off multiple agents to solve multiple bugs or features. The agents have to use the ci in GitHub to get feedback on tests. And I would like to view it on IDE because I like the ability to understand code by jumping through definitions.
Support for multiple branches at once - I should be able to spin off multiple agents that work on multiple branches simultaneously.
This already exists. Look at cursor with Linear, you can just reply with @cursor & some instructions and it starts working in a vm. You can watch it work on cursor.com/agents or using the cursor editor. Result is a PR. Also github has copilot getting integrated in the github ui, but not that great in my experience
Would that be solved by having several clones of your repo, each with a IDE and a Claude working on each problem? Much like how multiple people work in parallel.
Why not just use only async agents? You can fire off many tasks and check PRs locally when they complete the work. (I also work on devfleet.ai to improve this experience, any feedback is appreciated)
as much as ai has been a boon to my own development i writhe at the thought of middle managers oversold on the promise of ai and its output, making unrealistic requests and demanding 'MORE PRODUCTIVITY' at the greater cost of making more work in the future. Diluting code-as-craft, and commodifying it down to shovels of coal into the furnace.
These prompts remind me of the YouTubers giving people self-actualization advice. “Act like the person you want to be!” Telling the LLM that it is an experienced product manager doesn’t make it an experienced product manager, it just makes it sound like one. This is like launching an entire team of “fake it til you make it” employees.
I was bored yesterday and tried to vibe code a simple React app using Claude Code, and it was basically useless. It created a good shell of the code initially, but after 10 minutes I basically had to take over (it would add a feature, then regress the previous one).
Am I the only one convinced that all of the hype around coding agents like codex and claude is 85% BS ?
skimojoe|5 months ago
parent/CLAUDE.md frontend/CLAUDE.md backend/CLAUDE.md
johntash|5 months ago
[1]: https://adr.github.io/
faangguyindia|5 months ago
Subagents do not work well for coding at all
peepee1982|5 months ago
chickensong|5 months ago
rf15|5 months ago
pozol|5 months ago
kookamamie|5 months ago
CuriouslyC|5 months ago
zarzavat|5 months ago
theshrike79|5 months ago
Like "evaluate the test coverage" or "check if the project follows the style guide".
This way the "main" context only gets the report and doesn't waste space on massive test outputs or reading multiple files.
quijoteuniv|5 months ago
stingraycharles|5 months ago
I’ve been using subagents since they were introduced and it has been a great way to manage context size / pollution.
prash2488|5 months ago
sixhobbits|5 months ago
lucraft|5 months ago
redrove|5 months ago
At some point you gotta stop and wonder if you’re doing way too much work managing claude rather than your business problem.
noodletheworld|5 months ago
Frustrating.
cpursley|5 months ago
zachwills|5 months ago
mindwok|5 months ago
dutchCourage|5 months ago
jondwillis|5 months ago
Not very agentic but it works a lot better.
kobalsky|5 months ago
The short answer is, it's cheap to let it try.
taspeotis|5 months ago
zachwills|5 months ago
h33t-l4x0r|5 months ago
tzury|5 months ago
With all due respect to the .agents/ markdown files, Claude Code, like other LLMs, often gets fixated on a certain narrative, and no matter what the instructions are, it repeats that wrong choice over and over and over again, while “apologizing”…
Anything beyond a close and intimate review of its implementation is doomed to fail.
What made things a bit better recently was having Gemini CLI and Claude Code take turns designing, reviewing, implementing and testing each other's work.
zachwills|5 months ago
rufasterisco|5 months ago
My gut feeling from past experiences is that we have git, but not git-flow, yet: a standardized approach that is simple to learn and implement across teams.
Once (if?) someone just "gets it right", and has a reliable way to break this down to the point that engineer(s) can efficiently review specs and code against expectations, it'll be the moment where being a coder will have a different meaning, at large.
So far, all projects I've seen end up building "frameworks" to match each person's internal workflow. That's great and can be very effective for the single person (it is for me), but unless that can be shared across teams, throughput will still be limited (when compared to that of a team of engs with the same tools).
Also, refactoring a project to fully leverage AI workflows might be inefficient, if compared to a rebuild from scratch to implement that from zero, since building docs for context in pair with development cannot be backported: it's likely already lost in time, and accrued as technical debt.
zachwills|5 months ago
Frannky|5 months ago
If code is a liability and the best part is no part, what about leveraging Markdown files only?
The last programs I created were just CLI agents with Markdown files and MCP servers(some code here but very little).
The feedback loop is much faster, allowing me to understand what I want after experiencing it, and self-correction is super fast. Plus, you don't get lost in the implementation noise.
ehnto|5 months ago
It's no different to inheriting a legacy application though. As well, from the perspective of a product owner, it's not a new risk.
Joel_Mckay|5 months ago
https://www.youtube.com/watch?v=wL22URoMZjo
Have a great day =3
jongjong|5 months ago
Fast decision-making is terrible for software development. You can't make good decisions unless you have a complete understanding of all reasonable alternatives. There's no way that someone who is juggling 4 LLMs at the same time has the capacity to consider all reasonable alternatives when they make technical decisions.
IMO, considering all reasonable alternatives (and especially identifying the optimal approach) is a creative process, not a calculation. Creative processes cannot be rushed. People who rush into technical decisions tend to go for naive solutions; they don't give themselves the space to have real lightbulb moments.
Deep focus is good but great ideas arise out of synthesis. When I feel like I finally understand a problem deeply, I like to sleep on it.
One of my greatest pleasures is going to bed with a problem running through my head and then waking up with a simple, creative solution which saves you a ton of work.
I hate work. Work sucks. I try to minimize the amount of time I spend working; the best way to achieve that is by staring into space.
I've solved complex problems in a few days with a couple of thousand lines of code which took some other developers, more intelligent than myself, months and 20K+ lines of code to solve.
a_bonobo|5 months ago
I was working on a large-ish R analysis. In R, people generally start with loading entire libraries like
library(a)
library(b)
etc., leading to namespace clashes. It's better practice to replace all calls to package-functions with package namespaces, i.e., it's better to do
a::function_a()
b::function_b()
than to load both libraries and then blindly trust that function_a() and function_b() come from a and b.
I asked Claude Code to take a >1000 LOC R script and replace all function calls with their package-namespace function call. It ran one subagent to look for function calls, identified >40 packages, and then started one subagent per package, for >40 subagents. Cost-wise (and speed-wise!) it was mayhem, as every subagent re-read the script. It was far faster and cheaper, but a bit harder to judge, to just copy-paste the R script into regular Claude and ask it to carry out the same action. The lesson is that subagents are often costly overkill.
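A back-of-the-envelope sketch of why the fan-out hurts, assuming a made-up script size of 15,000 tokens (not measured) and 40 subagents that each re-read the whole script. The function names are hypothetical and the model ignores output tokens and caching:

```python
def fanout_cost(script_tokens: int, n_subagents: int, overhead_per_agent: int = 0) -> int:
    """Input tokens when every subagent re-reads the whole script."""
    return n_subagents * (script_tokens + overhead_per_agent)


def single_pass_cost(script_tokens: int) -> int:
    """Input tokens when one agent reads the script once."""
    return script_tokens


SCRIPT_TOKENS = 15_000  # assumed size of a >1000 LOC R script, not measured
N_SUBAGENTS = 40        # roughly one per package, as in the comment above

fanout = fanout_cost(SCRIPT_TOKENS, N_SUBAGENTS)
single = single_pass_cost(SCRIPT_TOKENS)
ratio = fanout / single
print(f"fan-out: {fanout} tokens, single pass: {single} tokens, ratio: {ratio:.0f}x")
```

Under these assumptions the input cost scales linearly with the number of subagents, so splitting by package only pays off if each subagent can work from a small slice of the script rather than the whole thing.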
beefcake|5 months ago
I see people who never coded in their life signing up for Lovable or some other code agent and trying their luck.
What cements this thought pattern in your post is this: "If the agents get it wrong, I don’t really care—I’ll just fire off another run"
serendipityAI|5 months ago
alxh|5 months ago
rufasterisco|5 months ago
initially, being in the loop is necessary; once you find yourself "just approving" you can relax and step back. or, more likely: initially you need fine-grained tasks; as reliability grows, tasks can become more complex
"parallelizing" allows single (sub)agents with ad-hoc responsibilities to rely on separate "institutionalized" context/rules, i.e. architecture-agent and coder-agent can talk to each other and resolve a decision conflict based on whether one is making the decision based on concrete rules you have added, or hallucinating decisions
i have seen a friend build a rule-based system and have been impressed at how well LLMs work within that context
zachwills|5 months ago
ares623|5 months ago
siva7|5 months ago
awb|5 months ago
Most subagent examples are vague or simplistic.
raminf|5 months ago
> "Managing Cost and Usage Limits: Chaining agents, especially in a loop, will increase your token usage significantly. This means you’ll hit the usage caps on plans like Claude Pro/Max much faster. You need to be cognizant of this and decide if the trade-off—dramatically increased output and velocity at the cost of higher usage—is worth it."
d4rkp4ttern|5 months ago
https://github.com/pchalasani/claude-code-tools/tree/main?ta...
If the first CLI-agent just needs a review or suggestions of approaches, I find it helps to have the first agent ask the other CLI-agent to dump its analysis into a markdown file which it can then look at.
agigao|5 months ago
siva7|5 months ago
diggan|5 months ago
Why not? I'm assuming we're not talking about "vibe coding" as it's not a serious workflow, it was suggested as a joke basically, and we're talking about working together with LLMs. Why would correctness be any harder to achieve than programming without them?
chandureddyvari|5 months ago
The idea was to encapsulate the context for a subagent to work on in a single GitHub issue/document. I’m yet to see how the development/QA subagents will fare in real-world scenarios by relying on the context in the GitHub issue.
Like many others here, I believe subagents will starve for context. Claude Code Agent is context-rich, while claude subagents are context-poor.
simianwords|5 months ago
Ideally I would like to spin off multiple agents to solve multiple bugs or features. The agents have to use the ci in GitHub to get feedback on tests. And I would like to view it on IDE because I like the ability to understand code by jumping through definitions.
Support for multiple branches at once - I should be able to spin off multiple agents that work on multiple branches simultaneously.
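One way to get several branches checked out at once is `git worktree`, which gives each branch its own directory. A minimal sketch that only builds the commands (it does not run them); the `worktree_commands` helper, the `agent/` branch prefix and the `../worktrees` layout are all assumptions for illustration:

```python
import shlex
from pathlib import PurePosixPath


def worktree_commands(tasks: list[str], base: str = "main",
                      root: str = "../worktrees") -> list[str]:
    """Build one `git worktree add` command per task (hypothetical layout)."""
    cmds = []
    for task in tasks:
        branch = f"agent/{task}"             # assumed naming convention
        path = PurePosixPath(root) / task    # one directory per agent
        cmds.append(shlex.join(
            ["git", "worktree", "add", "-b", branch, str(path), base]
        ))
    return cmds


commands = worktree_commands(["fix-login", "add-search-count"])
for cmd in commands:
    print(cmd)
```

Each agent can then be started in its own directory, and each directory can be opened in the IDE separately to jump through definitions on that branch.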
posix86|5 months ago
Jare|5 months ago
muratsu|5 months ago
user1999919|5 months ago
wrs|5 months ago
zachwills|5 months ago
bazhand|5 months ago
Rover222|5 months ago
zachwills|5 months ago
jackblemming|5 months ago
tonkinai|5 months ago
misiti3780|5 months ago
Am I the only one convinced that all of the hype around coding agents like Codex and Claude is 85% BS?
user3939382|5 months ago
x1unix|5 months ago