
The creator of Claude Code's Claude setup

568 points | KothuRoti | 2 months ago | twitter.com

403 comments

[+] tmerr|2 months ago|reply
This is interesting to hear, but I don't understand how this workflow actually works.

I don't need 10 parallel agents making 50-100 PRs a week, I need 1 agent that successfully solves the most important problem.

I don't understand how you can generate requirements quickly enough to have 10 parallel agents chewing away at meaningful work. I don't understand how you can have any meaningful supervising role over 10 things at once given the limits of human working memory.

It's like someone is claiming they unlocked ultimate productivity by washing dishes, in parallel with doing laundry, and cleaning their house.

Likely I am missing something. This is just my gut reaction as someone who has definitely not mastered using agents. Would love to hear from anyone that has a similar workflow where there is high parallelism.

[+] crystal_revenge|2 months ago|reply
My initial response to reading this post was "wow, I think I'd rather just write the code".

I also remain a bit skeptical because, if all of this really worked (and I mean over a long time and scaling to meet a range of business requirements), even if it's not how I personally want to write code, shouldn't we be seeing a ton of 1 person startups?

I see Bay area startups pushing 996 and requiring living in the Bay area because of the importance of working in an office to reduce communication hurdles. But if I can really 10x my current productivity, I can get the power of a seed series startup with even less communication overhead (I could also get by with much less capital). Imagine being able to hire 10 reliable junior-mid engineers who unquestionably followed your instruction and didn't need to sleep. This is what I keep being told we have for $200/month. Forget not needing engineers, why do we need angel investors or even early stage VC? A single smart engineer should be able, if all the claims I'm hearing are true, to easily accomplish in months what used to take years.

But I keep seeing products shipped at the same speed but with a $200 per month per user overhead. Honestly I would love to be wrong on this because that would be incredibly cool. But unfortunately I'm not seeing it yet.

[+] stingraycharles|2 months ago|reply
I hope self-promotion isn't frowned upon, but I've been spending the past months figuring out a workflow [1] that helps tackle the "more complicated problems" and ensure long-term maintainability of projects when done purely through Claude Code.

Effectively, I try to:

- Never allow the LLM to make implicit decisions; instead, confirm with the user;

- Ensure code is written in such a way that it's easy to understand for LLMs;

- Capture all "invisible knowledge" around decisions and architecture that's difficult to infer from code alone.

It's based entirely on Claude Code sub-agents + skills. The skills almost all invoke a Python script that guides the agents through workflows.

It's not a fast workflow: it frequently takes more than 1 hour just for the planning phase. Execution is significantly faster, as (typically) most issues have been discovered during the planning phase already (otherwise I consider it a bug and improve the workflow based on that).
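The "skills invoke a Python script that guides the agents" idea can be as simple as a checklist gated on a state file, so the agent must finish (and the user must confirm) one step before seeing the next. A hypothetical sketch — the step names and state-file layout here are mine, not from the linked repo:

```python
import json

# Hypothetical steps; real workflows would be more elaborate.
STEPS = [
    ("restate", "Restate the requirement in your own words."),
    ("decisions", "List every decision the plan implies; flag any not yet confirmed by the user."),
    ("plan", "Write the implementation plan as numbered steps."),
]

def _load(state_path):
    try:
        with open(state_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def next_step(state_path="workflow_state.json"):
    """Return (key, prompt) for the next unfinished step, or None when all are done."""
    done = set(_load(state_path))
    for key, prompt in STEPS:
        if key not in done:
            return key, prompt
    return None

def mark_done(key, state_path="workflow_state.json"):
    """Record a completed step so the next invocation advances."""
    done = _load(state_path)
    if key not in done:
        done.append(key)
    with open(state_path, "w") as f:
        json.dump(done, f)
```

Because the state lives on disk rather than in the model's context, the gating survives context compaction and restarts — the agent re-runs the script and lands on the same step.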

I'm under the impression that the creator of Claude Code's post is also intended to raise awareness of certain features of Claude Code, such as hand-offs to the cloud and back. Their workflow only works for small features. It reads a bit like someone took a “best practices” guide and turned it into a twitter post. Nice, but not nearly detailed enough for an actual workflow.

[1] https://github.com/solatis/claude-config/

[+] danso|2 months ago|reply
Yes, thank you! I find I get more than enough done (and more than enough code to review) by prompting the agent step by step. I want to see what kind of projects are getting done with multiple async autonomous agents. I was hoping to find YouTube videos of someone setting up a project for multiple agents, so I could see the cadence of the human stepping in and giving directions.
[+] exitb|2 months ago|reply
Multiple instances of agents are an equivalent to tabs in other applications - primarily holders of state, rather than means for extreme parallelism.
[+] thomasfromcdnjs|2 months ago|reply
I run 3-5 on distinct projects often (20x plan). I quite enjoy the context switching and always have. I have a vanilla setup too: I don't use plugins/skills/commands, sometimes I enable an MCP server for different things, and I definitely list out CLI tools in my claude.md files. I keep a Google doc open where I list out all the projects I'm working on and write notes as I'm jumping through the Claude tabs; I also start drafting more complex prompts in the Google doc. I've been using Turborepo a lot so I don't have to context switch the architecture in my head (but projects still use multiple types of DevOps setups).

Often these days I vibe code a feedback loop for each project — a way for it to validate itself, as OP said. This adds time to how long Claude takes to complete, giving me time to switch context to another active project.

I also use light mode which might help others... jks

[+] ponco|2 months ago|reply
I agree. I'm imagining a large software team with hundreds of tickets "ready to be worked on" might support this workflow - but even then, surely you're going to start running into unnecessary conflicts.

The max Claude instances I've run is 2 because beyond that, I'm - as you say - unable to actually determine the next best course during the processing time. I could spend the entire day planning / designing prompts - and perhaps that will be the most efficient software development practice in the future. Or perhaps it is a sign I'm doing insufficient design up front.

[+] HarHarVeryFunny|2 months ago|reply
I suppose he may have a list of feature requests and bug reports to work on, but it does seem a bit odd from a human perspective to want to work on 5 or more things literally in parallel, unless they are all so simple that there is no cognitive load and context switching required to mentally juggle them.

Washing dishes in parallel with laundry and cleaning is of course easily possible, but precisely because there is no cognitive load involved. When the washing machine stops you can interrupt what you are doing to load clothes into the drier, then go back to cleaning/whatever. Software development for anything non-trivial obviously has a much higher task-switching overhead. Optimal flow for a purely human developer is to "load context" at the beginning of the day, then remain in flow-state without interruptions.

The cynical part of me also can't help but wonder if Cherny/Anthropic aren't just advocating token-maxxing!

[+] MattGaiser|2 months ago|reply
> I need 1 agent that successfully solves the most important problem.

If you only have that one problem, that is a reasonable criticism, but you may have 10 different problems and want to focus on the important one while the smaller stuff is AIed away.

> I don't understand how you can generate requirements quickly enough to have 10 parallel agents chewing away at meaningful work.

I am generally happy with the assumptions it makes when given few requirements? In a lot of cases I just need a feature and the specifics are fairly open or very obvious given the context.

For example, I am adding MFA options to one project. As I already have MFA for another portal on it, I just told Claude to add MFA options for all users. Single sentence with no details. The result seems perfectly serviceable, if in need of some CSS changes.

[+] ex-aws-dude|2 months ago|reply
Yeah I don’t understand these posts recently with people running 10 at once

Can someone give an example of what each of them would be doing?

Are they just really slow, is that the problem?

[+] Haaargio|2 months ago|reply
I would do the same thing if I could justify paying $200 per month for my hobby. But even with that, you will run into throttling / API / resource limits.

But AI agents need time. They need a little bit of reading the source code, proposing the change, making the change, running the verification loop, creating the git commit, etc. Can be a minute, can be 10, and potentially a lot longer too.

So if your code base is big enough that you can work on different topics, you just do that:

- Fix this small bug in the UI when xy happens
- Add a new field to this form
- Clean up the README with content x
- ...

I'm an architect at work and have done product management on the side, as it's a very technical project. I have very little problem coming up with things to fix, enhance, clean up, etc. I have hard limits on my headcount.

I could easily do a handful of things in parallel and keep that in my head. Working memory might be limited, but working memory means something different than following 10 topics. Especially if there are a few topics in between which just take time with the whole feedback loop.

But regarding your example of house cleaning: I have ADHD, I sometimes work like this. Working on something, waiting for a build, and cleaning something in parallel.

What you are missing is practical experience with agents. Taking the time and energy to set something up for yourself — perhaps accessibility too?

We only got access at work to claude code since end of last year.

[+] victorbjorklund|2 months ago|reply
Depends on the project you are working on. Solo on a web app? You probably have 100s of small things to fix. Some more padding there, add a small new feature here, etc.
[+] raducu|2 months ago|reply
> don't need 10 parallel agents making 50-100 PRs a week

I don't like to be mean, but a few weeks ago the guy bragged about Claude helping him do +50k loc and -48k loc (netting 2k loc). I thought he was joking, because I know plenty of programmers who do exactly that without AI; they just commit 10 huge JSON test files or re-format code.

I almost never open a PR without a thorough cleanup whereas some people seem to love opening huge PRs.

[+] gherkinnn|2 months ago|reply
LLM agents can be a bit like slot machines. The more the merrier.

And at least two generate continuous shitposts for their company's Slack.

That said, having one write code and a clean context review it is helpful.

[+] giancarlostoro|2 months ago|reply
I use Beads, which makes this easier to grasp since it creates "tickets" for the agent. I tell it what I want, it creates a bead (or "ticket"), and then I ask it to do research, brain dump on it, and even ask me clarifying questions, and it updates the tasks. By the end, once I have a few tasks with essentially a well-defined prompt each, I tell Claude to run x tasks in parallel. Sometimes I dump a bunch of different tasks and ask it to research them all in parallel, and it fills them in, and I review. When it's all over, I test the code, look at the code, and mention any follow-ups.

I guess it comes down to, how much do you trust the agent? If you don't trust it fully you want to inspect everything, which you still can, but you can choose to do it after it runs wild instead of every second it works.

[+] csomar|2 months ago|reply
It's all smoke, really. Claude Code is an unreliable piece of software and yet one of the better ones in LLM coding (https://github.com/anthropics/claude-code/issues). That, and I highly suspect it's mostly engineers who are working on it instead of LLMs. Google itself, with all its resources and engineers, can't come up with a half-decent CLI for coding.

Reminder: the guy works for Anthropic. Anthropic is over-hyping LLMs. That's like a jewelry dealer's assistant telling you how gold chains helped his romantic life.

[+] carefulfungi|2 months ago|reply
My impression is that people who are exploring coordinated multi-agent-coding systems are working towards replacing full teams, not augmenting individuals. "Meaningful supervising role" becomes "automated quality and process control"; "generate requirements quickly" -> we already do this for large human software teams.

If that's the goal, then we shouldn't interpret the current experiment as the destination.

[+] headcanon|2 months ago|reply
Potentially, a lot of that isn't just code generation, it *is* requirements gathering, design iteration, analysis, debugging, etc.

I've been using CC for non-programming tasks and it's been pretty successful so far, at least for personal projects (bordering on the edge of non-trivial). For instance, I'll have a 'designer' agent come up with a spec, and a 'design-critic' challenge the design and make the original agent defend its choices. They can ask open questions after each round and I'll provide human feedback. After a few rounds of this, we whittle it down to a decent spec and try it out after handing it off to a coding agent.
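That designer/critic round-trip is essentially a fixed number of adversarial refinement passes. A hypothetical sketch — `call_model` stands in for a real LLM call, and the prompts are illustrative:

```python
def refine_spec(requirement, call_model, rounds=3):
    """Draft a spec, then run critique/revise rounds against it."""
    spec = call_model(f"Draft a spec for: {requirement}")
    for _ in range(rounds):
        # The 'critic' challenges the design...
        critique = call_model(f"Critique this spec and list its weaknesses:\n{spec}")
        # ...and the 'designer' must defend or revise.
        spec = call_model(f"Revise the spec to address:\n{critique}\n\nSpec:\n{spec}")
    return spec
```

In practice you'd insert the human-feedback step between rounds, as described above, rather than letting the loop run unattended.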

Another example from work: I fired off some code analysis to an agent with the goal of creating integration tests, and then ran a set of spec reviewers in parallel to check its work before creating the actual tickets.

My point is there are a lot of steps involved in the whole product development process, and it isn't just "ship production code". And we can reduce the ambiguity/hallucinations/sycophancy by creating validation/checkpoints (either tests, 'critic' agents to challenge designs/specs, or human QA/validation when appropriate).

The end game of this approach is you have dozens or hundreds of agents running via some kind of orchestrator churning through a backlog that is combination human + AI generated, and the system posts questions to the human user(s) to gather feedback. The human spends most of the time doing high-level design/validation and answering open questions.
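That orchestrator end game can be sketched as a simple drain-the-backlog loop, where anything an agent can't resolve is routed to the human as a question. This is a toy model — `Task` and `run_agent` are stand-ins, not any real API:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    description: str
    questions: list = field(default_factory=list)

def run_agent(task: Task) -> Task:
    # Stand-in: a real implementation would hand the task to a coding
    # agent and collect any open questions it raises.
    if "ambiguous" in task.description:
        task.questions.append(f"Clarify: {task.description}")
    return task

def drain_backlog(backlog, ask_human):
    """Process every task; route open questions to the human."""
    finished = []
    while backlog:
        task = run_agent(backlog.pop(0))
        if task.questions:
            for q in task.questions:
                ask_human(q)  # in a real system the task is re-queued once answered
        else:
            finished.append(task)
    return finished
```

The human's job collapses to exactly what the comment describes: answering the `ask_human` queue and validating what lands in `finished`.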

You definitely incur some cognitive debt and risk it doing something you don't want, but that's part of the fun for me (assuming it doesn't kill my AI bill).

[+] quijoteuniv|2 months ago|reply
This is it! “I don't need 10 parallel agents making 50-100 PRs a week, I need 1 agent that successfully solves the most important problem.”
[+] baby|2 months ago|reply
I usually have 4-5, but it's because they are working on different parts of the codebase, or some I will use as read only to brainstorm
[+] david_shi|2 months ago|reply
> It's like someone is claiming they unlocked ultimate productivity by washing dishes, in parallel with doing laundry, and cleaning their house.

In this case you have to take a leap of faith and assume that Claude or Codex will get each task done correctly enough that your house won't burn down.

[+] CraigJPerry|2 months ago|reply
>> I need 1 agent that successfully solves the most important problem

In most of these kinds of posts, that's still you. I don't believe I've come across a pro-faster-keyboard post yet that claims AGI. Despite the name, LLMs have no agency; it's still all on you.

Once you've defined the next most important problem, you have a smaller problem: translate those requirements into code which accurately meets them. That's the bit where these models can successfully take over. I think of them as a faster keyboard, and I've not seen a reason to change my mind yet despite using them heavily.

[+] bob1029|2 months ago|reply
If you're trying to solve one very hard problem, parallelism is not the answer. Recursion is.

Recursion can give you an exponential reduction in error as you descend into the call stack. It's not guaranteed in the context of an LLM but there are ways to strongly encourage some contraction in error at each step. As long as you are, on average, working with a slightly smaller version of the problem each time you recurse, you still get exponential scaling.
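The contraction argument above is easy to make concrete (a toy model of the claim, not a statement about any particular LLM): if each recursive step multiplies the residual error by a factor r < 1, then after d levels only r**d of the original error remains — geometric decay in depth.

```python
def residual_error(initial_error: float, contraction: float, depth: int) -> float:
    """Error remaining after `depth` recursive refinement steps,
    assuming each step contracts the error by the same factor."""
    error = initial_error
    for _ in range(depth):
        error *= contraction
    return error

# Even a modest 0.5 contraction per level leaves 0.5**10 of the
# original error after 10 levels — under a tenth of a percent.
```

The caveat in the comment is the whole game: nothing guarantees `contraction < 1` at each step with an LLM, only that you can structure prompts to encourage it on average.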

[+] CuriouslyC|2 months ago|reply
The problem isn't generating requirements, it's validating work. Spec-driven development and voice chat with ticket/chat context is pretty fast, but the validation loop is still mostly manual. When I'm building, I can orchestrate multiple swarms no problem; however, any time I have to drop in to validate stuff, my throughput drops and I can only drive 1-2 agents at a time.
[+] aforwardslash|2 months ago|reply
It depends on the specifics of the tasks; I routinely work on 3-5 projects at once (sometimes completely different stuff), and having a tool like Claude Code fits great in my workflow.

Also, the feedback doesn't have to be immediate: sometimes I have sessions that run over a week, because of casual iterations. In my case it's quite common to do this to test concepts, micro-benchmarking, and library design.

[+] bitfilped|2 months ago|reply
The only way to achieve that level of parallelism is by not knowing what you are doing or the problem space you are working in to begin with, and just throwing multiple ill-defined queries at agents until something "works". It's sort of a modern infinite monkey theorem, if you will.
[+] xnx|2 months ago|reply
Agree. People are stuck applying the "agent" = "employee" analogy and think they are more productive by having a team/company of agents. Unless you've perfectly spec'ed and detailed multiple projects up front, the speed of a single agent shouldn't be the bottleneck.
[+] dangus|2 months ago|reply
Let’s not forget the massive bias in the author: for all we know this post is a thinly veiled marketing pitch for “how to use the most tokens from your AI provider and ramp up your bill.”

This isn’t about being the most productive or having the best workflow, it’s about maximizing how much Claude is a part of your workflow.

[+] eaurouge|2 months ago|reply
> It's like someone is claiming they unlocked ultimate productivity by washing dishes, in parallel with doing laundry, and cleaning their house.

But we do this routinely with machines. Not saying I don't get your point re 100 PRs a week, just that it's a strange metaphor given the similarities.

[+] gedy|2 months ago|reply
> This is interesting to hear, but I don't understand how this workflow actually works

The cynic in me says it's a marketing pitch to sell "see, this is way cheaper than 10 devs!". The "agent" thing leans heavily into bean counter CTO/CIO marketing.

[+] TheTaytay|2 months ago|reply
This was extremely useful to read for many reasons, but my favorite thing I learned is that you can "teleport" a task FROM the local Claude Code to Claude Code on the web by prepending your request with "&". That makes it a "background" task, which I initially erroneously thought was a local background task. Turns out it sends the task and conversation history up to the web version. This allows you to do work in other branches on Claude Code web (and then teleport those sessions back down to local later if you wish).
[+] ej88|2 months ago|reply
I implemented some of his setup and have been loving it so far.

My current workflow is typically 3-5 Claude Codes in parallel

- Shallow clone, plan mode back and forth until I get the spec down, hand off to subagent to write a plan.md

- Ralph Wiggum Claude using plan.md and skills until PR passes tests, CI/CD, auto-responds to greptile reviews, prepares the PR for me to review

- Back and forth with Claude for any incremental changes or fixes

- Playwright MCP for Claude to view the browser for frontend

I still always comb through the PRs and double check everything including local testing, which is definitely the bottleneck in my dev cycles, but I'll typically have 2-4 PRs lined up ready for me at any moment.

[+] rdiddly|2 months ago|reply
I feel like it's time for me to hang up this career. Prompting is boring, and doing it 5 times at once is just annoying multitasking. I know I'm mostly in it for the money, but at least there used to be a feeling of accomplishment sometimes. Now it's like, whose accomplishment is it?
[+] bikeshaving|2 months ago|reply
Must be nice to have unquota’ed tokens to use with frontier AI (is this the case for Anthropic employees?). One thing I think is fascinating as we enter the Intellicene is the disproportionate access to AI. The ability to petition them to do what you want is currently based on monthly subscriptions, but will it change in the future? Who knows?
[+] omnicognate|2 months ago|reply
I tried Claude Code a while back when I decided to give "vibe-coding" a go. That was actually quite successful, producing a little utility that I use to this day, completely without looking at the code. (Well, I did briefly glance at it after completion and it made my eyeballs melt.) I concluded the value of this to me personally was nowhere near the price I was charged so I didn't continue using it, but I was impressed nonetheless.

This brief use of Claude Code was done mostly on a train using my mobile phone's wi-fi hotspot. Since the connection would be lost whenever the train went through a tunnel, I encountered a bug in Claude Code [1]. The result of it was that whenever the connection dropped and came up again I had to edit an internal json file it used to track the state of its tool use, which had become corrupt.

The issue had been open for months then, and still is. The discussion under it is truly remarkable, and includes this comment from the devs:

> While we are always monitoring instances of this error and looking to fix them, it's unlikely we will ever completely eliminate it due to how tricky concurrency problems are in general.

Claude Code is, in principle, a simple command-line utility. I am confident that (given the backend and model, ofc) I could implement the functionality of it that I used in (generously!) at most a few thousand lines of python or javascript, I am very confident that I could do so without introducing concurrency bugs and I am extremely confident that I could do it without messing up the design so badly that concurrency issues crop up continually and I have to admit to being powerless to fix them all.

Programming is hard, concurrency problems are tricky and I don't like to cast aspersions on other developers, but we're being told this is the future of programming and we'd better get on board or be left behind and it looks like we're being told this by people who, with presumably unlimited access to all this wonderful tooling, don't appear to be able to write decent software.

[1] https://github.com/anthropics/claude-code/issues/6836

[+] varun_chopra|2 months ago|reply
It would be very interesting to see the outputs of his operations. How productive is one of his agents? How long does it take to complete a task, and how often does it require steering?

I'm a bit of a skeptic. Claude Code is good, but I've had varied results during my usage. Even just 5 minutes ago, I asked CC to view the most recent commit diff using git show. Even when I provided the command, it was doing dumb shit like git show --stat and then running wc for some reason...

I've been working on something called postkit[1], which has required me to build incrementally on a codebase that started from nothing and has now grown quite a lot. As it's grown, Claude Code's performance has definitely dipped.

[1] https://github.com/varunchopra/postkit

[+] jedberg|2 months ago|reply
The funniest part of that whole thing was when someone said "I trusted you, but you use light mode on your terminal" and then he replied that people stop by his desk daily just to make fun of him for it.
[+] copperx|2 months ago|reply
I'm afraid to ask, but since I've been very happy with Codex 5.2 CLI and can't imagine Claude Code doing better: why is Claude so loved around here?

Sure, I can spend $20 and figure it out, but I already pay $40/mo for two ChatGPT subs and that's enough to get me through a month.

Should I spend $20 to see for myself?

[+] zen4ttitude|2 months ago|reply
What I find surprising is how much human intervention the creator of Claude Code relies on. Every time Claude does something bad, we write it in claude.md so it learns from it... Why not create an agent to handle this and learn automatically from previous implementations?

  B: Outcome Weighting

  # memory/store.py
  from enum import Enum
  RunOutcome = Enum("RunOutcome", "SUCCESS PARTIAL FAILED CANCELLED")

  OUTCOME_WEIGHTS = {
      RunOutcome.SUCCESS: 1.0,    # Full weight
      RunOutcome.PARTIAL: 0.7,    # Some issues but shipped
      RunOutcome.FAILED: 0.3,     # Downweighted but still findable
      RunOutcome.CANCELLED: 0.2,  # Minimal weight
  }

  # Applied during scoring:
  final_score = score * decay_factor * outcome_weight

  C: Anti-Pattern Retrieval

  # Similar features → SUCCESS/PARTIAL only
  similar_features = store.search(..., outcome_filter=[SUCCESS, PARTIAL])

  # Anti-patterns → FAILED only (separate section)
  anti_patterns = store.search(..., outcome_filter=[FAILED])

  Injected into agent prompt:
  ## Similar Past Features (Successful)
  1. "Add rate limiting with Redis..." (Outcome: success, Score: 0.87)

  ## Anti-Patterns (What NOT to Do)
  _These similar attempts failed - avoid these approaches:_
  1. "Add rate limiting with in-memory..." (FAILED, Score: 0.72)

  ## Watch Out For
  - **Redis connection timeout**: Set connection pool size

  The flow now:
  Query: "Add rate limiting"
           │
           ├──► Similar successful features (ranked by outcome × decay × similarity)
           │
           ├──► Failed attempts (shown as warnings)
           │
           └──► Agent sees both "what worked" AND "what didn't"
[+] andruby|2 months ago|reply
Great list of useful tips.

It's interesting that Boris doesn't mention "Agent Skills" at all. I'm still a bit confused at the difference between slash commands and Agent Skills.

https://code.claude.com/docs/en/skills

[+] mks_shuffle|2 months ago|reply
How different are Codex and Claude Code from each other? I have been using Codex for a few weeks, doing experiments related to data analysis and training models with some architecture modifications. I wouldn't say I have used it extensively, but so far my experience has been good. The only annoying part has been not being able to use the GPU in Codex without the `--sandbox danger-full-access` flag. Today, I started using Claude Code and ran similar experiments as in Codex. I find the interface quite similar to Codex. However, I hit the limit quite quickly in Claude Code. I will be exploring its features further. I would appreciate it if anyone could share their experience of using both tools.
[+] heliumtera|2 months ago|reply
Why stop at 5-10? Make it 5 billion - 10 billion parallel agents. PR number go up
[+] vemv|2 months ago|reply
How has Claude Code (as a CLI tool, not the backing models) evolved over the last year?

For me it's practically the same, except for features that I don't need, don't work that well and are context-hungry.

Meanwhile, Claude Code still doesn't know how to jump to a dependency's (library's) source to obtain factual information about it. Which is actually quite easy by hand (normally it's cd'ing into a directory or unzipping some file).

So, this wasteful workflow only resulted in vibecoded, non-core features while at the domain level, Claude Code remains overly agnostic if not stupid.

[+] Snakes3727|2 months ago|reply
Frankly, Claude Code is painfully slow. To the point I get frustrated.

On large codebases I often find it taking 20+ minutes to do basic things like writing tests.

Way too often people are like, "it takes 2 minutes for it to do a full PR." Yeah, how big is the code base actually?

I also have a coworker who burns through about 10x more than everyone else (closing in on around $1k worth of credits a day now), yet he is one of the lowest performers.

[+] preommr|2 months ago|reply
I don't understand how these setups scale long-term, and even more so for the average user. The latter is relevant because, as he points out, his setup isn't that far out of reach of the average person - it's still fairly close to out-of-the-box Claude Code, and Opus.

But between the model qualities varying, the pricing, the timing, the tools constantly changing, I think it's really difficult to build the institutional knowledge and setup that can be used beyond a few weeks.

In the era of AI, I don't think it's good enough to "have" a working product. It's also important to have all the other things that make a project way more productive, like stellar documentation, better abstractions, clearer architecture. In terms of AI, there's gotta be something better than just a markdown file with random notes. Like, what happens when an agent does something because it's picking something up from some random Slack convo, or some minor note in a 10k claude.md file? It just seems like the wild west, where basic ideas like additional surface area being a liability are ignored because we're too early in the cycle.

tl;dr If it's just pushing around typical mid-level code, then... I just think that's falling behind.

[+] baalimago|2 months ago|reply
I'm a bit jealous. I would like to experiment with having a similar setup, but 10x Opus 4.5 running practically non stop must amount to a very high inference bill. Is it really worth the output?

From experimentation, I need to coach the models quite closely in order to get enough value. Letting it loose only works when I've given very specific instructions. But I'm using Codex and Clai, perhaps Claude code is better.

[+] thetamarind11|2 months ago|reply
Warm take for sure, but I feel that LLMs and agents have made me a worse programmer as a whole. I am enjoying the reduced mental strain, as I do my hobbies like sketching/art while the LLM is running. But it definitely isn't making me any faster.

I'm at the point of considering another job but I just fear that my skills have deteriorated to the point that I can't pass any (manual) coding assessments anymore