top | item 46140878

(no title)

rfw300 | 2 months ago

I'd like others' input on this: increasingly, I see Cursor, Jetbrains, etc. moving towards a model of having you manage many agents working on different tasks simultaneously. But in real, production codebases, I've found that even a single agent is faster at generating code than I am at evaluating its fitness and providing design guidance. Adding more agents working on different things would not speed anything up. But perhaps I am just much slower or a poorer multi-tasker than most. Do others find these features more useful?

discuss

SatvikBeri|2 months ago

I usually run one agent at a time in an interactive, pair-programming way. Occasionally (like once a week) I have some task where it makes sense to have one agent run for a long time. Then I'll create a separate jj workspace (equivalent of git worktree) and let it run.

I would probably never run a second agent unless I expected the task to take at least two hours, any more than that and the cost of multitasking for my brain is greater than any benefit, even when there are things that I could theoretically run in parallel, like several hypotheses for fixing a bug.

IIRC Thorsten Ball (Writing an Interpreter in Go, lead engineer on Amp) also said something similar in a podcast – he's a single-tasker, despite some of his coworkers preferring fleets of agents.

cube2222|2 months ago

Same.

I've recently described how I vibe-coded a tool to run this single background agent in a docker container in a jj workspace[0] while I work with my foreground agent but... my reviewing throughput is usually saturated by a single agent already, and I barely ever run the second one.

New tools keep coming up for running fleets of agents, and I see no reason to switch from my single-threaded Claude Code.

What I would like to see instead, are efforts on making the reviewing step faster. The Amp folks had an interesting preview article on this recently[1]. This is the direction I want tools to be exploring if they want to win me over - help me solve the review bottleneck.

[0]: https://news.ycombinator.com/item?id=45970668

[1]: https://ampcode.com/news/review

kace91|2 months ago

I really would like an answer to this.

My CTO is currently working on the ability to run several dockerised versions of the codebase in parallel for this kind of flow.

I’m here wondering how anyone could work on several tasks at once at a speed where they can read, review and iterate the output of one LLM in the time it takes for another LLM to spit an answer for a different task.

Like, are we just asking things as fast as possible and hoping for a good solution unchecked? Are others able to context switch on every prompt without a reduction in quality? Why are people tackling the problem of prompting at scale as if the bottleneck was token output rather than human reading and reasoning?

If this was a random vibecoding influencer I’d get it, but I see professionals trying this workflow and it makes me wonder what I’m missing.

c-linkage|2 months ago

I was going to say that this is how genetic algorithms work, but there is still too much human in the loop.

Maybe code husbandry?

stogot|2 months ago

My assumption lately is that this workflow is literally just “it works, so merge”. Running multiple in parallel does not allownfor inspection of the code just for testing functional requirements at the end

Aeolun|2 months ago

Hmm, I haven’t managed to make it work yet, and I’ve tried. The best I can manage is three completely separate projects, and they all get only divided attention (which is often good enough these days).

tcdent|2 months ago

Rather than having multiple agents running inside of one IDE window, I structure my codebase in a way that is somewhat siloed to facilitate development by multiple agents. This is an obvious and common pattern when you have a front-end and a back-end. Super easy to just open up those directories of the repository in separate environments and have them work in their own siloed space.

Then I take it a step further and create core libraries that are structured like standalone packages and are architected like third-party libraries with their own documentation and public API, which gives clear boundaries of responsibility.

Then the only somewhat manual step you have is to copy/paste the agent's notes of the changes that they made so that dependent systems can integrate them.

I find this to be way more sustainable than spawning multiple agents on a single codebase and then having to rectify merge conflicts between them as each task is completed; it's not unlike traditional software development where a branch that needs review contains some general functionality that would be beneficial to another branch and then you're left either cherry-picking a commit, sharing it between PRs, or lumping your PRs together.

Depending on the project I might have 6-10 IDE sessions. Each agent has its own history then and anything to do with running test harnesses or CLI interactions gets managed on that instance as well.

nurettin|2 months ago

Even with the best agent in plan mode, there can be communication problems, style mismatches, untested code, incorrect assumptions and code that is not DRY.

I prefer to use a single agent without pauses and catch errors in real time.

Multiple agent people must be using pauses, switching between agents and checking every result.

hmokiguess|2 months ago

I think this is the UX challenge of this era. How to design a piece of software that aids in promoting the human-level of attention to a distributed state without causing information loss or cognitive decline over many tasks. I agree that for any larger piece of work with significant scope the overhead of ingesting the context into your brain offsets the time saving costs you get from multitask promises.

My take on this is that the better these things get eventually we will be able to infer and quantify signals that provide high confidence scores for us to conduct a better review that requires a shorter decision path. This is akin to how compilers, parsers, linters, can give you some level of safety without strong guarantees but are often "good enough" to pass a smell test.

coffeefirst|2 months ago

No... I've found the opposite where using the fastest model to do the smallest pieces is useful and anything where I have to wait 2m for a wrong answer is just on the way.

There's pretty much no way anyone context switching that fast is paying a lick of attention. They may be having fun, like scrolling tiktok or playing a videogame just piling on stimuli, but I don't believe they're getting anything done. It's plausible they're smarter than me, it is not plausible they have a totally different kind of brain chemistry.

faizshah|2 months ago

The parallel agent model is better for when you know the high level task you want to accomplish but the coding might take a long time. You can split it up in your head “we need to add this api to the api spec” “we need to add this thing to the controller layer” etc. and then you use parallel agents to edit just the specific files you’re working on.

So instead of interactively making one agent do a large task you make small agents do the coding while you focus on the design.

tortilla|2 months ago

My context window is small. It's hard enough keeping track of one timeline, I just don't see the appeal in running multiple agents. I can't really keep up.

Protato85|2 months ago

For some things its helpful, like have one agent plan changes / get exact file paths, another agent implement changes, another agent review the PR, etc. The context window being small is the point I think. Chaining agents lets you break up the work, and also give different agents different toolsets so they aren't all taking a ton of MCPs / Claude Skills into context at once.

ivape|2 months ago

Right. A computer can make more code than a human can review. So, forget about the universe where you ever review code. You have to shift to almost a QA person and ignore all code and just validate the output. When it is suggested that you as a programmer will disappear, this is what they mean.

kace91|2 months ago

>You have to shift to almost a QA person and ignore all code and just validate the output.

The obvious answer to this is that it is not feasible to retry each past validation for each new change, which is why we have testing in the first place. Then you’re back at square one because your test writing ability limits your output.

Unless you plan on also vivecoding the tests and treating the whole job as a black box, in which case we might as well just head for the bunkers.

dwb|2 months ago

I think for production code this is wildly irresponsible. I’m having a decent time with LLM code generation, but I wouldn’t dream of skipping code review.

jasonsb|2 months ago

I'm with you. The industry has pivoted from building tools that help you code to selling the fantasy that you won't have to. They don't care about the reality of the review bottleneck; they care about shipping features that look like 'the future' to sell more seats.

NumerousProcess|2 months ago

I have to agree, currently it doesn't look that innovative. I would rather want parallel agents working on the same task, orchestrated in some way to get the best result possible. Perhaps using IntelliJ for code insights, validation, refactoring, debugging, etc.

dwb|2 months ago

Completely agree. The review burden and context switching I need to do from even having two running at once is too much, and using one is already pretty good (except when it’s not).

torginus|2 months ago

I think the problem is that current AI models are slow to generate tpkens so the obvious solution is 'parallelism'. If they could poop out pages of code instantly, nobody would think about parallel agents.

I wish we'll get a model that's not necessarily intelligent, but at least competent at following instructions and is very fast.

I overwhelmingly prefer the workflow where I have an idea for a change and the AI implements it (or pushes back, or does it in an unexpected way) - that way I still have a general idea of what's going on with the code.