rfw300|2 months ago
I'd like others' input on this: increasingly, I see Cursor, Jetbrains, etc. moving towards a model of having you manage many agents working on different tasks simultaneously. But in real, production codebases, I've found that even a single agent is faster at generating code than I am at evaluating its fitness and providing design guidance. Adding more agents working on different things would not speed anything up. But perhaps I am just much slower or a poorer multi-tasker than most. Do others find these features more useful?
SatvikBeri|2 months ago
I would probably never run a second agent unless I expected the task to take at least two hours; any more than that and the cost of multitasking for my brain is greater than any benefit, even when there are things I could theoretically run in parallel, like several hypotheses for fixing a bug.
IIRC Thorsten Ball (Writing an Interpreter in Go, lead engineer on Amp) also said something similar in a podcast – he's a single-tasker, despite some of his coworkers preferring fleets of agents.
cube2222|2 months ago
I've recently described how I vibe-coded a tool to run a single background agent in a Docker container in a jj workspace[0] while I work with my foreground agent, but... my reviewing throughput is usually saturated by a single agent already, and I barely ever run the second one.
New tools keep coming up for running fleets of agents, and I see no reason to switch from my single-threaded Claude Code.
What I would like to see instead are efforts to make the reviewing step faster. The Amp folks had an interesting preview article on this recently[1]. This is the direction I want tools to be exploring if they want to win me over - help me solve the review bottleneck.
[0]: https://news.ycombinator.com/item?id=45970668
[1]: https://ampcode.com/news/review
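As a rough illustration of the setup described above (a single background agent in its own jj workspace, isolated in a Docker container), here is a minimal Python sketch. The `jj workspace add` and `docker run` commands are real, but the container image, paths, task prompt, and the agent CLI invocation are assumptions made for the example, not details from the comment.

```python
#!/usr/bin/env python3
"""Minimal sketch: one background agent in its own jj workspace inside Docker,
while the foreground agent keeps the main working copy. The image name, paths,
and agent CLI invocation are assumptions, not part of the original tool."""
import subprocess
from pathlib import Path

REPO = Path.cwd()                                    # main repo, used by the foreground agent
WORKSPACE = REPO.parent / "bg-agent"                 # separate working copy for the background agent
PROMPT = "Fix the flaky test in tests/test_sync.py"  # hypothetical task

# Give the background agent its own jj workspace so it never touches the
# files the foreground agent is editing.
subprocess.run(["jj", "workspace", "add", str(WORKSPACE)], cwd=REPO, check=True)

# Run the agent inside a container that only sees the background workspace.
subprocess.run([
    "docker", "run", "--rm",
    "-v", f"{WORKSPACE}:/work",
    "-w", "/work",
    "my-agent-image",              # hypothetical image with the agent CLI installed
    "claude", "-p", PROMPT,        # assumed non-interactive agent invocation
], check=True)
```

When the container exits, the background agent's edits are sitting in that workspace, so reviewing them is just a matter of diffing it against the main working copy.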
kace91|2 months ago
My CTO is currently working on the ability to run several dockerised versions of the codebase in parallel for this kind of flow.
I’m here wondering how anyone could work on several tasks at once at a speed where they can read, review and iterate on the output of one LLM in the time it takes another LLM to spit out an answer for a different task.
Like, are we just asking things as fast as possible and hoping for a good solution, unchecked? Are others able to context switch on every prompt without a reduction in quality? Why are people tackling the problem of prompting at scale as if the bottleneck were token output rather than human reading and reasoning?
If this was a random vibecoding influencer I’d get it, but I see professionals trying this workflow and it makes me wonder what I’m missing.
c-linkage|2 months ago
Maybe code husbandry?
tcdent|2 months ago
Then I take it a step further and create core libraries that are structured like standalone packages and are architected like third-party libraries with their own documentation and public API, which gives clear boundaries of responsibility.
Then the only somewhat manual step is to copy/paste the agent's notes on the changes it made so that dependent systems can integrate them.
I find this far more sustainable than spawning multiple agents on a single codebase and then having to reconcile merge conflicts between them as each task is completed; it's not unlike traditional software development, where a branch that needs review contains some general functionality that would benefit another branch, and you're left either cherry-picking a commit, sharing it between PRs, or lumping your PRs together.
Depending on the project I might have 6-10 IDE sessions. Each agent then has its own history, and anything to do with running test harnesses or CLI interactions gets managed on that instance as well.
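To make the "core libraries structured like third-party packages" idea above concrete, here is a minimal Python sketch; the module name, functions, and docstring contract are all hypothetical, just to show the shape of the boundary.

```python
# billing.py -- hypothetical sketch of an in-repo "core library" that is
# treated like a third-party package: documented public API, hidden internals.
"""Billing core library (sketch).

Public API:
    create_invoice(customer_id, line_items) -> Invoice
Anything prefixed with an underscore is internal and may change without notice.
"""
from dataclasses import dataclass
from typing import Sequence

__all__ = ["Invoice", "create_invoice"]  # the contract dependent code integrates against


@dataclass(frozen=True)
class Invoice:
    customer_id: str
    total_cents: int


def _sum_line_items(line_items: Sequence[int]) -> int:
    # Internal helper: not part of the public API.
    return sum(line_items)


def create_invoice(customer_id: str, line_items: Sequence[int]) -> Invoice:
    """Create an invoice; the only entry point dependents should call."""
    return Invoice(customer_id=customer_id, total_cents=_sum_line_items(line_items))
```

The payoff for the workflow described above is that the agent's notes only need to describe this public surface, so dependent systems can integrate the changes without reading the internals.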
nurettin|2 months ago
I prefer to use a single agent without pauses and catch errors in real time.
People running multiple agents must be using pauses, switching between agents and checking every result.
hmokiguess|2 months ago
My take on this is that as these things get better, we will eventually be able to infer and quantify signals that give us high-confidence scores, letting us conduct a better review with a shorter decision path. This is akin to how compilers, parsers, and linters can give you some level of safety without strong guarantees but are often "good enough" to pass a smell test.
coffeefirst|2 months ago
There's pretty much no way anyone context switching that fast is paying a lick of attention. They may be having fun, like scrolling TikTok or playing a video game, just piling on stimuli, but I don't believe they're getting anything done. It's plausible they're smarter than me; it is not plausible they have a totally different kind of brain chemistry.
faizshah|2 months ago
So instead of interactively making one agent do a large task, you make small agents do the coding while you focus on the design.
kace91|2 months ago
The obvious answer to this is that it is not feasible to redo every past validation for each new change, which is why we have testing in the first place. Then you’re back at square one, because your test-writing ability limits your output.
Unless you plan on also vibe-coding the tests and treating the whole job as a black box, in which case we might as well just head for the bunkers.
torginus|2 months ago
I wish we'd get a model that's not necessarily intelligent, but is at least competent at following instructions and very fast.
I overwhelmingly prefer the workflow where I have an idea for a change and the AI implements it (or pushes back, or does it in an unexpected way) - that way I still have a general idea of what's going on with the code.