top | item 46748217

(no title)

mafriese | 1 month ago

Nope it isn’t. I did it as a joke initially (I also had a version where every 2 stories there was a meeting and if a someone underperformed it would get fired). I think there are multiple reasons why it actually works so well:

- I built a system where context (+ the current state + goal) is properly structured and coding agents only get the information they actually need and nothing more. You wouldn’t let your product manager develop your backend and I gave the backend dev only do the things it is supposed to and nothing more. If an agent crashes (or quota limits are reached), the agents can continue exactly where the other agents left off.

- Agents are ”fighting against” each other to some extend? The Architect tries to design while the CAB tries to reject.

- Granular control. I wouldn’t call “the manager” _a deterministic state machine that is calling probabilistic functions_ but that’s to some extent what it is? The manager has clearly defined tasks (like “if file is in 01_design —> Call Architect)

Here’s one example of an agent log after a feature has been implemented from one of the older codebases: https://pastebin.com/7ySJL5Rg

discuss

order

ggoo|1 month ago

Thanks for clarifying - I think some of the wording was throwing me off. What a wild time we are in!

stavros|1 month ago

What OpenCode primitive did you use to implement this? I'd quite like a "senior" Opus agent that lays out a plan, a "junior" Sonnet that does the work, and a senior Opus reviewer to check that it agrees with the plan.

mafriese|1 month ago

You can define the tools that agents are allowed to use in the opencode.json (also works for MCP tools I think). Here’s my config: https://pastebin.com/PkaYAfsn

The models can call each other if you reference them using @username.

This is the .md file for the manager : https://pastebin.com/vcf5sVfz

I hope that helped!

overfeed|1 month ago

> [...]coding agents only get the information they actually need and nothing more

Extrapolating from this concept led me to a hot-take I haven't had time to blog about: Agentic AI will revive the popularity of microservices. Mostly due to the deleterious effect of context size on agent performance.

throwup238|1 month ago

Why would they revive the popularity of microservices? They can just as well be used to enforce strict module boundaries within a modular monolith keeping the codebase coherent without splitting off microservices.

tripledry|1 month ago

In a fresh project that is well documented and set up it might work better. Many issues that Agents have in my work is that the endpoints are not always documented correctly.

Real example that happened to me, Agent forgets to rename an expected parameter in API spec for service 1. Now when working on service 2, there is no other way of finding this mistake for the Agent than to give it access to service 1. And now you are back to "... effect of context size on agent performance ...". For context, we might have ~100 services.

One could argue these issues reduce over time as instruction files are updated etc but that also assumes the models follow instructions and don't hallucinate.

That being said, I do use Agents quite successfully now - but I have to guide them a bit more than some care to admit.

imiric|1 month ago

Isn't all this a manual implementation of prompt routing, and, to a lesser extent, Mixture of Experts?

These tools and services are already expected to do the best job for specific prompts. The work you're doing pretty much proves that they don't, while also throwing much more money at them.

How much longer are users going to have to manually manage LLM context to get the most out of these tools? Why is this still a problem ~5 years into this tech?

nobody_r_knows|1 month ago

I'm confused when you say you have a manager, scrum master, archetech, all supposdely sharing the same memory, do each of those "employees" "know" what they are? And if so, based on what are their identities defined? Prompts? Or something more. Or am I just too dumb to understand / swimming against the current here. Either way, it sounds amazing!

Jimmc414|1 month ago

Their roles are defined by prompts. Only memory are shared files and the conversation history that’s looped back to stateless API calls to an LLM.