(no title)
mafriese | 1 month ago
- I built a system where context (plus the current state and goal) is properly structured, and coding agents only get the information they actually need and nothing more. You wouldn't let your product manager develop your backend, so the backend dev only does the things it is supposed to do and nothing more. If an agent crashes (or quota limits are reached), the next agent can continue exactly where the previous one left off.
- Agents are "fighting against" each other to some extent? The Architect tries to design while the CAB tries to reject.
- Granular control. I wouldn't call "the manager" _a deterministic state machine that is calling probabilistic functions_, but that's to some extent what it is? The manager has clearly defined tasks (like "if file is in 01_design -> Call Architect"); a rough sketch of that dispatch loop is below.
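A minimal sketch of what such a manager loop could look like, assuming a file-based pipeline. The agent names, folder names (01_design, etc.), and the `call_agent` wrapper are illustrative placeholders, not the author's actual implementation:

```python
from pathlib import Path

# Deterministic routing table: pipeline stage folder -> agent responsible for it.
STAGE_TO_AGENT = {
    "01_design": "architect",       # Architect drafts the design
    "02_review": "cab",             # CAB tries to reject / approve the design
    "03_implement": "backend_dev",  # backend dev only sees what it needs
}

def call_agent(agent: str, task_file: Path) -> str:
    """Probabilistic part: hand the file (and only that file) to an agent.
    Stubbed here -- wire this to your actual agent runner / LLM call."""
    return f"[{agent}] processed {task_file.name}"

def manager_tick(workdir: Path) -> None:
    """Deterministic part: scan the pipeline folders and dispatch work.
    Because state lives on disk, a crashed or rate-limited run can be
    resumed by simply running the manager again."""
    for stage, agent in STAGE_TO_AGENT.items():
        for task_file in sorted((workdir / stage).glob("*.md")):
            result = call_agent(agent, task_file)
            # Persist the result so the next stage (or a restarted run)
            # can pick up exactly where this one left off.
            (workdir / stage / f"{task_file.stem}.result.md").write_text(result)
```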
Here’s one example of an agent log after a feature has been implemented from one of the older codebases: https://pastebin.com/7ySJL5Rg
mafriese|1 month ago
The models can call each other if you reference them using @username.
This is the .md file for the manager : https://pastebin.com/vcf5sVfz
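For illustration, a hedged sketch of how @username hand-offs could be routed: the manager scans an agent's output for mentions and forwards the message to the referenced agent. The roster names here are hypothetical examples, not taken from the linked .md file:

```python
import re

MENTION = re.compile(r"@(\w+)")

# Example roster of agents that can be addressed with @username.
AGENTS = {"architect", "cab", "backend_dev", "manager"}

def route(message: str) -> list[str]:
    """Return the agents referenced with @username in a message."""
    return [name for name in MENTION.findall(message) if name in AGENTS]

# Example: the Architect asks the CAB for a review.
print(route("@cab please review the draft in 01_design/feature.md"))
# -> ['cab']
```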
I hope that helped!
overfeed|1 month ago
Extrapolating from this concept led me to a hot take I haven't had time to blog about: agentic AI will revive the popularity of microservices, mostly due to the deleterious effect of context size on agent performance.
tripledry|1 month ago
Real example that happened to me: the agent forgets to rename an expected parameter in the API spec for service 1. Now, when working on service 2, there is no way for the agent to find this mistake other than giving it access to service 1. And now you are back to "... effect of context size on agent performance ...". For context, we might have ~100 services.
One could argue these issues diminish over time as instruction files are updated, etc., but that also assumes the models follow instructions and don't hallucinate.
That being said, I do use agents quite successfully now - but I have to guide them a bit more than some care to admit.
imiric|1 month ago
These tools and services are already expected to do the best job for specific prompts. The work you're doing pretty much proves that they don't, while also requiring you to throw much more money at them.
How much longer are users going to have to manually manage LLM context to get the most out of these tools? Why is this still a problem ~5 years into this tech?