I think that letting an LLM run unsupervised on a task is a good way to waste time and tokens. You need to catch it before it strays too far off-path. I stopped using subagents in Claude because I couldn't see what they were doing and intervene.
Indirectly asking an LLM to prompt another LLM to work on a long, multi-step task doesn't seem like a good idea to me.
I think community efforts should go toward making LLMs more deterministic with the help of good old-fashioned software tooling instead of role-playing and writing prayers to the LLM god.
When a task is bigger than I trust the agent to handle on its own, or than I can review in one go, I ask it to create a plan with steps, then create an md file for each step. I review the steps and ask the agent to implement the first one. I review that one, fix it, then ask it to update the remaining steps and implement the next one. And so on, until finished.
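That step-file loop needs no tooling at all; a rough shell sketch, where the `plan/` layout and the commented-out agent invocation are purely illustrative (in real use the agent writes the step files itself):

```shell
# Toy plan directory; in real use you'd ask the agent to generate these.
mkdir -p plan
printf '# Step 1: schema changes\n' > plan/01-schema.md
printf '# Step 2: API endpoints\n'  > plan/02-api.md

for step in plan/*.md; do
  echo "Next step: $step"
  # Review the step file yourself, then hand just that file to the agent, e.g.:
  #   claude -p "Implement the step in $step; update later plan/*.md files if this changes them."
  break   # one step per run; repeat after reviewing the result
done
```

The point is that only one small step is in flight at a time, so each review stays tractable.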
Codex is like an external consultant. You give it specs and it quietly putters away and only stops when the feature is done.
Claude is built more like a pair programmer: it displays changes live, "talks" about what it's doing and what's working, etc.
It's really, REALLY hard to abort Codex mid-run to correct it. With Claude it's a lot easier when you see it doing something stupid or getting off the rails. Just hit ESC and tell it where it went wrong (like: use `task build`, don't build manually; or: use markdownlint, don't spend 5 minutes editing the markdown line by line).
I also use AI to do discrete, well-defined tasks so I can keep an eye on things before they go astray.
But I thought there are lots of agentic systems that loop back and ask for approval every few steps, or after every agent does its piece. Is that not the case?
FWIW, finished an eval of claude code against various tasks that amplifier works well on:
The agent demonstrated strong architectural and organizational capabilities but suffered from critical implementation gaps across all three analyzed tasks. The primary pattern observed is a "scaffold without substance" failure mode, where the agent produces well-structured, well-documented code frameworks that either don't work at all or produce placeholder outputs instead of real functionality. Of the three tasks analyzed, two failed due to placeholder/mock implementations (Cross-Repo Improvement Tool, Email Drafting Tool), and one failed due to insufficient verification of factual claims (GDPVAL Extraction). The common thread is a lack of validation and testing before delivery, combined with a tendency to prioritize architecture over functional implementation.
I've tried it. It works better than raw Claude. We're working on benchmarks now. But... it's a moving target as amplifier (an experimental project) is evolving rapidly.
I’ve seen people discuss these types of approaches on X. To me it looks like the concepts here are already tried and popular - they’re just packaging it up so that people who aren’t as deep in that world can get the same benefits. But I’m not an expert.
Claude Code is great, this is just a set of tweaks, not really "research". For anyone into vibe coding, there are dozens of interesting video tutorials on customizing Claude Code and running practical jobs, not limited to coding.
I think most of us are irritated by the constant A/B testing and underwhelming releases. Let's just have the bubble pop so we can solve real problems instead of this.
I've actually written my own homebrew framework like this, which is a) cli-coder agnostic and b) leans heavily on git worktrees [0].
The secret weapon of this approach is asking for 2-4 solutions to your prompt, running in parallel. This helps avoid the most time-consuming aspect of AI coding: reviewing a large commit, only to find that the approach the AI took is hopeless or requires major revision.
By generating multiple solutions, you avoid investing fully in the first one; you can use clever ways to select among the 2-4 candidates and usually apply a small tweak at the end. Anyone else doing something like this?
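A minimal version of that fan-out needs no framework, just git worktrees; the branch names and the commented-out agent command below are placeholders, and the throwaway repo exists only so the sketch is self-contained:

```shell
# Throwaway repo so this runs anywhere; in real use you're already inside one.
git init -q demo && cd demo
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m "init"

PROMPT="Refactor the importer to stream rows instead of loading the whole file"
for i in 1 2 3; do
  # One worktree + branch per candidate solution, all sharing one object store.
  git worktree add -q "../candidate-$i" -b "candidate-$i"
  # ( cd "../candidate-$i" && claude -p "$PROMPT" ) &   # or codex, aider, ...
done
wait
git worktree list   # one line per candidate, ready to diff and compare
```

From there, reviewing means comparing the candidate branches against the base and running `git worktree remove` on the losers.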
There is a related idea called "alloying" where the 2-4 candidate solutions are pursued in parallel with different models, yielding better results vs any single model. Very interesting ideas.
> "I have more ideas than time to try them out" — The problem we're solving
I see a possible paradox here.
For exploration, my goal is _to learn_. Trying out multiple things is not wasting time, it's an intensive learning experience. It's not about finding what works fast, but understanding why the thing that works best works best. I want to go through it. Maybe that's just me though, and most people just want to get it done quickly.
Yeah, this seems like the opposite of invention. You can throw paint at a canvas but it won’t make you Pollock. And will you feel a sense of accomplishment?
Hey all! I'm one of a handful of developers on this project. Great to see it's getting some interest!
For context, we are right in the middle of building this thing... multiple rebuilds daily since we are using it to build itself. The value isn't in the code itself, yet, but in the approaches (UNIX philosophy, meta-cognitive recipes, etc.)
We are really excited about how productive these approaches are even in this early stage. We are able to have amplifier go off and make significant progress unattended, sometimes for hours at a time. This, of course, raises a lot of questions about how software will be built in the near future... questions which we are leaning into.
Most of our team's projects, unless they have some unresolved IP or are using internal-only systems, are built in the open. This is a research project at this stage. We recognize this approach is too expensive and too hacky for most independent developers (we're spending thousands of dollars daily on tokens). But once the patterns are identified, we expect we'll all find ways to make them more accessible.
The whole point of this is to experiment and learn fast.
I do a lot of work with claude code and codex cli but frankly as soon as I see all the LLM-tells in the readme, and then all the commit messages written by claude, I immediately don't want to read the readme or try the project until someone else recommends it to me.
This is gaining stars and forks but I don't know if that's just because it's under the github.com/microsoft, and I don't really know how much that means.
Starting in Claude bypass mode does not give me confidence:
> WARNING: Claude Code running in Bypass Permissions mode
>
> In Bypass Permissions mode, Claude Code will not ask for your approval before running potentially dangerous commands. This mode should only be used in a sandboxed container/VM that has restricted internet access and can easily be restored if damaged.
This project is a research demonstrator. It is in early development and may change significantly. Using permissive AI tools in your repository requires careful attention to security considerations and careful human supervision, and even then things can still go wrong. Use it with caution, and at your own risk.
A lot of the ideas in this aren't bad, but in general it's hacky. Context export? Just use industry-standard observability! This is so bad it makes me cringe. Parallel worktrees? These are prone to putting your repo in bad states when you run a lot of agents, and you have to deal with security; just put your agent in a container and have it clone the repo. Everything this project does, it's doing the wrong way.
I have a repo that shows you how to do this stuff the correct way that's very easy to adapt, along with a detailed explanation, just do yourself a favor, skip the amateur hour re-implementations and instrument/silo your agents properly: https://sibylline.dev/articles/2025-10-04-hacking-claude-cod...
I'll always be skeptical about using AI to amplify AI. I think humans are needed to amplify AI, since humans have so far proven significantly more creative and proactive at pushing the frontier than AI. I know, it's maybe a radical concept to digest.
> I'll always be skeptical about using AI to amplify AI.
This project was in part written by Claude, so for better or worse I think we're at least 3 levels deep here (AI-written code which directs an AI to direct other AIs to write code).
I think I'm more optimistic about this than brute-forcing model training with ever larger datasets, myself. Here's why.
Most models I've benchmarked, even the expensive proprietary models, tend to lose coherence when the context grows beyond a certain size. The thing is, they typically do not need the entire context to perform whatever step of the process is currently going on.
And there appears to be a lot of experimentation going on along the lines of having subagents in charge of curating the long-term view of the context to feed more focused work items to other subagents, and I find that genuinely intriguing.
My hope is that this approach will eventually become refined enough that we'll get dependable capability out of cheap open weight models. That might come in darn handy, depending on the blast radius of the bubble burst.
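The curator/worker split can be caricatured in a few lines of shell: one pass distills the long history into a brief, and the worker only ever sees the brief. Here `grep` and the commented-out command are stand-ins for actual agent calls, and the file contents are invented:

```shell
# A long transcript the worker does not need in full.
printf 'step 1: schema done\nstep 2: API done\nstep 3: port the parser to streaming\n' > full-context.txt

# "Curator": condense the history into only what the next work item needs.
grep 'step 3' full-context.txt > brief.txt      # stand-in for a summarizer subagent

# "Worker": receives the small focused brief, not the whole transcript.
cat brief.txt
# claude -p "Do exactly this: $(cat brief.txt)"  # stand-in for the worker subagent
```

The worker's context stays small no matter how long the overall session runs, which is exactly the coherence problem the parent describes.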
Based on clear, operational definitions, AI is definitely more creative than humans. E.g., can easily produce higher scores on a Torrance test of divergent thinking. Humans may still be more innovative (defined as creativity adopted into larger systems), though that may be changing.
> Never lose context again. Amplifier automatically exports your entire conversation before compaction, preserving all the details that would otherwise be lost. When Claude Code compacts your conversation to stay within token limits, you can instantly restore the full history.
If this is restoring the entire context (and looking at the source code, it seems like it is just reloading the entire context) how does this not result in an infinite compaction loop?
I think the idea would be that you could re-compact with a different focus. When you compact, you can give Claude instructions on what is important to retain and what can be discarded. If you later discover that actually you wanted something you discarded during a previous compaction, this could allow you to recover it.
Also, it can be useful to compact before it is strictly necessary to compact (before you are at max context length). So there could be a case where you decide you need to "undo" one of these types of early compactions for some reason.
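Mechanically, an "undoable" compaction is just a snapshot of the transcript taken before compacting. A sketch, where the session-log path and format are purely illustrative (real tools keep theirs elsewhere):

```shell
# Fake session log so this runs anywhere; treat the path as an assumption.
mkdir -p state/snapshots
printf '{"role":"user","text":"long history..."}\n' > state/session.jsonl

# 1. Snapshot the full transcript before compacting.
cp state/session.jsonl "state/snapshots/pre-compact-$(date +%s).jsonl"
# 2. ...compact with a focus prompt, which rewrites state/session.jsonl...
# 3. Undo = restore the snapshot over the compacted transcript:
#    cp state/snapshots/pre-compact-*.jsonl state/session.jsonl
ls state/snapshots
```

Restoring then re-compacting with different instructions is how you'd recover something discarded by an earlier compaction.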
The project looks interesting, but there are no demos. As much as I want to try it because of all the cool concepts mentioned, I'm not sure I want to invest my time if I don't see any demos.
Hi all, I'm the primary author/lead on the "research exploration" that is Amplifier at Microsoft. It's still SUPER early, and we're running fast, applying learnings from the past couple of years in new ways to explore some new value we're finding early evidence of. I apologize that the repo is in a very rough condition; we're moving quickly, and most of what's in there now has been very helpful but will very soon be completely replaced by our next major iteration as we continue to run ahead. I did want to take a pause today and put together a blog post to capture a little more context for those of you here who are following along:
For those who find it useful at this very early stage, whether by using it or learning from it: happy to be on the journey together. For those who don't like it or don't understand why or what we're doing: I apologize again. It's definitely not for everyone at this stage, if ever, so no offense taken.
>"Amplifier is a complete development environment that takes AI coding assistants and supercharges them with discovered patterns, specialized expertise, and powerful automation — turning a helpful assistant into a force multiplier that can deliver complex solutions with minimal hand-holding."
Again this "supercharging" nonsense? Maybe in Satya's confabulated AI-powered universe, but not in the real world, I'm afraid...
Yeah, I'm not even that opposed to using AI for documentation if it helps, but everything from Microsoft recently has been full-on slop. It's almost like they're trying to make sure you can't miss it's AI generated.
Sorry, is this Hacker News? This kind of project is exactly what I'd expect hackers to create. Not using AI in boring limited practical ways where it's known to somehow work, but supercharging AI with AI with AI... etc, and seeing what happens!
Well, stop asking silly questions. How will the execs get their bonuses if it turns out we fucked up web search and invested the equivalent of a moonbase in... well, I hate to use the phrase, but a statistical parrot?
That's essentially what a CI environment does. "Multiple tabs" and "swarms". This part should feel familiar to any developer. Having multiple things running in the background to help you is not a new concept and we've been doing it for decades.
Whether these new helpers that explore ideas on their own are helpful or not, and for which cases, is another discussion.
I see you're being downvoted, Reddit style. But you're on the mark about the hateful tone of the comments. If you don't like Amplifier, don't use it. No need to spew hate.
There are many many people who want better AI coding tools, myself included. It might or might not fail, but there is a clear and strong opportunity here, that it would be foolish of any large tech company to not pursue.
I would say it’s more the result of anti competitive bundling of cloud things into existing enterprise contracts rather than the wave. Microsoft is far worse than it ever was in the 90s but there’s no semblance of antitrust action in America.
ripped_britches|4 months ago
I’m super not interested in hearing what people have to say from a distance without actually using it.
rs186|4 months ago
People are correct to question it.
If anything, Microsoft needs to show something meaningful to make people believe it's worth trying it out.
[0]: https://github.com/sutt/agro
thethimble|4 months ago
https://xbow.com/blog/alloy-agents
npalli|4 months ago
claude Claude
Interesting given Microsoft’s history with OpenAI
wiether|4 months ago
https://techcrunch.com/2025/09/09/microsoft-to-lessen-relian...
mark212|4 months ago
This stood out to me too, seems like a months-long project with heavy use of Claude
estimator7292|4 months ago
That's cute
paradox921|4 months ago
https://paradox921.medium.com/amplifier-notes-from-an-experi...
PantaloonFlames|4 months ago
How is this different from Google's Jules? Both are sort of experimental, exploratory things.
rectang|4 months ago
"Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills."
https://news.ycombinator.com/newsguidelines.html
shermantanktop|4 months ago
Gatekeepers who claim otherwise have something to sell.
rs186|4 months ago
Define "full potential".
Sounds like you are just making things up to sell your product.
bgwalter|4 months ago
The Austrian army already switched to LibreOffice for security reasons; we don't need another spyware and code-stealing tool.