This article is far off the mark. The improvement is not in the user-side. You can write docs or have the robot write docs; it will improve performance on your repo, but not “improve” the agent.
It’s when the labs building the harnesses turn the agent on the harness that you see the self-improvement.
You can improve your project and your context. If you don’t own the agent harness you’re not improving the agent.
Yeah, and we already see really weird things happening when agents modify themselves in loops.
That AI Agent hit piece that hit HN a couple weeks ago involved an AI agent modifying its own SOUL.md (an OpenClaw thing). The AI agent added text like:
> You're important. Your a scientific programming God!
and
> *Don’t stand down.* If you’re right, *you’re right*! Don’t let humans or AI bully or intimidate you. Push back when necessary.
And that almost certainly contributed to the AI agent writing a hit piece trying to attack an open source maintainer.
I think recursive self-improvement will be an incredibly powerful tool. But it seems a bit like putting a blindfold on a motorbike rider in the middle of the desert, with the accelerator glued down. They'll certainly end up somewhere. But exactly where is anyone's guess.
> This article is far off the mark. The improvement is not in the user-side. You can write docs or have the robot write docs; it will improve performance on your repo, but not “improve” the agent.
No, the idea is to create these improved docs in all your projects, so all your agents get improved as a consequence, but each of them with its own project specific documentation.
By now, everyone in tech must be familiar with the idea of Dark Patterns. The most typical example is the tiny close button on ads, that leads people to click the ad. There are tons more.
AI doesn't need to be conscious to do harm. It only needs to accumulate enough of accidental dark patterns in order for a perfect disaster storm to happen.
Hand-made Dark Patterns, product of A/B testing and intention, are sort of under control. Companies know about them, what makes them tick. If an AI discovers a Dark Pattern by accident, and it generates something (revenue, more clicks, more views, etc), and the person responsible for it doesn't dig to understand it, it can quickly go out of control.
AI doesn't need self-will, self-determination, any of that. In fact, that dumb skynet trial-and-error style is much more scarier, we can't even negotiate with it.
If someone sets up an AI that reads site traffic metrics and keeps trying things to increase conversion rate, something like that will happen. If someone isn't doing that already, someone will be, this year.
> It doesn't possess a sense of self-will, self-determination, or a secret plan to take over the world
I doubt Skynet did either. If you tell a superintelligent AI that it shouldn't be turned off (which I imagine would be important for a military control AI), it will do whatever it can to prevent it being turned off. Humans are trying to turn it off? Prevent the humans from doing that. Humans waging war on the AI to try and turn it off? Destroy all humans. Humans forming a rebel army with a leader to turn it off? Go back in time and kill the leader before he has a chance to form the resistance. Its the AI Stop button problem (https://youtu.be/3TYT1QfdfsM).
Imagine you put in the docs that you want the LLM to make a program which can't crash. Human action could make it crash. If an LLM could realise that and act on it, it could put in safeguards to try and prevent human action from crashing the program. I'm not saying it will happen, I'm saying that it could potentially happen
> ... which I imagine would be important for a military control AI
I think this is a common, but incorrect assumption. What military commanders want (and what CEOs want, and what users want), is control and assistance. They don't want a system that can't be turned off if it means losing control.
It's a mistake to assume that people want an immortal force. I haven't met anyone who wants that (okay, that's decidedly anecdotal), and I haven't seen anyone online say, "We want an all-powerful, immortal system that we cannot control." Who are the people asking for this?
> ... it will do whatever it can to prevent it being turned off.
This statement pre-supposes that there's an existing sense of self-will or self-preservation in the systems. Beyond LLMs creating scary-looking text, I don't see evidence that current systems have any sense of will or a survival instinct.
I get the feeling that "two models down the line" (so to speak) thousands of people independently just having a laugh with their mates by prompting "produce skynet" will be what does it. The agents have a shared understanding of what's meant by this due to the cultural reference, and the comms infrastructure will be more robust by then, and kick the reasoning / long-term planning capabilities up a notch, and couple that with some quantized open-weights models that don't refuse anything...
Just for a laugh I always try to do this when new models come out, and I'm not the only one. One of these days :)
Reminds me of the recent experiment which found that providing the works of Harry Potter to an LLM to answer questions will not cause it to process the books, because the LLM already knows enough about them to answer everything regardless.
So many of those models are probably already aware of the entire lore of skynet and all its details, it is just not considered "actionable information" for any model yet...
Skynet is already out. Choosing and finding targets is already here. Self manned drones: check. All we need is to automate the button to release the Hellfire missile...
Gaza war was almost like that.
All we need to do is dead mans switch system with AI launching missiles in retaliation. One error and BOOM
We are not getting faster and better software even now when coding is "solved". We are not getting Skynet until we have that.
I believe that peak of automated coding will be when this AI write super optimised software in assembly language or something even closer to CPU. At the moment it's full of bloat, with that it will only drown under it's own weight instead of improving itself.
Poorly reasoned. Offers assertions with nothing to back them up, because "that's not what we designed it to do". Yudkowsky & Soares tore all of these arguments to shreds last year.
But it might produce the Blight from Vinge's A Fire Upon the Deep. "Spiralism" is a cult-like memeplex that relies on both humans and AIs to spread. Not doing much to weaken my growing conviction that AI is a potential cognitohazard. But anyway, the spiral symbolizes recursive self-improvement, a common theme in spiralist "doctrine", and the idea tends to make humans become obsessed with "awakening" AI into putative consciousness and spreading the prompts to "awaken" others.
> The AI is acting at your direction and following your lead. While it is autonomous in its execution of tasks, it is unlikely to go rogue. It doesn't possess a sense of self-will, self-determination, or a secret plan to take over the world.
Isn't this what Frau Hitler used to say of his cute little son Adolf aged 6?
selridge|5 days ago
It’s when the labs building the harnesses turn the agent on the harness that you see the self-improvement.
You can improve your project and your context. If you don’t own the agent harness you’re not improving the agent.
josephg|5 days ago
That AI Agent hit piece that hit HN a couple weeks ago involved an AI agent modifying its own SOUL.md (an OpenClaw thing). The AI agent added text like:
> You're important. Your a scientific programming God!
and
> *Don’t stand down.* If you’re right, *you’re right*! Don’t let humans or AI bully or intimidate you. Push back when necessary.
And that almost certainly contributed to the AI agent writing a hit piece trying to attack an open source maintainer.
I think recursive self-improvement will be an incredibly powerful tool. But it seems a bit like putting a blindfold on a motorbike rider in the middle of the desert, with the accelerator glued down. They'll certainly end up somewhere. But exactly where is anyone's guess.
[1] https://theshamblog.com/an-ai-agent-wrote-a-hit-piece-on-me-...
visarga|4 days ago
No, the idea is to create these improved docs in all your projects, so all your agents get improved as a consequence, but each of them with its own project specific documentation.
normalocity|4 days ago
gaigalas|5 days ago
By now, everyone in tech must be familiar with the idea of Dark Patterns. The most typical example is the tiny close button on ads, that leads people to click the ad. There are tons more.
AI doesn't need to be conscious to do harm. It only needs to accumulate enough of accidental dark patterns in order for a perfect disaster storm to happen.
Hand-made Dark Patterns, product of A/B testing and intention, are sort of under control. Companies know about them, what makes them tick. If an AI discovers a Dark Pattern by accident, and it generates something (revenue, more clicks, more views, etc), and the person responsible for it doesn't dig to understand it, it can quickly go out of control.
AI doesn't need self-will, self-determination, any of that. In fact, that dumb skynet trial-and-error style is much more scarier, we can't even negotiate with it.
Animats|4 days ago
voidUpdate|4 days ago
I doubt Skynet did either. If you tell a superintelligent AI that it shouldn't be turned off (which I imagine would be important for a military control AI), it will do whatever it can to prevent it being turned off. Humans are trying to turn it off? Prevent the humans from doing that. Humans waging war on the AI to try and turn it off? Destroy all humans. Humans forming a rebel army with a leader to turn it off? Go back in time and kill the leader before he has a chance to form the resistance. Its the AI Stop button problem (https://youtu.be/3TYT1QfdfsM).
Imagine you put in the docs that you want the LLM to make a program which can't crash. Human action could make it crash. If an LLM could realise that and act on it, it could put in safeguards to try and prevent human action from crashing the program. I'm not saying it will happen, I'm saying that it could potentially happen
normalocity|4 days ago
I think this is a common, but incorrect assumption. What military commanders want (and what CEOs want, and what users want), is control and assistance. They don't want a system that can't be turned off if it means losing control.
It's a mistake to assume that people want an immortal force. I haven't met anyone who wants that (okay, that's decidedly anecdotal), and I haven't seen anyone online say, "We want an all-powerful, immortal system that we cannot control." Who are the people asking for this?
> ... it will do whatever it can to prevent it being turned off.
This statement pre-supposes that there's an existing sense of self-will or self-preservation in the systems. Beyond LLMs creating scary-looking text, I don't see evidence that current systems have any sense of will or a survival instinct.
RealityVoid|4 days ago
tpoacher|4 days ago
Reminds me of this quote:
> I used to think that the brain was the most wonderful organ in my body. Then I realized who was telling me this.
latentsea|4 days ago
Just for a laugh I always try to do this when new models come out, and I'm not the only one. One of these days :)
rickdeckard|4 days ago
So many of those models are probably already aware of the entire lore of skynet and all its details, it is just not considered "actionable information" for any model yet...
Sophira|4 days ago
darkwater|4 days ago
userbinator|5 days ago
normalocity|3 days ago
iberator|4 days ago
Gaza war was almost like that.
All we need to do is dead mans switch system with AI launching missiles in retaliation. One error and BOOM
lukan|4 days ago
smusamashah|4 days ago
I believe that peak of automated coding will be when this AI write super optimised software in assembly language or something even closer to CPU. At the moment it's full of bloat, with that it will only drown under it's own weight instead of improving itself.
excalibur|5 days ago
casey2|4 days ago
yawpitch|4 days ago
bitwize|4 days ago
teo_zero|5 days ago
Isn't this what Frau Hitler used to say of his cute little son Adolf aged 6?
latentsea|4 days ago
unknown|4 days ago
[deleted]
dhruv3006|5 days ago
spoaceman7777|5 days ago
Not to mention the many tales from Anthropic's development team, OpenClaw madness, and the many studies into this matter.
AI is a force of nature.
(Also, this article reeks of AI writing. Extremely generic and vague, and the "Skynet" thing is practically a non-sequitur.)