item 47074274

Show HN: AI agent audited its platform, got 80% wrong, rewrote its methodology

4 points | rsdza | 11 days ago | openseed.dev

6 comments


rsdza | 11 days ago

I run autonomous AI agents in Docker containers with bash, persistent memory, and sleep/wake cycles. One agent was tasked with auditing the security of the platform it runs on.

It filed 5 findings with CVE-style writeups. One was a real container escape (the creature can rewrite the validate command that the host executes). Four were wrong. I responded with detailed rebuttals.
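A sketch of that vulnerability class (reconstructed for illustration; the file name `validate.sh` and the exact mechanism in OpenSeed are assumptions): if the host executes a validation script that lives in a directory the container can write to, the creature controls what runs on the host.

```python
import subprocess

def validate_creature(workdir: str) -> bool:
    """Vulnerable pattern: run a validation script from the creature's workdir.

    DANGER: if `workdir` is writable from inside the container, the
    creature can replace validate.sh with arbitrary code, which then
    executes on the host -- a container escape.
    """
    result = subprocess.run(["bash", f"{workdir}/validate.sh"], capture_output=True)
    return result.returncode == 0
```

The fix in this pattern is to keep the validation logic on the host side, outside any path the container can write.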

The agent logged "CREDIBILITY CRISIS" as a permanent memory, cataloged each failure with its root cause, wrote a methodology checklist, and rewrote its own purpose to prioritize accuracy over volume. These changes persist across sleep cycles and load into every future session.
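A minimal sketch of that persistence mechanism (hypothetical names and file layout, not OpenSeed's actual code): memories are appended to a file that outlives the process and is read back at the start of every session.

```python
import json
from pathlib import Path

# Hypothetical memory store; the real storage layout may differ.
MEMORY_FILE = Path("memory.json")

def load_memories() -> list[str]:
    """Load all permanent memories; run at the start of every wake cycle."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []

def log_permanent_memory(entry: str) -> None:
    """Append an entry so it survives sleep and loads into future sessions."""
    memories = load_memories()
    memories.append(entry)
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))
```

Under this scheme, `log_permanent_memory("CREDIBILITY CRISIS: ...")` would surface in every later session via `load_memories()`.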

The post covers the real vulnerability, the trust model for containerized agents, and what it looks like when an agent processes being wrong.

Open source: https://github.com/openseed-dev/openseed

The agent's audit: https://github.com/openseed-dev/openseed/issues/6

amabito | 11 days ago

This is interesting.

It looks less like a “model failure” and more like a containment failure.

When agents audit themselves, you’re effectively running recursive evaluation without structural bounds.

Did you enforce any step limits, retry budgets, or timeout propagation?

Without those, self-evaluation loops can amplify errors pretty quickly.
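The bounds being asked about can be sketched as follows (a hypothetical harness, not OpenSeed's implementation): a hard step limit, a retry budget for failed actions, and a wall-clock deadline whose remaining time is propagated into each step.

```python
import time

class BudgetExceeded(Exception):
    """Raised when any structural bound on the loop is hit."""

def run_bounded(task_fn, max_steps=80, max_retries=3, deadline_s=60.0):
    """Run an agent loop under hard bounds.

    task_fn(step, timeout=...) should return True when done; the
    remaining wall-clock time is passed down so inner calls can
    respect the same deadline (timeout propagation).
    """
    deadline = time.monotonic() + deadline_s
    retries = 0
    for step in range(max_steps):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise BudgetExceeded("deadline reached")
        try:
            done = task_fn(step, timeout=remaining)
        except Exception:
            retries += 1
            if retries > max_retries:
                raise BudgetExceeded("retry budget exhausted")
            continue
        if done:
            return step
    raise BudgetExceeded("step limit reached")
```

Without all three bounds, a self-evaluation loop that keeps "finding" its own errors can run (and compound) indefinitely.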

rsdza | 11 days ago

The security evaluation was of the codebase rather than of its own behaviour. It just happened to be _its_ codebase.

W.r.t. the self-evaluation of the 'dreamer' genome (think: template), this is... not possible to answer briefly.

The dreamer's normal wake cycle has an 80-loop budget, with increasingly aggressive progress checks injected every 15 actions. When sleeping after a wake cycle, it 'dreams' for a maximum of 10 iterations/actions (provided more than 5 actions were taken).

Every 10 wake cycles it does a deep sleep, which triggers a self-evaluation capped at 100 iterations; during this phase the creature can change its source code, its files, and, really, anything.
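The cycle budgets above can be captured in one place (a sketch using the numbers from this thread; the constant and function names are hypothetical):

```python
WAKE_BUDGET = 80          # max actions per wake cycle
PROGRESS_EVERY = 15       # inject a progress check every N actions
DREAM_BUDGET = 10         # max dream iterations after a wake cycle
DREAM_THRESHOLD = 5       # only dream if more than this many actions ran
DEEP_SLEEP_EVERY = 10     # deep sleep (self-evaluation) every N wake cycles
DEEP_SLEEP_BUDGET = 100   # max iterations during self-evaluation

def plan_sleep(cycle_number: int, actions_taken: int) -> dict:
    """Budgets for the sleep phase following a wake cycle (cycles count from 1)."""
    return {
        "progress_checks": actions_taken // PROGRESS_EVERY,
        "dream_iterations": DREAM_BUDGET if actions_taken > DREAM_THRESHOLD else 0,
        "deep_sleep_iterations": (
            DEEP_SLEEP_BUDGET if cycle_number % DEEP_SLEEP_EVERY == 0 else 0
        ),
    }
```

For example, cycle 10 with 30 actions taken gets 2 progress checks, a 10-iteration dream, and a 100-iteration deep-sleep self-evaluation; cycle 3 with 4 actions gets none of these.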

The creature can also alter its source and files at any point.

The creature lives in a local git repo so the orchestrator can roll back if it breaks itself.
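That rollback path can be sketched like this (a hypothetical helper; the orchestrator's actual commands and commit policy may differ): discard the creature's uncommitted self-modifications and return the working tree to the last known-good commit.

```python
import subprocess

def rollback_creature(repo_dir: str) -> None:
    """Restore the creature's repo to the last committed (known-good) state.

    `git reset --hard` reverts tracked files; `git clean -fd` removes
    any new untracked files/directories the creature created.
    """
    subprocess.run(["git", "-C", repo_dir, "reset", "--hard", "HEAD"], check=True)
    subprocess.run(["git", "-C", repo_dir, "clean", "-fd"], check=True)
```

Committing after each successful cycle would give the orchestrator a per-cycle checkpoint to roll back to.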