top | item 47199908

(no title)

shich | 1 day ago

The proxy approach for secret injection is the right mental model, but it only works if the proxy itself is hardened against prompt injection. An agent that can't access secrets directly can still be manipulated into crafting requests that leak data through side channels — URL params, timing, error messages.

The deeper issue: most of these guardrails assume the threat is accidental (agent goes off the rails) rather than adversarial (something in the agent's context is actively trying to manipulate it). Time-boxed domain whitelists help with the latter but the audit loop at session end is still reactive.

The /revert snapshot idea is underrated though. Reversibility should be the first constraint, not an afterthought.

discuss

buremba|1 day ago

> but it only works if the proxy itself is hardened against prompt injection.

Yes, I'm experimenting using a small model like Haiku to double check if the request looks good. It adds quite a bit of latency but it might be the right approach.

Honestly; it's still pretty much like early days of self driving cars. You can see the car can go without you supervising it but still you need to keep an eye on where it's going.