top | item 46847457

(no title)

Majromax | 28 days ago

What is your threat model?

If you're worried about a hostile agent, then indeed sandboxing is not enough. In the worst case, an actively malicious agent could even try to escape the sandbox with whatever limited subset of commands it's given.

If you're worried about prompt injection, then restricting access to unfiltered content is enough. That would definitely involve not processing third-party input and removing internet search tools, but the restriction probably doesn't have to be mechanically complete if the agent has also been instructed to use local resources only. Even package installation (uv, npm, etc) would be fine up to the existing risk of supply-chain attacks.

If you're worried about stochastic incompetence (e.g. the agent nukes the production database to fix a misspelled table name), then a sandbox to limit the 'blast radius' of any damage is plenty.

discuss

jdkoeck|28 days ago

That argument seems to assume a security model where the default prior is « no hostile agent ». But that’s the problem, any agent can be made hostile with a successful prompt injection attack. Basically, assuming there’s no hostile agent is the same as assuming there’s no attacker. I think we can agree a security model that assumes no attacker is insufficient.