top | item 46936464

(no title)

This is the confused deputy problem at the application layer. Sandboxing secures the environment, but if the agent has legitimate access to sensitive operations (email, database writes, API calls), prompt injection attacks work through approved channels. The only hard defense is explicit user confirmation for each action, which defeats the point of autonomy.

discuss

No comments yet.