(no title)
thepasswordapp | 2 months ago
The attack surface is interesting - the agent's "prompt" becomes a trust boundary, and anything that can influence that prompt (PR descriptions, issue comments, commit messages) becomes a potential attack vector.
I've been working on browser automation agents and the same principle applies - you have to assume any page content or user input could be adversarial. Strict separation between "what the agent can see" and "what the agent can do" is crucial.
No comments yet.