(no title)
longtermop | 26 days ago
This got me thinking about a related trust boundary issue though: even with credentials protected, the agent can still be manipulated through its inputs. Prompt injection via tool outputs or RAG retrieval can trick an agent into calling those credentialed endpoints in unintended ways. Your calendar API key is safe, but a malicious payload in an email body could still instruct the agent to "delete all meetings" through the legitimate Wardgate-protected endpoint.
I've been working on PromptShield which tackles the input validation layer (sanitizing what comes back from tools/retrieval before it hits the model). Feels like these are complementary pieces of the same puzzle.
Curious about your threat model assumptions - are you primarily defending against credential exfiltration, or also thinking about the abuse-through-legitimate-channels vector? The access rules and logging you mention could be really powerful for the latter too (rate limiting, anomaly detection, etc).
avoutic|26 days ago
So you would configure this:
So updating or deleting events requires human permission.There are already time controls and rate-limiting included.
On the list for things to develop is an LLM model adapter as well, that could detect prompt injection, but also identity-masking and credential-triggering-approvals. Anomaly detection is on the todo.
The threat model is agents deliberately (because of gullibility, prompt injection, or dumb actions) leaking data and either detecting that early on or preventing such things.