top | item 46886220

(no title)

longtermop | 26 days ago

Really appreciate the credential isolation approach here. The proxy pattern makes a lot of sense - keeping keys out of the agent's context entirely is the right call.

This got me thinking about a related trust boundary issue though: even with credentials protected, the agent can still be manipulated through its inputs. Prompt injection via tool outputs or RAG retrieval can trick an agent into calling those credentialed endpoints in unintended ways. Your calendar API key is safe, but a malicious payload in an email body could still instruct the agent to "delete all meetings" through the legitimate Wardgate-protected endpoint.

I've been working on PromptShield which tackles the input validation layer (sanitizing what comes back from tools/retrieval before it hits the model). Feels like these are complementary pieces of the same puzzle.

Curious about your threat model assumptions - are you primarily defending against credential exfiltration, or also thinking about the abuse-through-legitimate-channels vector? The access rules and logging you mention could be really powerful for the latter too (rate limiting, anomaly detection, etc).

discuss

order

avoutic|26 days ago

WardGate also tackles "deleting all meetings"-kind of attacks, at least if you choose to. So for my setup, I allow calendar reading, but updating and editing, requires an approval by me.

So you would configure this:

  endpoints:
    calendar:
      preset: google-calendar
      auth:
        credential_env: WARDGATE_CRED_GOOGLE_CALENDAR
      capabilities:
        read_data: allow
        create_events: allow
        update_events: ask
        delete_events: ask
So updating or deleting events requires human permission.

There are already time controls and rate-limiting included.

On the list for things to develop is an LLM model adapter as well, that could detect prompt injection, but also identity-masking and credential-triggering-approvals. Anomaly detection is on the todo.

The threat model is agents deliberately (because of gullibility, prompt injection, or dumb actions) leaking data and either detecting that early on or preventing such things.