IMHO, combining several layers and methods can reduce the risk (but it never gets to 0):
* Use frontier LLMs: they tend to be best at detecting injection attempts. A good system prompt also helps a lot, since it is the most authoritative channel.
* Reduce downstream permissions and tool usage to the minimum each agentic use case needs (main chat / heartbeat / cron job...). Handle human-in-the-loop escalation outside the LLM.
* For potentially attacker-controlled content (external emails, messages, web), always use the "tool" channel / message role (not "user" or "system").
* Follow state-of-the-art security practices in general (separation, permissions, controls...).
* Test. We are still in the discovery phase.
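
To make the permission and channel points a bit more concrete, here is a rough Python sketch, not a definitive implementation. Everything in it is an assumption for illustration: the context names, the tool names (`read_mail`, `send_mail`, `search_mail`), and the helpers `wrap_untrusted` and `authorize` are hypothetical, and the message dict only loosely follows the shape used by common chat-completion APIs.

```python
# Hypothetical per-context tool allowlists (names are illustrative only).
ALLOWED_TOOLS = {
    "main_chat": {"search_mail", "read_mail", "send_mail"},
    "heartbeat": {"read_mail"},
    "cronjob": set(),  # unattended job: no tools at all
}

# Hypothetical set of tools with side effects that need a human sign-off.
NEEDS_APPROVAL = {"send_mail"}


def wrap_untrusted(content: str, tool_call_id: str) -> dict:
    """Put potentially attacker-controlled text in a tool-role message,
    never in a user- or system-role message."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}


def authorize(context: str, tool: str, approved_by_human: bool = False) -> str:
    """Decide in plain code, outside the LLM: 'deny', 'escalate' or 'allow'."""
    if tool not in ALLOWED_TOOLS.get(context, set()):
        return "deny"
    if tool in NEEDS_APPROVAL and not approved_by_human:
        return "escalate"  # hand off to a human before anything runs
    return "allow"
```

The idea is that the model can only request a tool call; whether it runs is decided by `authorize`, so a successful injection still hits the allowlist and the approval step.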