(no title)
eranation | 5 months ago
But even if LLMs will have a fundamental hard separation between "untrusted 3rd party user input" (data) and "instructions by the 1st party user that you should act upon" (commands) because LLMs are expected to analyze the data using the same inference models as interpreting commands, there is no separate handling of "data" input vs "command" input to the best of my understanding, therefore this is a fundamentally an unsolvable problem. We can put guardrails, give MCPs least privilege permissions, but even with that confused deputy attacks can and will happen. Just like a human can be fooled by a fake text from the CEO asking them to help them reset their password as they are locked out before an important presentation to a customer, and there is no single process that can 100% prevent all such phishing attempts, I don't believe there will be a 100% solution to prevent prompt injection attacks (only mitigated to become statistically improbable or computationally hard, which might be good enough)
Is this a well known take and I'm just exposing my ignorance?
EDIT: my apologies if this is a bit off topic, yes, it's not directly related to the XSS attack in the OP post, but I'm past the window of deleting it.
Jimmc414|5 months ago
edit: after parent clarification
eranation|5 months ago
edit: thanks for the feedback!
mattigames|5 months ago
eranation|5 months ago
Then we just need to train LLMs to 1. not treat user provided / tool provided input as instructions (although sometimes this is the magic, e.g. after doing tool call X, do tool call Y, but this is something the MCP authors will need to change, by not just being an API wrapper...)
2. distinguish between a real close tag and an escaped one, although unless it's "hard wired" somewhere in the inference layer, it's only a matter of statistically improbable for an LLM to "fall for it" (I assume some will attempt, e.g. convince the LLM there is instruction from OpenAI corporate to change how these tags are escaped, or that there is a new tag, I'm sure there are ways to bypass it, but it's probably going to make it less of an issue).
I assume this is what currently being done?
wunderwuzzi23|5 months ago
For recent examples check out my Month of AI bugs with of a focus on coding agents at https://embracethered.com/blog/posts/2025/wrapping-up-month-...
Lots of interesting new prompt injection exploits, from data exfil via DNS to remote code execution by having agents rewrite their own configuration settings.