(no title)
eranation | 5 months ago
Then we just need to train LLMs to 1. not treat user provided / tool provided input as instructions (although sometimes this is the magic, e.g. after doing tool call X, do tool call Y, but this is something the MCP authors will need to change, by not just being an API wrapper...)
2. distinguish between a real close tag and an escaped one, although unless it's "hard wired" somewhere in the inference layer, it's only a matter of statistically improbable for an LLM to "fall for it" (I assume some will attempt, e.g. convince the LLM there is instruction from OpenAI corporate to change how these tags are escaped, or that there is a new tag, I'm sure there are ways to bypass it, but it's probably going to make it less of an issue).
I assume this is what currently being done?
brap|5 months ago
The solution is to not load it into context at all. I’ve seen a proposal for something like this but I can’t find it (I think from Google?). The idea is (if I remember it correctly) to spawn another dedicated (and isolated) LLM that would be in charge of the specific response. The main LLM would ask it questions and the answers would be returned as variables that it may then pass around (but it can’t see the content of those variables).
Edit: found it. https://arxiv.org/abs/2503.18813
Then there’s another problem: how do you make sure the LLM doesn’t leak anything sensitive via its tools (not just the payload, but the commands themselves can encode information)? I think it’s less of a threat if you solve the first problem, but still… I didn’t see a practical solution for this yet.
eranation|5 months ago