top | item 44503628

rexpository | 7 months ago

I broadly agree that "MCP-level" patches alone won't eliminate prompt-injection risk. Recent research also shows that real progress can be made by enforcing security above the MCP layer, exactly as you suggest [1]. DeepMind's CaMeL architecture is a good reference model: it surrounds the LLM with a capability-based "sandbox" that (1) tracks the provenance of every value, and (2) blocks any tool call whose arguments originate from untrusted data, unless an explicit policy grants permission.

[1] https://arxiv.org/pdf/2503.18813
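The provenance-plus-policy idea can be sketched in a few lines. This is a toy illustration under names I invented for exposition (`Tainted`, `taint`, `allowed`), not CaMeL's actual interface; the "trusted" set and the policy format are assumptions.

```python
# Toy sketch of a CaMeL-style capability check: every value carries its
# provenance, and tool calls on untrusted data need an explicit grant.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    value: object
    sources: frozenset  # provenance: every origin this value has touched

def taint(value, source):
    return Tainted(value, frozenset([source]))

def combine(*vals):
    # A value derived from others inherits the union of their provenance.
    return Tainted(tuple(v.value for v in vals),
                   frozenset().union(*(v.sources for v in vals)))

TRUSTED = {"user"}  # assumption: direct user input counts as trusted

def allowed(policy, tool, arg):
    # Block any tool call whose argument carries untrusted provenance,
    # unless the policy explicitly grants that (tool, source) pair.
    untrusted = arg.sources - TRUSTED
    return all((tool, src) in policy for src in untrusted)

policy = {("search_web", "email")}        # one explicit grant
query = taint("latest invoice", "email")  # value originated in tool output
```

With this policy, `search_web(query)` passes because the grant exists, while `send_money(query)` is blocked: the argument's provenance includes "email" and no grant covers it.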


tatersolid | 7 months ago

> unless an explicit policy grants permission

Three months later, all devs have “Allow *” in their tool-name.conf