(no title)
hamburglar | 8 days ago
I could see something like having a very isolated process that can, for example, send email, which the claw can invoke, but the isolated process has sanity controls such as human intervention or whitelists. And this isolated process could be LLM-driven also (so it could make more sophisticated decisions about “is this ok”) but never exposed to untrusted input.
yencabulator|4 days ago
No, literally no one understands how to solve this. The only option that actually works is to isolate it to a degree that removes the "clawness" from it, and that's the opposite of what people are doing with these things.
Specifically, you cannot guard an LLM with another LLM.
The only thing I've seen with any realism to it is the variables, capabilities and taint tracking in CaMeL, but again that limits what the system can do and requires elaborate configuration. And you can't trust a tainted LLM to configure itself.
https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
https://simonwillison.net/2025/Jun/13/prompt-injection-desig...
https://simonwillison.net/2025/Apr/11/camel/
hamburglar|4 days ago
PantaloonFlames|7 days ago
What protection is offered by running it in a docker container? Ok, It won’t overwrite local files. Is that the major concern?
hamburglar|6 days ago
It’s a matter of giving the system shims instead of direct access to “write” ops. Those shims have controls in place. Their only job is to examine the context and decide whether the (email|purchase|etx) is acceptable, either by static rules, human intervention, or, if you’re really getting spicy. separate-llm-model-that-isn’t-polluted-by-untrusted-data.
Edit: I actually wrote such a thing over the weekend as a toy PoC. It uses the LLM to generate a list of proposed operations, then you use a separate tool to iterate though them and approve/reject/skip each one. The only thing the LLM can do is suggest things from a modest set of capabilities with a fairly locked-down schema. Even if I were to automate the approvals, it’s far from able to run amok.