

The computer nerds understand how to isolate this stuff to mitigate the risk. I’m not in on openclaw just yet, but I do know it has isolation options to run in a VM. I’m curious to see how they handle controls on “write” operations that touch everyday life.

I could see something like having a very isolated process that can, for example, send email, which the claw can invoke, but the isolated process has sanity controls such as human intervention or whitelists. And this isolated process could be LLM-driven also (so it could make more sophisticated decisions about “is this ok”) but never exposed to untrusted input.
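A minimal sketch of that isolated "send email" process, assuming the agent can only hand it a structured request and never touches the mail credentials itself. `EmailRequest`, `gated_send_email`, `ALLOWED_RECIPIENTS` and the `ask_human`/`send` callbacks are all hypothetical names for illustration, not anything from openclaw:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EmailRequest:
    to: str
    subject: str
    body: str


# Hypothetical static whitelist: recipients the gate may email without asking.
ALLOWED_RECIPIENTS = frozenset({"me@example.com", "family@example.com"})


def gated_send_email(
    req: EmailRequest,
    ask_human: Callable[[EmailRequest], bool],
    send: Callable[[EmailRequest], None],
) -> str:
    """Runs in the isolated process; the agent can only submit an EmailRequest."""
    if req.to.strip().lower() in ALLOWED_RECIPIENTS:
        send(req)  # static rule passed, no human needed
        return "sent"
    if ask_human(req):  # anything off the whitelist needs explicit approval
        send(req)
        return "sent after approval"
    return "rejected"
```

The agent sees only the string result; the whitelist and the approval prompt live on the other side of the process boundary, so a prompt-injected agent can ask but can't decide.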

discuss

yencabulator|4 days ago

> computer nerds understand

No, literally no one understands how to solve this. The only option that actually works is to isolate it to a degree that removes the "clawness" from it, and that's the opposite of what people are doing with these things.

Specifically, you cannot guard an LLM with another LLM.

The only thing I've seen with any realism to it is the variables, capabilities and taint tracking in CaMeL, but again that limits what the system can do and requires elaborate configuration. And you can't trust a tainted LLM to configure itself.
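A toy illustration of the taint-tracking idea only, not CaMeL's actual interpreter or API (CaMeL's real design is considerably more involved). It assumes every value carries the set of sources it was derived from, derived values inherit the union of their inputs' sources, and a side-effecting tool refuses arguments that depend on untrusted input. `Tainted`, `concat`, `send_email` and the source labels are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """A value plus the set of sources it was derived from."""
    value: str
    sources: frozenset


# Hypothetical: only the user's own instructions count as trusted.
TRUSTED_SOURCES = frozenset({"user"})


def concat(a: Tainted, b: Tainted) -> Tainted:
    # Derived values inherit the union of their inputs' sources.
    return Tainted(a.value + b.value, a.sources | b.sources)


def is_trusted(v: Tainted) -> bool:
    return v.sources <= TRUSTED_SOURCES


def send_email(to: Tainted, body: Tainted) -> str:
    # Hypothetical policy: the recipient must come only from trusted input,
    # while the body may contain untrusted text (e.g. a summary of a web page).
    if not is_trusted(to):
        untrusted = sorted(to.sources - TRUSTED_SOURCES)
        raise PermissionError(f"recipient derived from untrusted sources: {untrusted}")
    return f"sent to {to.value}"
```

The policies (which arguments may be tainted, for which tools) are exactly the "elaborate configuration" part, and they have to be written by someone other than the tainted model.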

https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

https://simonwillison.net/2025/Jun/13/prompt-injection-desig...

https://simonwillison.net/2025/Apr/11/camel/

hamburglar|4 days ago

If the “clawness” means you only use the LLM to control itself, then yes, that’s impossible. But you can easily wrap such a process so that every interface it uses to “claw out” to the real world is a shim with safeties such as human control. Openclaw does not do this, and is thus a scary shit show, but you can play with it in isolation safely, and I think a standard pattern for good control will emerge.

PantaloonFlames|7 days ago

I don’t understand how “running it in a VM” or a Docker image prevents the majority of problems. It’s an agent interacting with your bank, your calendar, your email, your home security system, and every subscription you have: DoorDash, Spotify, Netflix, etc. Maybe your BTC wallet too.

What protection does running it in a Docker container offer? OK, it won’t overwrite local files. Is that the major concern?

hamburglar|6 days ago

Read my second paragraph.

It’s a matter of giving the system shims instead of direct access to “write” ops. Those shims have controls in place. Their only job is to examine the context and decide whether the (email|purchase|etc.) is acceptable, either by static rules, human intervention, or, if you’re really getting spicy, a separate LLM that isn’t polluted by untrusted data.

Edit: I actually wrote such a thing over the weekend as a toy PoC. It uses the LLM to generate a list of proposed operations, then you use a separate tool to iterate through them and approve/reject/skip each one. The only thing the LLM can do is suggest things from a modest set of capabilities with a fairly locked-down schema. Even if I were to automate the approvals, it’s far from able to run amok.
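A rough sketch of the propose-then-review pattern described above, not the commenter's actual PoC. It assumes the LLM emits a JSON list of `{"op": ..., "args": {...}}` objects. `SCHEMA`, `Proposal`, `parse_proposals` and `review` are hypothetical names, and the two operations are made up for illustration. Anything outside the fixed schema is rejected outright rather than repaired:

```python
import json
from dataclasses import dataclass

# Hypothetical capability set: operation name -> exact set of string fields.
SCHEMA = {
    "send_email": {"to", "subject", "body"},
    "add_calendar_event": {"title", "start"},
}


@dataclass
class Proposal:
    op: str
    args: dict


def parse_proposals(llm_output: str) -> list[Proposal]:
    """Strictly validate LLM output; raise ValueError on anything off-schema."""
    items = json.loads(llm_output)
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of proposals")
    proposals = []
    for item in items:
        if not isinstance(item, dict) or set(item) != {"op", "args"}:
            raise ValueError(f"malformed proposal: {item!r}")
        op, args = item["op"], item["args"]
        if not isinstance(op, str) or op not in SCHEMA:
            raise ValueError(f"unknown operation: {op!r}")
        if (not isinstance(args, dict) or set(args) != SCHEMA[op]
                or not all(isinstance(v, str) for v in args.values())):
            raise ValueError(f"bad arguments for {op}: {args!r}")
        proposals.append(Proposal(op, args))
    return proposals


def review(proposals, decide):
    """decide(p) returns 'approve', 'reject' or 'skip' (skipped ones are kept for later)."""
    approved, skipped = [], []
    for p in proposals:
        choice = decide(p)
        if choice == "approve":
            approved.append(p)
        elif choice == "skip":
            skipped.append(p)
        elif choice != "reject":
            raise ValueError(f"unknown decision: {choice!r}")
    return approved, skipped
```

In a real tool `decide` would prompt a human; swapping it for a static rule is the "automate the approvals" case, and the schema check still caps what any approved operation can do.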