top | item 47153258

(no title)

> computer nerds understand

No, literally no one understands how to solve this. The only option that actually works is to isolate it to a degree that removes the "clawness" from it, and that's the opposite of what people are doing with these things.

Specifically, you cannot guard an LLM with another LLM.

The only thing I've seen with any realism to it is the variables, capabilities and taint tracking in CaMeL, but again that limits what the system can do and requires elaborate configuration. And you can't trust a tainted LLM to configure itself.

https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

https://simonwillison.net/2025/Jun/13/prompt-injection-desig...

https://simonwillison.net/2025/Apr/11/camel/

discuss

hamburglar|4 days ago

If the “clawness” means you only use the llm to control itself, then yes, that’s impossible. But you can easily shim such a process so that the interfaces it uses to “claw out” to the real world are shims that have safeties such as human control. Openclaw does not do this, and is thus a scary shit show, but you can play with it in isolation safely, and I think a standard pattern for good control will emerge.

yencabulator|4 days ago

> easily

Yeah that's an active research topic for teams of PhDs, including some of Google's brightest. And the current approach even with added barriers may just be fundamentally untrustable. Read the links from my earlier comment for background.