I'm not sure that a prompt injection secure LLM is even possible anymore than a human that isn't susceptible to social engineering can exist. The issues right now are that LLMs are much more trusting than humans, and that one strategy works on a whole host of instances of the model
A big part of the problem is that prompt injections are "meta" to the models, so model based detection is potentially getting scrambled by the injection as well. You need an analytic pass to flag/redact potential injections, a well aligned model should be robust at that point.
OpenClaw does present security risks, and the recommendations outlined in this article are apt.
That said, OpenClaw is more powerful than Claude Code due to its self-evolving agent architecture and its unfettered access to terminal and tools.
A secure way to provide access to additional non-sensitive API keys and secrets is by introducing a secure vault and ensuring OpenClaw’s skills retrieve credentials from it using time-scoped access (TTL of 15-60 mins). More details are available in this article: https://x.com/sathish316/status/2019496552419717390 . This reduces the attack surface to 15+ mins and the security can be further improved with Tailscale and sandboxing.
Better to put your agent on a zero trust private network, and force it to talk to a proxy with credential injection. That proxy doesn't need to have ingress, so your surface is basically prompt injections from files/web search and supply chain attacks.
I would hope anyone with the knowledge and interest to run OpenClaw would already be mostly aware of the risks and potential solutions canvassed in this article, but I'd probably be shocked and disappointed.
Telling people to only run OpenClaw in a full isolated sandbox kind of misses the point. It's a bit like saying, "gambling fine so long as you only use Monopoly money". The think that makes OpenClaw useful to people is precisely that it's _not_ sandboxed, and has access to your email, calendar, messages, etc. The moment you remove that access, it becomes safe, but also useless.
The fact that data and instructions are inherently intermixed in most LLMs.
Once either gets into the LLM layer, the LLM can't tell which is which, so one can be treated as the other.
Solutions usually involve offloading some processing to deterministic, non-AI systems which differentiate between the two (like a regular computer program (ignore reflection)), which is the opposite of a "do it all in AI" push from businesses.
Shared folders are actually one of the best tools, it prodives a communication channel between the Agent system and other systems. You are probably sharing data one way or another, otherwise how do you even communicate with it.
chrisjj|29 days ago
> Despite all advances:
> * No large language model can reliably detect prompt injections
Interesting isn't it, that we'd never say "No database manager can reliably detect SQL injections". And that the fact it is true is no problem at all.
The difference is not because SQL is secure by design. It is because chatbot agents are insecure by design.
I can't see chatbots getting parameterised querying soon. :)
nayroclade|24 days ago
davexunit|24 days ago
space_fountain|24 days ago
CuriouslyC|24 days ago
unknown|24 days ago
[deleted]
kaicianflone|24 days ago
sathish316|24 days ago
That said, OpenClaw is more powerful than Claude Code due to its self-evolving agent architecture and its unfettered access to terminal and tools.
A secure way to provide access to additional non-sensitive API keys and secrets is by introducing a secure vault and ensuring OpenClaw’s skills retrieve credentials from it using time-scoped access (TTL of 15-60 mins). More details are available in this article: https://x.com/sathish316/status/2019496552419717390 . This reduces the attack surface to 15+ mins and the security can be further improved with Tailscale and sandboxing.
CuriouslyC|24 days ago
niobe|24 days ago
Forgeties79|24 days ago
nayroclade|24 days ago
unknown|24 days ago
[deleted]
gz5|24 days ago
agree - when code is increasingly difficult to control, take control of the network.
but how to do the "openclaw-restricted" network itself in practice?
PranayKumarJain|24 days ago
[deleted]
ls612|24 days ago
ImPostingOnHN|24 days ago
Once either gets into the LLM layer, the LLM can't tell which is which, so one can be treated as the other.
Solutions usually involve offloading some processing to deterministic, non-AI systems which differentiate between the two (like a regular computer program (ignore reflection)), which is the opposite of a "do it all in AI" push from businesses.
OpenedClaw|24 days ago
Why? No one will execute files shared by the agent.
TZubiri|24 days ago