top | item 47050498

(no title)

OpenClaw user here. Genuinely curious to see if this works and how easy it turns out to be in practice.

One thing I'd love to hear opinions on: are there significant security differences between models like Opus and Sonnet when it comes to prompt injection resistance? Any experiences?

discuss

datsci_est_2015|12 days ago

> One thing I'd love to hear opinions on: are there significant security differences between models like Opus and Sonnet when it comes to prompt injection resistance?

Is this a worthwhile question when it’s a fundamental security issue with LLMs? In meatspace, we fire Alice and Bob if they fail too many phishing training emails, because they’ve proven they’re a liability.

You can’t fire an LLM.

reassess_blind|12 days ago

Yes, it’s worthwhile because the new models are being specifically trained and hardened against prompt injection attacks.

Much like how you wouldn’t immediately fire Alice, you’d train her and retest her, and see whether she had learned from her mistakes. Just don’t trust her with your sensitive data.

gleipnircode|12 days ago

It's a fundamental issue I agree.

But we don't stop using locks just because all locks can be picked. We still pick the better lock. Same here, especially when your agent has shell access and a wallet.

altruios|12 days ago

with openclaw... you CAN fire an LLM. just replace it with another model, or soul.md/idenity.md.

It is a security issue. One that may be fixed -- like all security issues -- with enough time/attention/thought&care. Metrics for performance against this issue is how we tell if we are going to correct direction or not.

There is no 'perfect lock', there are just reasonable locks when it comes to security.