gleipnircode | 12 days ago
One thing I'd love to hear opinions on: are there significant security differences between models like Opus and Sonnet when it comes to prompt injection resistance? Any experiences?
datsci_est_2015|12 days ago
Is this a worthwhile question when it’s a fundamental security issue with LLMs? In meatspace, we fire Alice and Bob if they fail too many phishing training emails, because they’ve proven they’re a liability.
You can’t fire an LLM.
reassess_blind|12 days ago
You wouldn’t immediately fire Alice, either: you’d train her, retest her, and see whether she had learned from her mistakes. Just don’t trust her with your sensitive data.
gleipnircode|12 days ago
But we don't stop using locks just because all locks can be picked. We still pick the better lock. Same here, especially when your agent has shell access and a wallet.
altruios|12 days ago
It is a security issue, one that may be fixed, like all security issues, with enough time, attention, thought, and care. Metrics for performance against this issue are how we tell whether we are heading in the right direction.
There is no 'perfect lock', there are just reasonable locks when it comes to security.