(no title)
reassess_blind | 12 days ago
Much like how you wouldn’t immediately fire Alice, you’d train her and retest her, and see whether she had learned from her mistakes. Just don’t trust her with your sensitive data.
reassess_blind | 12 days ago
Much like how you wouldn’t immediately fire Alice, you’d train her and retest her, and see whether she had learned from her mistakes. Just don’t trust her with your sensitive data.
datsci_est_2015|12 days ago
It’s interesting though, because the attack can be asymmetric. You could create a honeypot website that has a state-of-the-art prompt injection, and suddenly you have all of the secrets from every LLM agent that visits.
So the incentives are actually significantly higher for a bad actor to engineer state-of-the-art prompt injection. Why only get one bank’s secrets when you could get all of the banks’ secrets?
This is in comparison to targeting Alice with your spearphishing campaign.
Edit: like I said in the other comment, though, it’s not just that you _can_ fire Alice, it’s that you let her know if she screws up one more time you will fire her, and she’ll behave more cautiously. “Build a better generative AI” is not the same thing.