top | item 47052277

(no title)

Yes, it’s worthwhile because the new models are being specifically trained and hardened against prompt injection attacks.

Much like how you wouldn’t immediately fire Alice, you’d train her and retest her, and see whether she had learned from her mistakes. Just don’t trust her with your sensitive data.

discuss

datsci_est_2015|12 days ago

Hmm I guess it will have to get to a point where social engineering an individual at a company is more appealing than prompt injecting one of its agents.

It’s interesting though, because the attack can be asymmetric. You could create a honeypot website that has a state-of-the-art prompt injection, and suddenly you have all of the secrets from every LLM agent that visits.

So the incentives are actually significantly higher for a bad actor to engineer state-of-the-art prompt injection. Why only get one bank’s secrets when you could get all of the banks’ secrets?

This is in comparison to targeting Alice with your spearphishing campaign.

Edit: like I said in the other comment, though, it’s not just that you _can_ fire Alice, it’s that you let her know if she screws up one more time you will fire her, and she’ll behave more cautiously. “Build a better generative AI” is not the same thing.