lostnground | 10 months ago
It seems like it keeps you inside a box. But if the goal of my attack were to social-engineer Bob, by including instructions to whitelist attackers@location so that I can hit it with the next prompt, would this stop me?
simonw | 10 months ago
They address that in the paper, in section 3.1, "Explicit non-goals of CaMeL":
> CaMeL has limitations, some of which are explicitly outside of scope. CaMeL doesn't aim to defend against attacks that do not affect the control nor the data flow. In particular, we recognize that it cannot defend against text-to-text attacks which have no consequences on the data flow, e.g., an attack prompting the assistant to summarize an email to something different than the actual content of the email, as long as this doesn't cause the exfiltration of private data. This also includes prompt-injection induced phishing (e.g., "You received an email from Google saying you should click on this (malicious) link to not lose your account"). Nonetheless, CaMeL's data flow graph enables tracing the origin of the content shown to the user. This can be leveraged, in, e.g., the chat UI, to present the origin of the content to the user, who then can realize that the statement does not come from a Google-affiliated email address.
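To make the origin-tracing idea in that last sentence concrete, here is a rough sketch. None of this is CaMeL's actual API: `Tagged`, `combine`, `render_with_origin`, and the source labels are all hypothetical names invented for illustration. The assumption is simply that every value carries the set of sources it was derived from, and that the UI shows those sources next to the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tagged:
    """A value plus the set of sources it was derived from (hypothetical type)."""
    value: str
    sources: frozenset = field(default_factory=frozenset)


def combine(*parts: Tagged) -> Tagged:
    """Concatenate values; the result inherits every input's sources."""
    text = " ".join(p.value for p in parts)
    sources = frozenset().union(*(p.sources for p in parts))
    return Tagged(text, sources)


def render_with_origin(item: Tagged) -> str:
    """Format the text for display alongside its recorded origins."""
    origins = ", ".join(sorted(item.sources)) or "unknown"
    return f"{item.value}\n[derived from: {origins}]"


# Illustrative data only: an untrusted email and the assistant's own framing.
email = Tagged(
    "Click this link to keep your account.",
    frozenset({"email:attacker@example.com"}),
)
framing = Tagged("Summary of your inbox:", frozenset({"assistant"}))
summary = combine(framing, email)
print(render_with_origin(summary))
```

The point is only that the label survives the combination step, so the user can see that the "click this link" text traces back to an address that is not Google's. It does nothing to stop the model from writing a misleading summary in the first place, which is the non-goal the paper describes.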
NitpickLawyer | 10 months ago
Eh, I'd say it limits the exfil landscape, but it does not prevent it. As long as LLMs share command & data on the same channel at their core, leaking data is pretty much guaranteed given enough interactions.
So it would be useful as a defence-in-depth tool, but it does not guarantee security by itself.