(no title)
schmichael | 1 month ago
A key problem here seems to be that domain based outbound network restrictions are insufficient. There's no reason outbound connections couldn't be forced through a local MITM proxy to also enforce binding to a single Anthropic account.
It's just that restricting by domain is easy, so that's all they do. Another option would be per-account domains, but that's also harder.
So while malicious prompt injections may continue to plague LLMs for some time, I think the containerization world still has a lot more to offer in terms of preventing these sorts of attacks. It's hard work, and sadly much of it isn't portable between OSes, but we've spent the past decade+ building sophisticated containerization tools to safely run untrusted processes like agents.
NitpickLawyer|1 month ago
This is coming from first principles, it has nothing to do with any company. This is how LLMs currently work.
Again, you're trying to think about blacklisting/whitelisting, but that also doesn't work, not just in practice, but in a pure theoretical sense. You can have whatever "perfect" ACL-based solution, but if you want useful work with "outside" data, then this exploit is still possible.
This has been shown to work on github. If your LLM touches github issues, it can leak (exfil via github since it has access) any data that it has access to.
schmichael|1 month ago
mbreese|1 month ago
I do think that you’re right though in that containerized sandboxing might offer a model for more protected work. I’m not sure how much protection you can get with a container without also some kind of firewall in place for the container, but that would be a good start.
I do think it’s worthwhile to try to get agentic workflows to work in more contexts than just coding. My hesitation is with the current security state. But, I think it is something that I’m confident can be overcome - I’m just cautious. Trusted execution environments are tough to get right.
heliumtera|1 month ago
In the article example, an Anthropic endpoint was the only reachable domain. Anthropic Claude platform literally was the exfiltration agent. No firewall would solve this. But a simple mechanism that would tie the agent to an account, like the parent commenter suggested, would be an easy fix. Prompt Injection cannot by definition be eliminated, but this particular problem could be avoided if they were not vibing so hard and bragging about it
rafram|1 month ago
The fundamental issue of prompt injection just isn't solvable with current LLM technology.
alienbaby|1 month ago