(no title)
schmichael | 1 month ago
Your comparison is useful but wrong. I was online in 99 and the 00s when SQL injection was common, and we were telling people to stop using string interpolation for SQL! Parameterized SQL was right there!
We have all of the tools to prevent these agentic security vulnerabilities, but just like with SQL injection too many people just don't care. There's a race on, and security always loses when there's a race.
The greatest irony is that this time the race was started by the one organization expressly founded with security/alignment/openness in mind, OpenAI, who immediately gave up their mission in favor of power and money.
bcrosby95|1 month ago
Do we really? My understanding is you can "parameterize" your agentic tools but ultimately it's all in the prompt as a giant blob and there is nothing guaranteeing the LLM won't interpret that as part of the instructions or whatever.
The problem isn't the agents, its the underlying technology. But I've no clue if anyone is working on that problem, it seems fundamentally difficult given what it does.
stavros|1 month ago
alienbaby|1 month ago
lkjdsklf|1 month ago
The entire point of many of these features is to get data into the prompt. Prompt injection isn't a security flaw. It's literally what the feature is designed to do.
dehugger|1 month ago
This is what I do, and I am 100% confident that Claude cannot drop my database or truncate a table, or read from sensitive tables. I know this because the tool it uses to interface with the database doesn't have those capabilities, thus Claude doesn't have that capability.
It won't save you from Claude maliciously ex-filtrating data it has access to via DNS or some other side channel, but it will protect from worst-case scenarios.
narrator|1 month ago
formerly_proven|1 month ago
For use cases where you can't have a boundary around the LLM, you just can't use an LLM and achieve decent safety. At least until someone figures out bit coloring, but given the architecture of LLMs I have very little to no faith that this will happen.
NitpickLawyer|1 month ago
We absolutely do not have that. The main issue is that we are using the same channel for both data and control. Until we can separate those with a hard boundary, we do not have tools to solve this. We can find mitigations (that camel library/paper, various back and forth between models, train guardrail models, etc) but it will never be "solved".
schmichael|1 month ago
A key problem here seems to be that domain based outbound network restrictions are insufficient. There's no reason outbound connections couldn't be forced through a local MITM proxy to also enforce binding to a single Anthropic account.
It's just that restricting by domain is easy, so that's all they do. Another option would be per-account domains, but that's also harder.
So while malicious prompt injections may continue to plague LLMs for some time, I think the containerization world still has a lot more to offer in terms of preventing these sorts of attacks. It's hard work, and sadly much of it isn't portable between OSes, but we've spent the past decade+ building sophisticated containerization tools to safely run untrusted processes like agents.
girvo|1 month ago
I don't think we do? Not generally, not at scale. The best we can do is capabilities/permissions but that relies on the end-user getting it perfectly right, which we already know is a fools errand in security...
groby_b|1 month ago
We do? What is the tool to prevent prompt injection?
alienbaby|1 month ago
lacunary|1 month ago
losthobbies|1 month ago
Terr_|1 month ago
That difference just makes the current situation even dumber, in terms of people building in castles on quicksand and hoping they can magically fix the architectural problems later.
> We have all the tools to prevent these agentic security vulnerabilities
We really don't, not in the same way that parameterized queries prevented SQL injection. There is LLM equivalent for that today, and nobody's figured out how to have it.
Instead, the secure alternative is "don't even use an LLM for this part".
jxcole|1 month ago
hakanderyal|1 month ago
And, Solving this vulnerabilities requires human intervention at this point, along with great tooling. Even if the second part exists, first part will continue to be a problem. Either you need to prevent external input, or need to manually approve outside connection. This is not something that I expect people that Claude Cowork targets to do without any errors.
nebezb|1 month ago
How?
antonvs|1 month ago