top | item 46623621

(no title)

schmichael | 1 month ago

> We TOLD you this dynamic web stuff was a mistake. Static HTML never had injection attacks.

Your comparison is useful but wrong. I was online in 99 and the 00s when SQL injection was common, and we were telling people to stop using string interpolation for SQL! Parameterized SQL was right there!

We have all of the tools to prevent these agentic security vulnerabilities, but just like with SQL injection too many people just don't care. There's a race on, and security always loses when there's a race.

The greatest irony is that this time the race was started by the one organization expressly founded with security/alignment/openness in mind, OpenAI, who immediately gave up their mission in favor of power and money.

discuss

order

bcrosby95|1 month ago

> We have all of the tools to prevent these agentic security vulnerabilities,

Do we really? My understanding is you can "parameterize" your agentic tools but ultimately it's all in the prompt as a giant blob and there is nothing guaranteeing the LLM won't interpret that as part of the instructions or whatever.

The problem isn't the agents, its the underlying technology. But I've no clue if anyone is working on that problem, it seems fundamentally difficult given what it does.

stavros|1 month ago

We don't. The interface to the LLM is tokens, there's nothing telling the LLM that some tokens are "trusted" and should be followed, and some are "untrusted" and can only be quoted/mentioned/whatever but not obeyed.

alienbaby|1 month ago

The control and data streams are woven together (context is all just one big prompt) and there is currently no way to tell for certain which is which.

lkjdsklf|1 month ago

yeah I'm not convinced at all this is solvable.

The entire point of many of these features is to get data into the prompt. Prompt injection isn't a security flaw. It's literally what the feature is designed to do.

dehugger|1 month ago

Write your own tools. Dont use something off the shelf. If you want it to read from a database, create a db connector that exposes only the capabilities you want it to have.

This is what I do, and I am 100% confident that Claude cannot drop my database or truncate a table, or read from sensitive tables. I know this because the tool it uses to interface with the database doesn't have those capabilities, thus Claude doesn't have that capability.

It won't save you from Claude maliciously ex-filtrating data it has access to via DNS or some other side channel, but it will protect from worst-case scenarios.

narrator|1 month ago

I think what we have to do is making each piece of context have a permission level. That context that contains our AWS key is not permitted to be used when calling evil.com webservices. Claude will look at all the permissions used to create the current context and it's about to call evil.com and it will say whoops, can't call evil.com, let me regenerate the context from any context I have that is ok to call evil.com with like the text of a wikipedia article or something like that.

formerly_proven|1 month ago

For coding agents you simply drop them into a container or VM and give them a separate worktree. You review and commit from the host. Running agents as your main account or as an IDE plugin is completely bonkers and wholly unreasonable. Only give it the capabilities which you want it to use. Obviously, don't give it the likely enormous stack of capabilities tied to the ambient authority of your personal user ID or ~/.ssh

For use cases where you can't have a boundary around the LLM, you just can't use an LLM and achieve decent safety. At least until someone figures out bit coloring, but given the architecture of LLMs I have very little to no faith that this will happen.

NitpickLawyer|1 month ago

> We have all of the tools to prevent these agentic security vulnerabilities

We absolutely do not have that. The main issue is that we are using the same channel for both data and control. Until we can separate those with a hard boundary, we do not have tools to solve this. We can find mitigations (that camel library/paper, various back and forth between models, train guardrail models, etc) but it will never be "solved".

schmichael|1 month ago

I'm unconvinced we're as powerless as LLM companies want you to believe.

A key problem here seems to be that domain based outbound network restrictions are insufficient. There's no reason outbound connections couldn't be forced through a local MITM proxy to also enforce binding to a single Anthropic account.

It's just that restricting by domain is easy, so that's all they do. Another option would be per-account domains, but that's also harder.

So while malicious prompt injections may continue to plague LLMs for some time, I think the containerization world still has a lot more to offer in terms of preventing these sorts of attacks. It's hard work, and sadly much of it isn't portable between OSes, but we've spent the past decade+ building sophisticated containerization tools to safely run untrusted processes like agents.

girvo|1 month ago

> We have all of the tools to prevent these agentic security vulnerabilities

I don't think we do? Not generally, not at scale. The best we can do is capabilities/permissions but that relies on the end-user getting it perfectly right, which we already know is a fools errand in security...

groby_b|1 month ago

> We have all of the tools to prevent these agentic security vulnerabilities,

We do? What is the tool to prevent prompt injection?

alienbaby|1 month ago

The best I've heard is rewriting prompts as summaries before forwarding them to the underlying ai, but has it's own obvious shortcomings, and it's still possible. If harder. To get injection to work

lacunary|1 month ago

more AI - 60% of the time an additional layer of AI works every time

losthobbies|1 month ago

Sanitise input and LLM output.

Terr_|1 month ago

> Parameterized SQL was right there!

That difference just makes the current situation even dumber, in terms of people building in castles on quicksand and hoping they can magically fix the architectural problems later.

> We have all the tools to prevent these agentic security vulnerabilities

We really don't, not in the same way that parameterized queries prevented SQL injection. There is LLM equivalent for that today, and nobody's figured out how to have it.

Instead, the secure alternative is "don't even use an LLM for this part".

jxcole|1 month ago

A better analogy would be to compare it to being able to install anything from online vs only installing from an app store. If you wouldn't trust an exe from bad adhacker.com you probably shouldn't trust a skill from there either.

hakanderyal|1 month ago

You are describing the HN that I want it to be. Current comments here demonstrates my version sadly.

And, Solving this vulnerabilities requires human intervention at this point, along with great tooling. Even if the second part exists, first part will continue to be a problem. Either you need to prevent external input, or need to manually approve outside connection. This is not something that I expect people that Claude Cowork targets to do without any errors.

nebezb|1 month ago

> We have all of the tools to prevent these agentic security vulnerabilities

How?

antonvs|1 month ago

You just have to find a way to enter schmichael's vivid imagination.