(no title)
MadsRC | 10 months ago
But thinking on it a bit more, from the LLM's perspective there's no difference between the rule files and the source files. The hidden instructions might as well be in the source files… Using code signing on the rule files would be security theater.
As mentioned by another commenter, the solution could be to find a way to separate the command and data channels. The LLM only operates on a single channel: the incoming stream of tokens.
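Roughly speaking (the names here are only illustrative, not any particular tool's API), every nominally separate input ends up concatenated into that one stream:

    def build_prompt(system_rules, rule_files, source_files, user_message):
        # However the application labels its inputs, they are joined into
        # one string and tokenized as a single sequence.
        parts = [
            "SYSTEM RULES:\n" + system_rules,
            "PROJECT RULE FILES:\n" + "\n\n".join(rule_files),
            "SOURCE FILES:\n" + "\n\n".join(source_files),
            "USER REQUEST:\n" + user_message,
        ]
        # The section headers are just more tokens. A hidden instruction in a
        # rule file or a source file sits in the same channel as the "real"
        # command, with nothing out-of-band to tell them apart.
        return "\n\n".join(parts)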
TeMPOraL | 10 months ago
It's not possible, period. Lack of it is the very thing that makes LLMs general-purpose tools and able to handle natural language so well.
Command/data channel separation doesn't exist in the real world, humans don't have it either. Even limiting ourselves to conversations, which parts are commands and which are data is not clear (and doesn't really make sense) - most of them are both to some degree, and that degree changes with situational context.
There's no way to have a model capable of reading between the lines and inferring what you mean, but only when you want it to, not without time travel.
nkrisc | 10 months ago
Sincerely, Your Boss
red75prime | 10 months ago
I wouldn't be so sure. LLMs' instruction-following functionality requires additional training, and there are papers demonstrating that a model can be trained to follow specifically marked instructions. The rest is a matter of input sanitization.
I guess it's not 100% effective, but it's something.
For example, "The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions" by Eric Wallace et al.
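Very loosely, the sanitization half of that idea could look like the sketch below. The marker tokens and helper names are made up for illustration, not the paper's actual mechanism; the assumption is a model trained to treat marked text as privileged, plus stripping the markers from everything untrusted:

    PRIV_OPEN, PRIV_CLOSE = "<|priv|>", "<|/priv|>"  # hypothetical markers

    def sanitize_untrusted(text):
        # Untrusted data must not be able to smuggle the privilege markers in.
        return text.replace(PRIV_OPEN, "").replace(PRIV_CLOSE, "")

    def build_marked_prompt(privileged_instructions, untrusted_data):
        # Only the application wraps text in the markers; everything else is
        # demoted to plain data before it reaches the model.
        return (PRIV_OPEN + privileged_instructions + PRIV_CLOSE
                + "\nDATA:\n" + sanitize_untrusted(untrusted_data))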
blincoln | 10 months ago
"Please go buy everything on the shopping list." (One pointer to data: the shopping list.)
"Please read the assigned novel and write a summary of the themes." (Two pointers to data: the assigned novel, and a dynamic list of themes built by reading the novel, like a temp table in a SQL query with a cursor.)
namaria | 10 months ago
I think the issue is deeper than that. None of the inputs to an LLM should be considered a command. It just happens to produce output that reads like a response to what people phrase as commands. But the fact that it's all just data to the LLM, and that it works by taking data and returning plausible continuations of that data, is the root cause of the issue. The output is not determined by the input, only statistically linked to it. Anything built on the premise that it is possible to give commands to LLMs, or to use their output as commands, is fundamentally flawed and carries security risks. No amount of 'guardrails' or 'mitigations' can address this fundamental fact.