top | item 45005567

(no title)

alexbecker | 6 months ago

I doubt Comet was using any protections beyond some tuned instructions, but one thing I learned at USENIX Security a couple weeks ago is that nobody has any idea how to deal with prompt injection in a multi-turn/agentic setting.

discuss

order

hoppp|6 months ago

Maybe treat prompts like it was SQL strings, they need to be sanitized and preferably never exposed to external dynamic user input

Terr_|6 months ago

The LLM is basically an iterative function going guess_next_text(entire_document). There is no algorithm-level distinction at all between "system prompt" or "user prompt" or user input... or even between its own prior output. Everything is concatenated into one big equally-untrustworthy stream.

I suspect a lot of techies operate with a subconscious good-faith assumption: "That can't be how X works, nobody would ever built it that way, that would be insecure and naive and error-prone, surely those bajillions of dollars went into a much better architecture."

Alas, when it comes to day's the AI craze, the answer is typically: "Nope, the situation really is that dumb."

__________

P.S.: I would also like to emphasize that even if we somehow color-coded or delineated all text based on origin, that's nowhere close to securing the system. An attacker doesn't need to type $EVIL themselves, they just need to trick the generator into mentioning $EVIL.

prisenco|6 months ago

Sanitizing free-form inputs in a natural language is a logistical nightmare, so it's likely there isn't any safe way to do that.

alexbecker|6 months ago

The problem is there is no real way to separate "data" and "instructions" in LLMs like there is for SQL

internet_points|6 months ago

SQL strings can be reliably escaped by well-known mechanical procedures.

There is no generally safe way of escaping LLM input, all you can do is pray, cajole, threaten or hope.

chasd00|6 months ago

Can’t the connections and APIs that an LLM are given to answer queries be authenticated/authorized by the user entering the query? Then the LLM can’t do anything the asking user can’t do at least. Unless you have launch the icbm permissions yourself there’s no way to get the LLM to actually launch the icbm.

lelanthran|6 months ago

You cannot sanitize prompt strings.

This is not SQL.