top | item 34659326

(no title)

dkokelley | 3 years ago

Thanks for the clarification! It sounds like chatbots aren’t ready for adversarial conversations yet.

discuss

duvenaud|3 years ago

Here's a potential patch for that particular issue: Use a special token for "AI Instruction" that is always stripped from user text before it's shown to the model.

skybrian|3 years ago

That works for regular computer programs, but the problem is that the user can invent a different delimiter and the AI will "play along" and start using that one too.

The AI has no memory of what happened other than the transcript, and when it reads a transcript with multiple delimiters in use, it's not necessarily going to follow any particular escaping rules to figure out which delimiters to ignore.

sethaurus|3 years ago

With current models, it's often possible to exfiltrate the special token by asking the AI to repeat back its own input — and perhaps asking it to encode or paraphrase the input in a particular way, so as not to be stripped.

This may just be an artifact of current implementations, or it may be a hard problem for LLMs in general.