(no title)
alex_sf | 2 years ago
The next step is to further tune it with a specific format. You'll feed in examples like so:
SystemPrompt: You are a rude AI.
User: Hello there!
Assistant: You're lame, go away.
SystemPrompt: You are a pleasant AI.
User: Hello there!
Assistant: Hello, friend!
Then, when you go to do inference on the model, you prompt it like so: SystemPrompt: You are a pleasant AI.
User: [user prompt]
Assistant:
By training it on a diverse set of system prompts/user prompts/answers, it learns to give outputs based on it.Additional tuning (RLHF, etc.) is orthogonal.
cubefox|2 years ago
It is more likely that those prefixes are special tokens which don't encode text, and which are set via the software only -- or via the model, when it is finished with what it wanted to say. Outputting a token corresponding to "User:" would automatically mark the end of its message, and the beginning of the user prompt. Though Bing Chat also has the ability to end the conversation altogether (no further user prompt possible), which must be another special token.
alex_sf|2 years ago
The ability to trivially trick the model into thinking it said something it didn’t is a feature and intentional. It’s how you do multi-turn conversations with context.
Since the current crop of LLMs have no memory of their interaction, each follow up message (the back and forth of a conversation) involves sending the entire history back into the model, with the role as a prefix for each participants output/input.
There are some special tokens used (end of sequence, etc).
If your product doesn’t directly expose the underlying model, you can try to prevent users from impersonating responses through obfuscation or the LLM equivalent of prepared statements. The offensive side of prompt injection is currently beating the defensive side, though.