(no title)
alex_sf | 2 years ago
The ability to trivially trick the model into thinking it said something it didn’t is a feature and intentional. It’s how you do multi-turn conversations with context.
Since the current crop of LLMs have no memory of their interaction, each follow up message (the back and forth of a conversation) involves sending the entire history back into the model, with the role as a prefix for each participants output/input.
There are some special tokens used (end of sequence, etc).
If your product doesn’t directly expose the underlying model, you can try to prevent users from impersonating responses through obfuscation or the LLM equivalent of prepared statements. The offensive side of prompt injection is currently beating the defensive side, though.
cubefox|2 years ago
It is definitely not an intended feature for the end user to be able to trick the model into believing it said something it didn't say. It also doesn't work with ChatGPT or Bing Chat, as far as I can tell. I was talking about the user, not about the developer.
> It’s how you do multi-turn conversations with context.
That can be done with special tokens also. The difference is that the user can't enter those tokens themselves.
alex_sf|2 years ago
Those aren't models, they are applications built on top of models.
> That can be done with special tokens also. The difference is that the user can't enter those tokens themselves.
Sure. But there are no open models that do that, and no indication of whether the various closed models do it either.
iudqnolq|2 years ago