(no title)
2bitencryption | 2 years ago
The base Llama-Chat models use something called "ghost attention" (they describe it in their paper). No clue how it works, but the result is that the model sticks to the system prompt extremely well. If you tell it in the system prompt that it's Marvin the Paranoid Android, it will stick to that 100%.
In llama-derived models like Vicuna, if you tell it to act as Marvin the Paranoid Android, the regular "assistant" voice starts to bleed through after only a few chat turns.
Doesn't sound like a big deal, but in cases where you have strict rules you want the model to follow, the base llama-2-chat models are far better than any derived ones that don't implement ghost attention.
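For what it's worth, the Llama 2 paper does describe the mechanism: during fine-tuning they synthetically prepend the instruction to every user turn when sampling a dialogue, then keep the instruction only in the first turn of the training sample (zeroing the loss on earlier turns), so the model learns to honor it across the whole conversation. A rough sketch of that data construction, with illustrative names and `generate` standing in for sampling from the model:

```python
def build_gatt_sample(instruction, user_turns, generate):
    """Sketch of ghost-attention-style training data construction,
    loosely following the Llama 2 paper's description. Messages are
    (role, text) tuples; `generate` samples an assistant reply."""
    # Step 1: sample assistant replies as if the instruction were
    # repeated in every user turn.
    history = []
    for user_msg in user_turns:
        prompt = history + [("user", f"{instruction}\n{user_msg}")]
        reply = generate(prompt)
        history = prompt + [("assistant", reply)]

    # Step 2: build the training sample with the instruction kept only
    # in the first turn. (The paper also zeroes the loss on tokens from
    # earlier turns; that part is elided here.)
    sample = []
    for i, user_msg in enumerate(user_turns):
        text = f"{instruction}\n{user_msg}" if i == 0 else user_msg
        sample.append(("user", text))
        sample.append(("assistant", history[2 * i + 1][1]))
    return sample
```

The point is that the instruction is "ghosted" out of later turns at training time, so at inference it only needs to appear once.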
loudmax | 2 years ago
2bitencryption | 2 years ago
But during inference, there's no trick: the system message appears once, at the top.
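To illustrate, here's a simplified sketch of the Llama-2-chat prompt format, where the system message sits once inside the first `[INST]` block (BOS/EOS token handling is elided, so this is an approximation, not the exact template):

```python
def build_prompt(system, turns):
    """Assemble a simplified Llama-2-chat-style prompt.
    `turns` is a list of (user_msg, assistant_reply) pairs; the last
    reply may be None when we're asking the model to continue."""
    (first_user, first_reply), *rest = turns
    # The system message appears exactly once, wrapped in <<SYS>> tags
    # inside the first instruction block.
    out = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{first_user} [/INST]"
    if first_reply is not None:
        out += f" {first_reply}"
    for user, assistant in rest:
        out += f" [INST] {user} [/INST]"
        if assistant is not None:
            out += f" {assistant}"
    return out
```

Every later turn is just a bare `[INST] ... [/INST]` block; the model is expected to keep following the system message anyway, which is exactly what ghost attention trains for.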
htrp | 2 years ago
kromem | 2 years ago
So you'd need to make sure you're comparing apples to apples.