top | item 36988090


2bitencryption | 2 years ago

How well does it adhere to the system prompt?

The base Llama 2-Chat models use something called "ghost attention" (they describe it in their paper). No clue how it works internally, but the result is that the model sticks to the system prompt extremely well. If you tell it in the system prompt that it's Marvin the Paranoid Android, it will stick to that 100%.

In Llama-derived models like Vicuna, if you tell it to act as Marvin the Paranoid Android, the regular "assistant" voice starts to bleed through after only a few chat turns.

It doesn't sound like a big deal, but in cases where you have strict rules you want the model to follow, the base Llama-2-Chat models are far better than any derived models that don't implement ghost attention.



loudmax | 2 years ago

My understanding of ghost attention is that the interface will inject periodic reminders into the conversation. These are seen by the model but hidden from the user. They do use up some of the tokens available in the context window.

2bitencryption | 2 years ago

But from the paper, it sounds like this happens only during training: some trick where the system prompt is constantly re-injected into the chat conversations used for fine-tuning.

But during inference, there's no trick; the system message appears just once, at the top.
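For what it's worth, the training-time trick the paper describes can be sketched roughly like this. This is a hypothetical, simplified illustration (all function names are made up, and `generate` stands in for sampling from the model), not the paper's actual pipeline: the instruction is prepended to every user turn while *sampling* training dialogues, then stripped from all but the first turn before fine-tuning, so the model learns to honor an instruction it only sees once.

```python
# Rough sketch of GAtt-style data augmentation as described upthread.
# All names here are illustrative; `generate` is a stand-in for sampling
# assistant replies from the current model.

def build_gatt_dialogue(instruction, user_turns, generate):
    """Sample assistant replies with the instruction re-injected on every
    user turn, then keep the instruction only in turn 1 for the
    fine-tuning target."""
    sampled = []
    history = []
    for user_msg in user_turns:
        # During sampling, the instruction is repeated on each user turn.
        augmented = f"{instruction}\n{user_msg}"
        reply = generate(history + [("user", augmented)])
        history += [("user", augmented), ("assistant", reply)]
        sampled.append((user_msg, reply))

    # For the training target, drop the repeats: instruction appears once.
    training_dialogue = []
    for i, (user_msg, reply) in enumerate(sampled):
        prompt = f"{instruction}\n{user_msg}" if i == 0 else user_msg
        training_dialogue += [("user", prompt), ("assistant", reply)]
    return training_dialogue
```

The model is thus trained on dialogues where replies were *conditioned* on the instruction at every turn, but the instruction itself only survives at the top, which matches the "no trick at inference" observation.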

htrp | 2 years ago

Couldn't you replicate that by doing the same thing and prepending system prompts in the Vicuna models?
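The naive inference-time version of that would be something like the following. A minimal sketch of the "hidden reminder" idea, assuming a plain flattened prompt string; the role tags and `remind_every` parameter are made up for illustration, and nothing here comes from Vicuna itself:

```python
# Hypothetical inference-time workaround: the *interface*, not the model,
# re-prepends the system prompt as a hidden reminder every few user turns.

def build_prompt(system_prompt, turns, remind_every=3):
    """Flatten (role, text) chat turns into a prompt string, re-injecting
    the system prompt every `remind_every` user turns."""
    parts = [f"SYSTEM: {system_prompt}"]
    user_count = 0
    for role, text in turns:
        if role == "user":
            user_count += 1
            if user_count > 1 and (user_count - 1) % remind_every == 0:
                # Hidden from the user, visible to the model.
                parts.append(f"SYSTEM (reminder): {system_prompt}")
        parts.append(f"{role.upper()}: {text}")
    return "\n".join(parts)
```

The downside is the one loudmax notes: each repeated reminder eats tokens out of the context window, whereas the training-time version costs nothing at inference.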

kromem | 2 years ago

Ghost attention is only used in the 70B model in Llama 2, FWIW.

So you'd need to make sure you're comparing apples to apples.