top | item 35925337


alex_sf | 2 years ago

Instruction tuning is distinct from RLHF. Instruction tuning teaches the model to understand and respond (in a sensible way) to instructions, versus 'just' completing text.

RLHF trains a model to adjust its output based on a reward model. The reward model is trained from human feedback.

You can have an instruction-tuned model with no RLHF, RLHF with no instruction tuning, or both. Totally orthogonal.
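A toy sketch of the distinction the comment is drawing (not real training code; the example pair, the word-level "loss", and the length-preferring "reward model" are all made up for illustration). Instruction tuning is supervised: you need an (instruction, target response) pair and compute a loss against the target. RLHF needs no target, only a learned scorer over sampled outputs:

```python
# Toy illustration of the two training signals.
# Instruction tuning: supervised loss on (instruction, target response) pairs.
# RLHF: a learned reward model scores outputs; the policy is updated to
# increase expected reward (via PPO or similar, in practice).

# Hypothetical instruction-tuning example pair
sft_example = {
    "instruction": "Summarize: The cat sat on the mat.",
    "target": "A cat sat on a mat.",
}

def sft_loss(model_output: str, target: str) -> float:
    # Stand-in for token-level cross-entropy: fraction of mismatched words.
    out, tgt = model_output.split(), target.split()
    mismatches = sum(a != b for a, b in zip(out, tgt)) + abs(len(out) - len(tgt))
    return mismatches / max(len(tgt), 1)

def reward_model(output: str) -> float:
    # Stand-in for a reward model trained on human preference comparisons;
    # this fake one simply prefers shorter answers.
    return 1.0 / (1 + len(output.split()))

# Instruction tuning needs a target; RLHF only needs a scorer.
print(sft_loss("A cat sat on a mat.", sft_example["target"]))  # 0.0, exact match
print(reward_model("A cat sat on a mat."))
```

The point of the sketch: the supervised loss requires a ground-truth response per example, while the reward function scores any output with no reference answer, which is why the two techniques compose freely (either, neither, or both).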



stevenhuang|2 years ago

In this case OpenAI used RLHF to instruct-tune gpt3. Your pedantism here is unnecessary.

hyperbovine|2 years ago

Not to be pedantic, but it’s “pedantry”.

alex_sf|2 years ago

It's not being pedantic. RLHF and instruction tuning are completely different things. Painting with watercolors does not make water paint.

Nearly all popular local models are instruction tuned, but are not RLHF'd. The OAI GPT series are not the only LLMs in the world.