alex_sf | 2 years ago
RLHF trains a model to adjust its output based on a reward model, which is itself trained from human feedback.
You can have an instruction tuned model with no RLHF, RLHF with no instruction tuning, or instruction tuning and RLHF. Totally orthogonal.
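To make the "reward model trained from human feedback" part concrete, here's a toy sketch in plain Python (not any real library's API): a linear reward model fit to pairwise human preferences with a Bradley-Terry-style logistic loss, where each output is reduced to a single hand-crafted feature value for illustration.

```python
import math

def reward(w, feature):
    # Linear reward model: higher w * feature = higher predicted reward.
    return w * feature

def train_reward_model(preference_pairs, steps=200, lr=0.1):
    # preference_pairs: list of (preferred_feature, rejected_feature),
    # i.e. a human judged the first output better than the second.
    w = 0.0
    for _ in range(steps):
        for good, bad in preference_pairs:
            # Probability the model assigns to the human's choice
            # (Bradley-Terry / logistic preference model).
            p = 1.0 / (1.0 + math.exp(-(reward(w, good) - reward(w, bad))))
            # Gradient ascent on the log-likelihood of the preference.
            w += lr * (1.0 - p) * (good - bad)
    return w

# Humans consistently preferred outputs with the larger feature value.
pairs = [(0.9, 0.2), (0.8, 0.1), (0.7, 0.3)]
w = train_reward_model(pairs)

# The trained reward model now ranks a new pair the same way the humans did.
assert reward(w, 0.9) > reward(w, 0.2)
```

In a real RLHF pipeline this reward model would then score the policy model's generations during RL fine-tuning (e.g. with PPO); the point here is just that the reward model and the instruction-tuned base model are separately trained pieces.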
alex_sf | 2 years ago
Nearly all popular local models are instruction tuned, but are not RLHF'd. The OAI GPT series are not the only LLMs in the world.