This understanding is incomplete in my opinion. LLMs do more than emulate observed behavior. In the pre-training phase, objectives like masked language modeling do train the model to mimic what it reads (which of course contains lots of bias); but in the RLHF phase, the model is trained to generate the responses judged best by human evaluators, who try to eliminate as much bias as possible in the process. In other words, the model is trained to meet human expectations in this later phase. But human expectations are not bias-free either (e.g. the phenomenon of preferring the first choice presented).
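To make the RLHF point concrete: the usual first step is to fit a reward model on human preference pairs with a Bradley-Terry-style loss, so whatever the evaluators prefer (biases included) is exactly what the reward model learns to score highly. A minimal sketch of that pairwise loss (the function name and toy scores are illustrative, not from any particular library):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in RLHF reward modeling.

    The reward model is pushed to score the human-preferred response
    (r_chosen) above the rejected one (r_rejected).
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the model already ranks the preferred answer higher, loss is small;
# if it ranks it lower, loss is large — so evaluator preferences
# (and evaluator biases) directly shape the learned reward.
print(preference_loss(2.0, 0.0))  # small
print(preference_loss(0.0, 2.0))  # large
```

The policy is then fine-tuned to maximize this learned reward, which is why the quality of the human judgments matters so much: the reward model cannot distinguish a genuine preference from a systematic evaluator bias.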
Xelynega|11 months ago
How can the RLHF phase eliminate bias if it uses a process (human input) that has the same biases as pre-training (human input)?
ziaowang|11 months ago
During RLHF, the human evaluators are made aware of such biases and are instructed to down-vote model responses that exhibit them.