This understanding is incomplete in my opinion. LLMs do more than emulate observed behavior. In the pre-training phase, objectives like masked language modeling do train the model to mimic what it reads (which of course contains lots of bias); but in the RLHF phase, the model is trained to generate the responses judged best by human evaluators, who try to eliminate as much bias as possible in the process. In other words, the model is trained to meet human expectations in this later phase. But human expectations are not bias-free either (e.g. the phenomenon of preferring the first choice presented).
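To make the RLHF point concrete: the usual first step is to fit a reward model on human preference pairs with a Bradley-Terry-style loss, so whatever the evaluators prefer (biases included) is exactly what the reward model learns to score highly. A minimal sketch of that pairwise loss (the function name and toy scores are illustrative, not from any particular library):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in RLHF reward modeling.

    The reward model is pushed to score the human-preferred response
    (r_chosen) above the rejected one (r_rejected).
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the model already ranks the preferred answer higher, loss is small;
# if it ranks it lower, loss is large — so evaluator preferences
# (and evaluator biases) directly shape the learned reward.
print(preference_loss(2.0, 0.0))  # small
print(preference_loss(0.0, 2.0))  # large
```

The policy is then fine-tuned to maximize this learned reward, which is why the quality of the human judgments matters so much: the reward model cannot distinguish a genuine preference from a systematic evaluator bias.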
Xelynega|11 months ago
How can the RLHF phase eliminate bias if it uses a process (human input) that has the same biases as pre-training (human input)?
ziaowang|11 months ago
During RLHF, the human evaluators are made aware of such biases and are instructed to down-vote model responses that exhibit them.