

ziaowang | 11 months ago

Text collected in the wild for pre-training contains many biases, such as racial and sexual biases, and the model picks these up.

During RLHF, the human evaluators are aware of such biases and are instructed to down-vote model responses that exhibit them.
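
To make the down-voting step concrete, here is a minimal sketch of how such a vote is commonly turned into a training signal. It assumes the usual pairwise setup, in which an evaluator compares two responses and a reward model is trained with a Bradley-Terry style loss. The function name and the example scores are hypothetical and for illustration only; this is not a description of any particular lab's pipeline.

```python
import math


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_preferred - r_rejected)).

    Assumption: the evaluator's down-vote is recorded as "rejected" and the
    other response as "preferred". The loss is small when the reward model
    already scores the preferred response higher.
    """
    margin = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward-model scores for one comparison.
unbiased_score = 2.0   # response the evaluator preferred
biased_score = -1.0    # response the evaluator down-voted

agrees = preference_loss(unbiased_score, biased_score)
disagrees = preference_loss(biased_score, unbiased_score)
print(agrees < disagrees)
```

The loss is lower when the reward model ranks the pair the same way the evaluator did, so minimizing it pushes the reward for down-voted (biased) responses below the reward for preferred ones. The policy is then tuned against that reward.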
