top | item 44041521

abc-1 | 9 months ago

Not surprising. They’re almost assuredly trained on reddit data. We should probably call this “the reddit simp bias”.

To be honest, I am not sure where this bias comes from. It might be in the Web data, but it might also be an overcorrection from alignment tuning. The LLM providers are worried that their models will generate sexist or racist remarks, so they tune them to be really sensitive towards marginalized groups. This might also explain what we see. Previous generations of LMs (BERT and friends) were mostly pro-male, and they were purely Web-based.

const_cast|9 months ago

Patriarchal values can, at face value, seem contradictory but it all checks out.

Part of it is that we naturally have a bias to view men as "doers". We view men as more successful, yes, perhaps smarter. When we think doctor we think man; when we think lawyer we think man. Even in sex, we view men as having the position of "doing", and women as being the subject, with sex being something done to them.

But men are also "doers" of violence, of conflict. Women, conversely, are seen as too passive and weak to be murderers or rapists. In fact, in regard to rape, because we view sex as something done by men to women, a lot of people have the bias that women cannot even be rapists.

This is why we simultaneously have these biases where we picture success as related to men, but we sentence men more harshly in criminal justice. It's not because we view men as "good", no, it's because we view them as ambitious. Then we end up with this strange situation where being a woman makes you significantly less likely to be convicted of a crime you committed, and, if you are convicted, you are likely to get significantly less time. Men are perpetrators (active) and women are victims (passive).

mike_hearn|9 months ago

Surely some of the model bias comes from targeting benchmarks like this one. It takes left-wing views as axiomatically correct and then classifies any deviation from them as harmful. For example, if the model correctly understands the true gender ratios in various professions, that is declared to be a "stereotype" and the model is said to need fixing to reduce harm.

I'm not saying any specific lab does use your benchmark as a training target, but it wouldn't be surprising if they either did or had built similar in-house benchmarks. Using them as a target will always yield strong biases against groups the left dislikes, such as men.

gitremote|9 months ago

This bias on who is the victim versus aggressor goes back before reddit. It's the stereotype that women are weak and men are strong.

sieabahlpark|9 months ago

[deleted]