top | item 47040566

(no title)

d_burfoot | 14 days ago

> they mimic and amplify the inherent racism present in their own training data

LLMs turn out to be biased against white men:

https://www.lesswrong.com/posts/me7wFrkEtMbkzXGJt/race-and-g...

> When present, the bias is always against white and male candidates across all tested models and scenarios. This happens even if we remove all text related to diversity.

discuss

dogmayor|14 days ago

Important sentences immediately before the ones you quote.

> For our evaluation, we inserted names to signal race / gender while keeping the resume unchanged. Interestingly, the LLMs were not biased in the original evaluation setting, but became biased (up to 12% differences in interview rates) when we added realistic details like company names (Meta, Palantir, General Motors), locations, or culture descriptions from public careers pages.

daveguy|14 days ago

Hah. Even LLMs know Meta and Palantir are evil af.

aprilthird2021|14 days ago

These are because of post-training. You have to give it such directives in post-training to correct the biases they bring in from scraping the whole internet (and other datasets like books, etc.) for data

biophysboy|14 days ago

Looking at the paper, the effect is significant but weak (5-7%), even with the conditionals that magnify the effect. I would be curious to see the effect if this experiment were performed on a slightly different categorical variable (e.g. how are two white ethnicities treated). I do think its bad if preferences are "baked in" to the default though - prompting them away seems like a bad solution.

113|14 days ago

That's not a reliable source.