top | item 47040566

(no title)

d_burfoot | 14 days ago

> they mimic and amplify the inherent racism present in their own training data

LLMs turn out to be biased against white men:

https://www.lesswrong.com/posts/me7wFrkEtMbkzXGJt/race-and-g...

> When present, the bias is always against white and male candidates across all tested models and scenarios. This happens even if we remove all text related to diversity.

discuss

order

dogmayor|14 days ago

Important sentences immediately before the ones you quote.

> For our evaluation, we inserted names to signal race / gender while keeping the resume unchanged. Interestingly, the LLMs were not biased in the original evaluation setting, but became biased (up to 12% differences in interview rates) when we added realistic details like company names (Meta, Palantir, General Motors), locations, or culture descriptions from public careers pages.

daveguy|14 days ago

Hah. Even LLMs know Meta and Palantir are evil af.

aprilthird2021|14 days ago

These are because of post-training. You have to give it such directives in post-training to correct the biases they bring in from scraping the whole internet (and other datasets like books, etc.) for data

biophysboy|14 days ago

Looking at the paper, the effect is significant but weak (5-7%), even with the conditionals that magnify the effect. I would be curious to see the effect if this experiment were performed on a slightly different categorical variable (e.g. how are two white ethnicities treated). I do think its bad if preferences are "baked in" to the default though - prompting them away seems like a bad solution.

113|14 days ago

That's not a reliable source.