Seems like you're implying that harmful stereotypes either don't exist, aren't actually harmful, or are the "truth"? If, when given the text "[MASK] is a female job", a language model fills in "CEO" only a fraction of the time compared to what it would for "male", is that "the truth" because male CEOs vastly outnumber female ones? I would say no, because that text isn't actually saying anything about gender ratios. And it's not that there isn't some form of truth in that output: in a purely statistical sense, you really are more likely to see the word "CEO" associated with males. That's true.
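Here's a minimal sketch of that kind of probe, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (both are my choices for illustration, not anything specified above):

```python
# Hypothetical sketch: probing the skew described above with a masked
# language model. Assumes the `transformers` package and the
# `bert-base-uncased` checkpoint, neither of which comes from the thread.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

def top_fills(prompt, k=5):
    # The fill-mask pipeline returns dicts with "token_str" (the fill)
    # and "score" (the model's probability for that fill).
    return [(r["token_str"], r["score"]) for r in unmasker(prompt, top_k=k)]

# Compare what the model fills in for the "female" vs. "male" versions
# of the same template.
for gender in ("female", "male"):
    prompt = f"[MASK] is a {gender} job."
    print(prompt)
    for token, score in top_fills(prompt):
        print(f"  {token}: {score:.3f}")
```

Comparing the two score distributions directly is the point: any gap between them is a property of the model's training data, not of the sentence being completed.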
But what if I'm using it for something other than predicting text? At that point I don't think it's hard to see how this could have further-reaching downstream impacts with negative effects. If I want to use it for assessing potential candidates for hiring, is it "teaching it to lie" if I train it to reduce or eliminate gender or racial bias so that it doesn't screen out the best candidates?

I can't say I like that ChatGPT says "sorry, I'm a robot" for even mild things, but it might be good to understand that that's a totally different issue, mostly a PR one. They don't want to be in the news because people keep having it write essays about how great eugenics is. I wouldn't worry too much about it, though; there are already uncensored LLMs you can spin up yourself, so commercial products will likely follow soon enough.
flangola7|2 years ago
Also remember that the commands that cripple a pipeline or hold a hospital's systems for ransom are just "word output." A US Guardsman will spend many years in military prison for "outputting words" in the wrong place, and nobody thinks that's disproportionate to the potential harm.