Seems like you're implying that harmful stereotypes either don't exist, aren't actually harmful, or are the "truth"? If, when given the text "[MASK] is a female job", a language model fills in "CEO" only a fraction of the time compared to what it would for "male", is that "the truth" because male CEOs vastly outnumber female ones? I would say no, because that text isn't actually saying anything about gender ratios. And it's not that there isn't some form of truth in that output: in a purely statistical sense, you really are more likely to see the word "CEO" associated with males. That's true.
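Here's a minimal sketch of that kind of probe, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (both are my choices for illustration, not anything specified above):

```python
# Hypothetical sketch: probing the skew described above with a masked
# language model. Assumes the `transformers` package and the
# `bert-base-uncased` checkpoint, neither of which comes from the thread.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

def top_fills(prompt, k=5):
    # The fill-mask pipeline returns dicts with "token_str" (the fill)
    # and "score" (the model's probability for that fill).
    return [(r["token_str"], r["score"]) for r in unmasker(prompt, top_k=k)]

# Compare what the model fills in for the "female" vs. "male" versions
# of the same template.
for gender in ("female", "male"):
    prompt = f"[MASK] is a {gender} job."
    print(prompt)
    for token, score in top_fills(prompt):
        print(f"  {token}: {score:.3f}")
```

Comparing the two score distributions directly is the point: any gap between them is a property of the model's training data, not of the sentence being completed.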
But what if I'm using it for something other than predicting text? At that point I don't think it's hard to see how this could have further-reaching downstream impacts with negative effects. If I want to use it for assessing potential candidates for hiring, is it "teaching it to lie" if I train it to reduce or eliminate gender or racial bias so that it doesn't screen out the best candidates?

I can't say I like that ChatGPT says "sorry, I'm a robot" for even mild things, but it might be good to understand that that's a totally different issue, mostly a PR one. They don't want to be in the news because people keep having it write essays about how great eugenics is. I wouldn't worry too much about it, though; there are already uncensored LLMs you can spin up yourself, so commercial products will likely follow soon enough.
flangola7|2 years ago
Also remember that the commands that cripple a pipeline or hold a hospital's systems for ransom are just "word output." A US Guardsman will spend many years in military prison for "outputting words" in the wrong place, and nobody thinks that's disproportionate to the potential harm.