item 30092978

GPT-3 implicitly favors text written by authors from powerful social positions

34 points | amrrs | 4 years ago | arxiv.org

9 comments

[+] motohagiography|4 years ago|reply
That's a thorny one. I had a knee jerk reaction but upon reflection they're right about something if perhaps for some different reasons. Crux of it is, all great writing is really great editing. So called "privileged" writing in journals, newspapers, and paragons of English style like the Economist and the FT, along with literary fiction, or even crappy genre fiction - will have had the benefit of an editor.

Their argument seems predicated on the idea that either the author is the only writer and the text leapt from his head like Athena, fully formed (like these comments of mine, surely), or that their entire sample set of student newspapers all had equally competent sub-editors. I'd say their claim that privileging any text is a "language ideology" is weak, because the perceived quality of the writing should be attributed to the additional work that went into its editing, whereas they're saying it's due to the author's social status based on zip code. Chances are, the smaller school papers are just some yahoo publishing their own copy.

Too many holes. It seems to just elevate the same critical theory as a pretext for asserting a qualification to govern GPT model training, by using the same problematizations it uses on everything else. (e.g. call it x'ist until you control it, invent an unsolvable problem only you can manage, dilute and destabilize consensus with exogenous concerns, etc.) I'd agree a lot of good stuff is probably not making it into language models because it's not edited (or ironically, not gatekept), but I'm not sure the authors are really sincere about improving language models. To me, they're using a very narrow interpretation of quality writing to assert that GPT models require governance and political accountability.

[+] jdkee|4 years ago|reply
“We find that newspapers from larger schools, located in wealthier, educated, and urban ZIP codes are more likely to be classified as high quality. We then demonstrate that the filter's measurement of quality is unaligned with other sensible metrics, such as factuality or literary acclaim. We argue that privileging any corpus as high quality entails a language ideology, and more care is needed to construct training corpora for language models, with better transparency and justification for the inclusion or exclusion of various texts.”

So better grammar and use of language?
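For concreteness: per the GPT-3 paper, the quality filter being critiqued was a logistic-regression classifier over hashed token features, trained to separate curated reference text (WebText, Wikipedia, books) from raw Common Crawl. A minimal sketch along those lines, with toy data and scikit-learn stand-ins for the original Spark pipeline (all names and example documents here are illustrative, not the actual training setup):

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy "reference" corpus (stand-in for curated text such as WebText or
# Wikipedia) vs. toy "web" corpus (stand-in for raw Common Crawl).
reference = [
    "The committee published its findings in a peer-reviewed journal.",
    "Economic indicators suggest a gradual recovery over the next quarter.",
]
web = [
    "click here 4 FREE stuff!!! best deals wow",
    "omg lol that was so funny haha subscribe now",
]

# Hashed n-gram features feeding a logistic-regression classifier,
# labeled 1 for reference-like text and 0 for raw web text.
clf = make_pipeline(
    HashingVectorizer(ngram_range=(1, 2), n_features=2**12),
    LogisticRegression(),
)
clf.fit(reference + web, [1, 1, 0, 0])

def quality_score(doc: str) -> float:
    """Probability that the document resembles the reference corpus."""
    return clf.predict_proba([doc])[0, 1]
```

The point the paper makes follows directly from this design: the score measures similarity to whatever corpus was labeled "reference", not factuality or literary merit, so every choice about what goes into that positive set is baked into the filter.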

[+] TimTheTinker|4 years ago|reply
Right, invoking "ideology" seems a bit unnecessary and politically motivated. "More highly educated people produce better content." In other news, poor neighborhoods have more violent crime than rich neighborhoods...

> We then demonstrate that the filter's measurement of quality is unaligned with other sensible metrics, such as factuality or literary acclaim.

So the filter can't check whether something is scientifically accurate or artistically appealing. That makes sense. Of course it can't.

[+] lurgburg|4 years ago|reply
"Better" is just presupposing the conclusion: that there is something inherently superior about certain styles. That thinking is precisely the "language ideology" the authors are concerned about!

Better for who? For what purpose?

[+] claudiawerner|4 years ago|reply
Dialect variation, writing by second-language English speakers, etc. may be classified as better or worse (though this is controversial, and I'm not entirely convinced in the dialect-variation case), but that doesn't mean the writers don't have something important to say. Some of the most productive conversations I've had have been with people with far less mastery of English than native speakers, and even with people whose grammar and spelling are poor.
[+] isoblvck|4 years ago|reply
A well-funded publication, reviewed by multiple paid teachers or admins and backed by money to invest in good software, produces higher-quality text, and GPT detects that. "Therefore GPT is full of bias" seems less about model bias and more about societal failings.
[+] quinnjh|4 years ago|reply
But we want to blame the black box, not our own enlightened view of what's "good"

/s Sarcasm aside, I think GPT "did well" here in terms of picking up an average of what society deems good. That's not something comfortable, but I also don't think it's inaccurate. Hopefully more of these AI-enabled "revelations" (which back what some critical theorists have been saying for decades) will help us unpack and understand the collection of biases we each hold. Yes, the failings are societal; can this be a point of reflection? Or do we keep reconfiguring the model to obscure the issue?