Making a racist AI without really trying (2017)

[+] asploder|7 years ago|reply

I'm glad to have kept reading to the author's conclusion:

> As a hybrid approach, you could produce a large number of inferred sentiments for words, and have a human annotator patiently look through them, making a list of exceptions whose sentiment should be set to 0. The downside of this is that it’s extra work; the upside is that you take the time to actually see what your data is doing. And that’s something that I think should happen more often in machine learning anyway.

Couldn't agree more. Annotating ML data for quality control seems essential both for making it work, and building human trust.

[+] ma2rten|7 years ago|reply

This approach only works if you use OP's assumption that a text's sentiment is the average of it's word's sentiment. That assumption is obviously flawed (e.g. "The movie was not boring at all" would have negative sentiment).

Making this assumption is fine in some cases (for example if you don't have training data for your domain), but if you build a classifier based on this assumption why don't you just use an off-the-shelf sentiment lexicon? Do you really need to assign a sentiment to every noun known to mankind? I doubt that this improves the classification results regardless of the bias problem.

[+] swingline-747|7 years ago|reply

Heck, it's so important that it needs people with detail-orientation and solid judgement, because crowdsourcing (ie populism) may not be the best source of Godwin's law ethical mooring.

[+] rhizome|7 years ago|reply

Another point in favor of having moderators.

[+] gwern|7 years ago|reply

> There is no trade-off. Note that the accuracy of sentiment prediction went up when we switched to ConceptNet Numberbatch. Some people expect that fighting algorithmic racism is going to come with some sort of trade-off. There’s no trade-off here. You can have data that’s better and less racist. You can have data that’s better because it’s less racist. There was never anything “accurate” about the overt racism that word2vec and GloVe learned.

The big conclusion here after all that code buildup does not logically follow. All it shows is that one new word embedding, trained by completely different people for different purposes with different methods on different data using much fancier semantic structures, outperforms (by a small and likely non-statistically-significant degree) an older word embedding (which is not even the best such word embedding from its batch, apparently, given the choice to not use 840B). It is entirely possible that the new word embedding, trained the same minus the anti-bias tweaks, would have had still superior results.

[+] ma2rten|7 years ago|reply

I also disagree with the conclusion, but for a different reason. I think it's unlikely that the word embeddings were just lower quality. That should result in noise, not bias.

I that there is a real statistical pattern in the training data that names associated with certain ethnicities are more likely to appear close to words with negative sentiment. I just don't think this necessarily means that the news is racist. I think more analysis is needed to see where this pattern comes from.

However, if it is true that the news is biased and racist in a quantifiable way, that would be a bigger problem than biased word vectors. I would genuinely be interested in seeing that type of analysis.

[+] skybrian|7 years ago|reply

I think you're reading this statement as more general than it's meant to be? I interpret it as meaning that there is not necessarily any tradeoff, as there wasn't in this case. "You can have data" -> there exists.

[+] lalaland1125|7 years ago|reply

> Some people expect that fighting algorithmic racism is going to come with some sort of trade-off.

Um, that's because we know it comes with trade-offs once you have the most optimal algorithm. See for instance https://arxiv.org/pdf/1610.02413.pdf. If your best performing algorithm is "racist" (for some definition of racist") you are mathematically forced to make tradeoffs if you want to eliminate that "racism".

Of course, defining "racism" itself gets extremely tricky because many definitions of racism are mutually contradictory (https://arxiv.org/pdf/1609.05807.pdf).

[+] ma2rten|7 years ago|reply

Not necessarily. In the case of word vectors we are using unsupervised learning to identify patterns in a large corpus of data to improve the learning. This is a completely different issue than your credit score example, which is supervised learning.

Not all patterns are equally useful. By removing those unuseful patterns we might make less mistakes (for example giving negative sentiment to a Mexican restaurant review) and free up capacity in the word vectors to store more useful patterns. I would expect baking other real-world assumptions into your word vectors unrelated to bias could also be helpful.

[+] dan-robertson|7 years ago|reply

> If your best performing algorithm is racist

There are two ways to look at this:

1. Racism makes the algorithm good so we should make the algorithm less racist (at a cost to its performance) or decide we want to allow systematic racism.

2. The metric for how good the algorithm is (ie training data) encourages it to be racist and therefore correcting the bias in the algorithm may decrease its performance on the training data but may not affect its performance in the real world, or decrease its performance in the “performance + meets legal requirements” metric.

[+] paradite|7 years ago|reply

To oversimplify, I think the training set is something like:

Italian restaurant is good.

Chinese restaurant is good.

Chinese government is bad.

Mexican restaurant is good.

Mexican drug dealers are bad.

Mexican illegal immigrants are bad.

And hence the word vector works as expected and the sentiment result follows.

Update:

To confirm my suspicion, I tried out an online demo to check distance between words in a trained word embedding model using word2vec:

http://bionlp-www.utu.fi/wv_demo/

Here is an example output I got with Finnish 4B model (probably a bad choice since it is not English):

italian, bad: 0.18492977

chinese, bad: 0.5144626

mexican, bad: 0.3288326

Same pairs with Google News model:

italian, bad: 0.09307841

chinese, bad: 0.19638279

mexican, bad: 0.16298543

[+] EB66|7 years ago|reply

Just thinking out loud here...

It seems to me that if you wanted to root out sentiment bias in this type of algorithm, then you would need to adjust your baseline word embeddings dataset until you have sentiment scores for the words "Italian", "British", "Chinese", "Mexican", "African", etc that are roughly equal, without changing the sentiment scores for all other words. That being said, I have no idea how you'd approach such a task...

I don't think you could ever get equal sentiment scores for "black" and "white" without biasing the dataset in such a manner that it would be rendered invalid for other scenarios (e.g., giving a "dark black alley" a higher sentiment than it would otherwise have). "Black" and "white" is a more difficult situation because the words have different meanings outside of race/ethnicity.

[+] rossdavidh|7 years ago|reply

I think I would agree. You otherwise run the risk of having fixed the metric ("Italian" vs. "Mexican", "Chad" vs. "Shaniqua", etc.) without actually fixing the underlying issue.

Also, regarding black/white etc., there might legitimately be words which have so many different meanings (whether race-related or not) that you should just exclude them from sentiment analysis. "Right" can mean like "human rights", "right thing to do", or "not left". Probably plenty of other words like that. You might do better to have a list of 100-200 words that are just excluded because of issues like that.

[+] unknown|7 years ago|reply

[deleted]

[+] mattkrause|7 years ago|reply

Does “a dark black alley” have a sentiment at all?

I would argue that it’s pragmatically associated with bad things (e.g., being mugged, overcrowded areas) but it’s not intrinsically bad (or good) itself.

[+] k__|7 years ago|reply

Does this mean the text examples the AI learns from are biased and as such it learns to be biased too?

So it's not giving us objetive decisions, but a mirror. Not so bad either.

[+] ma2rten|7 years ago|reply

I think that the bias problem they are highlighting is very important. That said, I'm wondering if they really didn't try (like the title suggests) or if they choose this approach on purpose because it highlights the problem.

To explain what happened here: They trained a classifier to predict word sentiment based on a sentiment lexicon. The lexicon would mostly contain words such as adjectives (like awesome, great, ...). They use this to generalize to all words using word vectors.

The way word vectors work is that words that frequently occur together are going to be closer in vector space. So what they have essentially shown is that in common crawl and google news names of people with certain ethnicities are more likely to occur near words with negative sentiment.

However, the sentiment analysis approach they are using amplifies the problem in the worst possible way. They are asking their machine learning model to generalize from training data with emotional words to people's names.

[+] int_19h|7 years ago|reply

I think the point is that they did what's commonly done in real world machine learning. It's no surprise that it's flawed - but that flawed stuff is actually being used all over the place.

[+] visarga|7 years ago|reply

They could have tried to have a dataset of bias triples (A in relation to C is like B in relation to C), and minimise the score on that by adding it to the loss function, so the model trains with minimal bias.

[+] User23|7 years ago|reply

It would be interesting to use the Uber/Lyft dataset of driver and passenger ratings to do an analysis like this.

For any such analysis there are a great many confounds, both blatant and subtle. Finding racism everywhere could be because overt racism is everywhere, or it could be confirmation bias. It could even be both! That's the tricky thing about confirmation bias—one never knows when one is experiencing it, at least not at the time.

[+] travisoneill1|7 years ago|reply

I've heard a lot about racism in AI, but looking at the distributions of sentiment score by name, a member of any race would rationally be more worried about simply having the wrong name. Has there been any work done on that?

[+] joatmon-snoo|7 years ago|reply

This is a pretty well known study: http://www.nber.org/digest/sep03/w9873.html

[+] practice9|7 years ago|reply

> fighting algorithmic racism

Reminds me of how Google Photos couldn't differentiate between a black person & a monkey, so they've excluded that term from search altogether.

While the endeavour itself is good, fixes are sometimes hilariously bad or biased (untrue)

[+] ggreer|7 years ago|reply

> Reminds me of how Google Photos couldn't differentiate between a black person & a monkey, so they've excluded that term from search altogether.

Technically that is what happened, but it paints an incorrect picture in people's minds. Out of the billions of images that Google Photos had auto-tagged, it tagged one picture of two black people as "gorillas".[1] This was probably the first time this had ever happened. (If it had happened before, it surely would have been spread far and wide by social media & the press.)

So Google's classifier was inaccurate 0.0000001% of the time, but the PR was so bad that Google "fixed" the issue by blacklisting certain tags (monkey, gorilla, etc). If you take photos of monkeys, you'll have to tag them yourself.

I'm sure Google could do better, but the standard required to avoid a PR disaster is impossible to meet. If the classifier isn't perfect forever, they're guaranteed to draw outrage.

1. https://twitter.com/jackyalcine/status/615329515909156865

[+] tedivm|7 years ago|reply

There's a difference between a short term hack and real fix. The real solution was for them to train their data with more pictures of black people.

[+] webspiderus|7 years ago|reply

Well, to be fair they excluded high tens / low hundreds of potentially offensive terms from search before even launching and when this came out they just extended the list a little. Sometimes having product vision requires recognizing that the products you build come with limitations and potential for very real emotional reactions of very real human being users.

[+] on_and_off|7 years ago|reply

I believe it was gorilla, not monkey and I understand Google for not wanting its product to randomly call people animal names, especially when they are part of a group where it is far too common.

[+] js8|7 years ago|reply

Maybe, you know, humans are simply not Chinese rooms.

Recently there was an article about recognition of bullshit: https://news.ycombinator.com/item?id=17764348

To me the article brought great insight - I realized that humans do not just pattern match. They also seek understanding, which I would define as an ability to give a representative example.

It is possible to give somebody a set described by arbitrarily complex conditions while the set itself is empty. Take any satisfiability problem (SAT) with no solution - this is a set of conditions on variables, yet there is no global solution to these.

So if you were a Chinese room and I would train you on SAT problems, by pure pattern matching, you would be willing to give solutions to unsolvable instances. It is only when you actually understand the meaning behind conditions you can recognize that these arbitrary complex inputs are in fact just empty sets.

So perhaps that's the flaw with our algorithms. There is no notion of I understand the input. Perhaps it is understandable, because understanding (per above) might as well be NP-hard.

[+] int_19h|7 years ago|reply

Humans can do more than pattern-match. But they often just pattern-match anyway, because it's far easier and quicker, and doing more than that for all the brief day-to-day interactions is virtually impossible.

So at some point you need to decide when you pattern-match and accept the result for granted, and when you decide to dig into it further to understand why the pattern matched the way it did, and whether it's relevant. But that is itself a choice, and it's also going to be biased (for example, towards people you personally know, and against random strangers).

[+] adrianN|7 years ago|reply

There is no indication that brains are better at solving NP hard problems than computers.

[+] elihu|7 years ago|reply

This is an interesting result:

> Note that the accuracy of sentiment prediction went up when we switched to ConceptNet Numberbatch.

> Some people expect that fighting algorithmic racism is going to come with some sort of trade-off. There’s no trade-off here. You can have data that’s better and less racist. You can have data that’s better because it’s less racist. There was never anything “accurate” about the overt racism that word2vec and GloVe learned.

I wonder if this could be extended to individual names that have strong connotations with people because of the fame of some particular person, like "Barack", "Hillary", "Donald", "Vladimir", or "Adolf", or if removing that sort of bias is just too much to expect from a sentiment analysis algorithm.

[+] abenedic|7 years ago|reply

Where I grew up, there is a majority group with fair skin, later(possibly incorrectly) attributed to the fact that they worked in the fields less. The minority group is darker skinned. If you train any reasonable machine learning model on any financial data, it will pick up on the discrepancy. If it did not I would say it is a flawed model. But that is more a sign that people should avoid such models.

[+] gumby|7 years ago|reply

Please add 2017 to title

[+] b6|7 years ago|reply

How to make a program that does what you asked it to do, and then add arbitrary fudge factors as the notion strikes you to "correct" for the bogeyman of bias.

Suppose sentiment for the name Tyrel was better than for Adolf. Would that indicate anti-white bias? Suppose the name Osama has really poor sentiment. What fudge factor do you add there to correct for possible anti-Muslim bias? Suppose Little Richard and Elton John don't have equal sentiment. Is the lower one because Little Richard is black, or because Elton John is gay?

What we have been seeing lately is an effort to replace unmeasurable bias that is simply assumed to exist and to be unjust and replace it with real bias, encoded in our laws and practices, or in this case, in actual code.

[+] whiddershins|7 years ago|reply

[deleted]

[+] Semiapies|7 years ago|reply

RTFA. It has nothing to do with user ratings, but a direct calculation on the phrases "Mexican restaurant", "Italian restaurant", and "Chinese restaurant" based on the corpus of material.

Go further and follow the links. This example is specifically covered in the linked material. "Mexican" picks up a negative association from the corpus containing frequent mention of "illegal" (listed as a negative term) in close proximity to "Mexican", so the phrase "Mexican restaurant" gets rated less favorably than "Chinese restaurant".

The underlying problem is that we're throwing text at math and pretending that we're building things that understand anything more than word proximity. Human beings that can actually understand context can be horribly biased; software that doesn't have the slightest inkling of context will produce twisted versions of our own biases.

[+] burlesona|7 years ago|reply

Hmm I didn't get that from the article at all. It felt to me like the author was showing a relatively straightforward explanation of building a system that does not intuitively have racial bias, and then demonstrating that in practice it does anyway, which is kind of surprising.

[+] tedivm|7 years ago|reply

If you actually read the whole article there are other examples as well, such as black names being considered more negative.

[+] incompatible|7 years ago|reply

Judging an individual by membership of categories always introduces bias. Even if Mexican restaurants are for some reason worse on average, say 9/10 are bad, then assuming that a particular restaurant is bad because it's Mexican is biased. Sure, there's a 9/10 chance that it's bad, but it's unfair to treat it as bad without any other evidence.

Insurance companies do this sort of thing all the time.

[+] crooked-v|7 years ago|reply

Society is biased. Even assuming purely a statistical averaging of outliers, there are, for example, enough open racists in the US that a crowd of them decided that marching through Charlottesville with tiki torches was a good idea.

[+] Ar-Curunir|7 years ago|reply

There are as many shitty pizza/burger joints and diners as there are shitty restaurants of any other kind.

[+] swingline-747|7 years ago|reply

Setting aside blatant shock behaviors... If the other side, the audience, were less sensitive and not looking for the next micro-outrage, wouldn't ML chatbots evolve more pro-social values by positive reinforcement?

It takes two to Tango .. the average audience behavior isn't blameless for the impact of its response. Also, how an AI decides interprets an ambiguous response as being desirable or not is really interesting.

[+] s73v3r_|7 years ago|reply

Blaming the victim for being discriminated against isn't going to help anything.

119 comments