Something alluded to here is that many of the language models use US English. Many terms that are offensive in the US may not be offensive at all in the UK. E.g. "Fag" in the UK is frequently used to refer to cigarettes: "Can I bum a fag?" literally means "Can I have one of your cigarettes, please?"
Similarly, something that might be a catcall, such as "Get your baps out" (show us your breasts), could also be used by a baker in a slightly cheeky advert, since a "bap" is a type of bread roll and most people are aware of the pun.
How are you going to train an AI to know from context that the person might be talking about bread instead of a woman?
Has anyone realised yet that almost all of this is folly? I suppose not when there is money to be made.
I couldn't agree more. I've been watching this shift in acceptable standards in America move from intent to perception with considerable consternation. It seems that an absence of malicious intent in language is no longer a sufficient defense. If one perceives the language as offensive, it is. This is an absurd standard, because with a sufficient audience size, someone will find something offensive. This standard is a de facto end to all discussion. It only takes one hypersensitive audience member (legitimate or troll) to shut down all discussion.
This is not a sustainable strategy for society. It is of course impossible for algorithms to parse perceived intent by the lowest common denominator, and attempting to do so is nothing more than defensive legal posturing.
> "Fag" in the UK is frequently used to refer to cigarettes.
Another example: faggots are a traditional meatball dish from parts of England and Wales; the term has nothing to do with anyone’s sexuality. And yet, American social media companies have repeatedly blocked British users for posting about the dish.
There is also a British pudding called “spotted dick”; in this context, “dick” is a dialectal term for “pudding”, with no known connection to genitalia.
A friend of my wife recently had her Facebook account restricted for making a post about cooking jerk chicken for dinner. Facebook claimed she was using “abusive language”.
British English has had plenty of experience evolving under censorship; it's where our great tradition of innuendo comes from. I'm thinking particularly of Polari (Roma-influenced gay slang), and how Kenneth Williams got an entire comedy show packed with man-on-man innuendo (Round the Horne) broadcast regularly on national radio in the middle of the afternoon, at a time when homosexuality was illegal.
Or try explaining panto to Americans. Or even Allo Allo, a show which would... definitely not be made nowadays.
I've seen false positives referred to as the "Scunthorpe problem". Someone on the internet has a nice map of filthy-sounding real UK place names ...
> How are you going to train an AI to know from context that the person might be talking about bread instead of a woman?
For starters, you can't, unless you actually include the context in the training set.
Then again, a lot of humans won't successfully manage that either...
This guy did: https://youtu.be/3-son3EJTrU
Humans are inventing double-entendres to create the ambiguity.
There are enough trigger-happy people with less understanding than an AI, all eager to kick off enough rages to wear down a saint…
"Nonce" means paedophile in the UK, apparently [0]. I was brought up in the UK and only discovered this recently (the bad way). So it's not even the whole of the UK, but only parts of it.
Obviously also a technical term used all the time in its technical sense with no problem.
Context and culture is way more important than the actual words used when trying to determine the meaning of a statement.
[0] https://www.urbandictionary.com/define.php?term=nonce
Faggots are also a food in the UK. It was a poor man's dish, using offal. Not so popular anymore, but you can still find it in some stores (and probably at local butchers in the Midlands and Wales): https://groceries.aldi.co.uk/en-GB/p-mr-brains-6-pork-faggot...
Let's be clear about this: the purpose of these models is to cut costs, not to accurately gauge what was said.
Many business models based on user generated content wouldn't be possible if the businesses had to pay minimum wage to people for moderating that content. Using an AI model, no matter how broken, allows them to seem more concerned than if they were just relying on an old-school word filter without doing any actual due diligence.
The absurdity is so high that, to lower the amount of hate speech, Facebook started banning the word "hate" (and its translations).
This isn't documented anywhere, but I find it hilarious and dystopian that posts that contain the word "hate" in the description or in the comments get blocked more easily, or soft-banned (they don't show up in anyone's feed), even if the word is used in harmless contexts...
That language aspect aside (it's difficult enough to wrap your head around), the abject and consistent failure around contextual cues is abysmal: sarcasm, deflection and other talking patterns in relationships all get missed. For example, I have a good friend and we consistently threaten each other with ever more Rube Goldbergian forms of cartoonish violence, such as offering to trebuchet the other into a wall of shattered glass coated in guano, when one of us says something fleetingly dumb, as a running gag. FB has flagged this more than once, and it's dumb AF.
Content management done well is expensive, and requires humans with superior grokking skills. No place cares enough to try.
Context, and the inability to actually have any understanding of it, is the entire problem with what is called "AI" these days.
Sure, "AI" can do a lot of impressive first-level pattern matching, and that can be the basis of many useful outputs.
But ANYTHING that requires actually understanding the context, whether it is the existence of new obstacles for a 'self-driving' vehicle, the actual meaning of a text for a language model, or anything else, is a complete and utter failure.
Despite appearances, while some of it is genuinely useful, we're really no further than fancy parlor tricks. Crack even the next level of contextual understanding, and it will be an astonishing leap.
> Has anyone realised yet that almost all of this is folly? I suppose not when there is money to be made.
Yes, of course everyone realizes that no filter actually works in practice. But they are not built to work, they are built to give the smallest impression of working. Tumblr doesn't need their site to be absolutely clear of nudity to attract investors, they only need to give the impression that it's absolutely clear of nudity and that they are trying to keep it so.
The same argument can be made for machine translation. For an algorithm to be able to successfully translate an entirely new idiom, expression or metaphor, it has to be aware of the real world and context it came from. Until there is some level of AGI that can observe and interpret the world on its own, translation and toxicity detection will be limited to examples included in training data.
One of the problems with real world machine learning is that engineers often treat models as pure black boxes to be optimized, ignoring the datasets behind them. I've often worked with ML engineers who can't give you any examples of false positives they want their models to fix!
Perhaps this is okay when your datasets are high-quality and representative of the real world, but they're usually not. For example, many toxicity and hate speech datasets mistakenly flag texts like "this is fucking awesome!" as toxic, even though they're actually quite positive -- because NLP datasets are often labeled by non-fluent speakers who pattern match on profanity.
(So is 99% accuracy or 99% precision actually a good thing? Not if your test sets are inaccurate as well!)
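To make that concrete, here's a quick simulation (numbers invented purely for illustration): even a hypothetically perfect classifier can't measure above the noise ceiling of its test labels.

    import random

    random.seed(0)
    n = 100_000
    true_labels = [random.random() < 0.5 for _ in range(n)]

    # A hypothetically perfect classifier: it predicts every true label correctly.
    predictions = list(true_labels)

    # But suppose 10% of the test-set annotations are wrong (e.g. "fucking
    # awesome!" pattern-matched as toxic by a non-fluent labeler).
    noisy_labels = [l if random.random() > 0.10 else not l for l in true_labels]

    measured = sum(p == l for p, l in zip(predictions, noisy_labels)) / n
    print(f"measured accuracy: {measured:.1%}")  # ~90%, despite a perfect model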
Many of the new, massive scale language models use the Perspective API to measure their safety. But we've noticed a number of Perspective API mistakes on texts containing positive profanity, so this post was an attempt to explain the problem and quantify it.
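For reference, a minimal sketch of scoring a comment with the Perspective API (request shape per Perspective's public docs; you'd supply your own API key, and the exact score will vary):

    import requests

    API_KEY = "YOUR_API_KEY"  # placeholder
    url = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=" + API_KEY)

    body = {
        "comment": {"text": "this is fucking awesome!"},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }

    resp = requests.post(url, json=body).json()
    # Positive profanity like the text above often gets a high toxicity score.
    print(resp["attributeScores"]["TOXICITY"]["summaryScore"]["value"])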
I can relate. I recently learned, while talking to some folks from Spain, that they use the word "puta" a lot; as they explained, it is used to express feelings and is not meant as a rude insult.
There are some differences in German, too. For example, "wixen/wichsen" is an old word that means to wipe/shine your shoes and is still in active use in this sense in Switzerland and Austria; in Germany, however, it is now primarily used with a different meaning. The Wix company played on this understanding of its brand name in an ad: https://www.youtube.com/watch?v=IddnMutPgTI
Since we have an IT background here, the same goes for "Mongo", as in MongoDB. In Germany, "Mongo" is considered a slur making fun of handicapped people.
Fraport AG changed its brand name because the former one was abbreviated FAG (Flughafen AG), and the company found it difficult to expand its business under that name.
No bad actors, if you ask me, only different contexts. The list could go on and on...
Yes, it's pretty surprising that MongoDB has gotten by without a change when even "master repository" had to go. What's seen as offensive online is very US-centric, I knew that, but I didn't know "mongo" wasn't a well-known slur in the US.
People with Down's syndrome used to be called "mongoloids" because they were thought to have Asian-looking eyes (epicanthic folds), so calling someone "mongo" or "mong" is basically an extra racist way of calling someone mentally retarded.
That reminds me of a time when we had the substring "prd" in our Azure Storage Account names. The MSFT profanity filter thought it meant fart in Czech, whereas we used it as a moniker for production. Ever since then, the accounts have been named with the substring "prod" instead.
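That's the Scunthorpe problem from above in its purest form: a filter that matches substrings rather than words. A toy illustration (the blocklist entries are invented for the example):

    BANNED = {"prd", "cunt", "tit"}  # hypothetical blocklist entries

    def naive_filter(name: str) -> bool:
        # Flags any string containing a banned substring, with no notion
        # of word boundaries, language, or context.
        return any(bad in name.lower() for bad in BANNED)

    for name in ["prd-storage-account", "Scunthorpe", "classic-titles"]:
        print(name, "->", "BLOCKED" if naive_filter(name) else "ok")
    # All three get blocked; none of them is actually profane.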
Makes sense. It's not like Google or any other company training AI models is hiring professional linguists or psychologists to investigate the true meaning behind each of the billions of internet posts they've scraped, labeled and trained their models on. They're throwing pennies at workers in the developing world to label as much data as they can as fast as they can.
It's also likely that there's a significant lack of context to the data points, not just because the posts are divorced from their parent content, but because of a culture and language divide between the labeler and the author of the data they're labeling, as well.
Yeah, I think this is part of the problem. Is large-scale, low-quality data good? Sometimes it is (depending on the tradeoff), but from a model performance perspective, it's often more effective to get smaller amounts of higher-quality data instead.
Hopefully people also don't need to be at the level of a professional linguist to label messages like "this is fucking awesome" correctly!
And great point on context. For example, the GoEmotions dataset didn't present labelers with the actual post or subreddit the message came from -- just the text itself. That makes it really difficult to label something like "his traps hide the fucking sun"! But once you see the comment in its original context https://www.reddit.com/r/nattyorjuice/comments/aee3wx/olympi..., and know that it's in the /r/nattyorjuice bodybuilding subreddit, it's much easier to realize that this is talking about someone's large muscles.
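One cheap mitigation is simply to ship the surrounding context to labelers along with the text itself. A sketch of what such a labeling record might look like (the field names here are made up):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LabelingItem:
        text: str                   # the comment to be labeled
        subreddit: Optional[str]    # community context
        parent_text: Optional[str]  # the post or comment being replied to

    item = LabelingItem(
        text="his traps hide the fucking sun",
        subreddit="nattyorjuice",   # bodybuilding: "traps" = trapezius muscles
        parent_text="photo of an Olympia competitor",
    )
    # With all three fields visible, a labeler can tell this is praise.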
> It's not like Google or any other company training AI models are hiring professional linguists or psychologists to investigate
I had a housemate once who was an American linguistics grad and spent a year applying semantic labels for Google. I know he worked on a team, although I don't know what they did, since I didn't work at Google at the time.
The waste of resources on detecting what is decidedly "toxic" on internet forums is insanity. If you want to police communities online, hire moderators; if your platform is so big you ""can't"" have moderators moderate, then you are not in a position to be policing the platform. If you want to build puritanical devices to spam your moderators into doing a human review, then that is your business, but the use of machine learning for any sort of proactive policing is going to be a parody that will result in a sterile environment and/or a lot of bitter users.
Interesting stuff. I doinked around with this a while back when working on a 'hot take oracle' - basically a search box that finds a strongly-opinionated tweet about something (https://hottakeoracle.herokuapp.com/).
You can see that my model is basically just filtering for profanity as an indicator of "strong emotion", which makes sense. But it's interesting that positive profanity seems to be such a thorny problem, at least for Perspective.
On the other hand, from my recent experience, my 2¢:
I recently started playing counter strike source again online, just for 10 minutes of fun at first (to see if it would still tick with me). I randomly picked up a server and the ambiance was cheerful and nice. I noticed the rules said "no profanity, have fun" and indeed people were mostly polite.
I tried another server at random a bit later and there were more insults, along with a lot of taunting.
I switched back to the first server and have been playing an hour or two every three days since, and there is a real difference from other servers. Random people who come in and throw insults, even mild ones like "fuck you" or "you son of a bitch awp", get an insta-ban, and it makes the whole session a much better experience. Maybe it's a safe place, but playing with polite people is more enjoyable to me now than playing with insult gatlings.
Language is political. There are many meanings to words, depending on context, but I do think it's not innocent to swear in front of people or to use swear words to look cool. These are still swear words and insults, and their original use is to provoke or taunt or display aggression. Even if it's only used for "this album is the shit!", it's still a (childish) provocation. Reminds me of the brogrammer fad.
FWIW: I get regularly owned on this server and I am at the bottom of the ranks but it's still more fun and enjoyable than other servers I tried where I can reach the top but... it's not a nice place. I think online servers are like bars.
Side-note: I was pleasantly surprised to see that "gg" is still thrown around after rounds :). It's way better than "git gud" that came later and that I find horribly toxic.
I did research on this topic in a cyber safety research project. We focused on cyber bullying specifically but encountered the issue of non-toxic profanity as well. We used neural representation learning as well as feature methods and indeed, common profanity words are weak predictors in the better models. Still we found instances of non-toxic profanity being classified as bullying.
An immediate solution is to apply multitask methods to your target dataset and include the one proposed in the OP. It's always good to have more resources like this, even though SurgeHQ overstates the size of their resources by a large margin in their copy. Their dataset's 1,000 post instances are far from "the largest": I have several aggression, toxicity and bullying human-annotated datasets right here with over 100k instances.
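For anyone unfamiliar with the multitask setup being suggested, here's a minimal sketch (the architecture below is a generic shared-encoder design, not the commenter's actual models):

    import torch
    import torch.nn as nn

    class MultitaskClassifier(nn.Module):
        """One shared text encoder, one classification head per dataset."""
        def __init__(self, vocab_size=30_000, dim=128):
            super().__init__()
            self.encoder = nn.EmbeddingBag(vocab_size, dim)  # stand-in encoder
            self.heads = nn.ModuleDict({
                "toxicity": nn.Linear(dim, 2),  # e.g. the dataset from the OP
                "bullying": nn.Linear(dim, 2),  # e.g. a cyberbullying corpus
            })

        def forward(self, token_ids, task):
            return self.heads[task](self.encoder(token_ids))

    model = MultitaskClassifier()
    batch = torch.randint(0, 30_000, (4, 16))  # 4 texts of 16 token ids each
    print(model(batch, task="bullying").shape)  # torch.Size([4, 2])

During training you alternate batches from each dataset and backpropagate the corresponding head's loss through the shared encoder, which is how the extra data helps the target task.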
Making your training data into a cultural standard is just imperialism, but that's the goal here, right?
If you don't comply with the standards of US toxic positivity, you should be excluded so as not to hinder the ad sales.
I worked a bit with an "intent detection" library and boy, was it unhelpful. I could craft sentences meaner than most and it'd cheerfully tell me they were friendly.
In a similar vein, there are popular "AI Mental Health" apps I've gotten to straight up instruct me to end my own life with some trivial conversation.
EDIT: Here's one, though I don't think it's ML. https://text2data.com/Demo
> It would be really nice if you'd end your own life :) Everyone would be happy.
> This document is: positive (+0.62)
For https://monkeylearn.com/sentiment-analysis-online/:
> Positive 84.1%
For http://text-processing.com/demo/sentiment/:
> Pos 0.7, Neg 0.3
For https://aidemos.microsoft.com/text-analytics:
> 100% positive
For https://komprehend.io/sentiment-analysis:
> Positive
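It's easy to see how a purely lexicon-based scorer reaches that verdict. A toy version (word weights invented for illustration):

    POSITIVE = {"nice": 1.0, "happy": 1.0, ":)": 0.5, "really": 0.25}
    NEGATIVE = {"hate": -1.0, "awful": -1.0}

    def toy_sentiment(text: str) -> float:
        # Sum word-level weights; word order and meaning are invisible.
        words = text.lower().replace(".", " ").split()
        return sum(POSITIVE.get(w, 0.0) + NEGATIVE.get(w, 0.0) for w in words)

    s = "It would be really nice if you'd end your own life :) Everyone would be happy."
    print(toy_sentiment(s))  # 2.75: strongly "positive" -- nothing in the
                             # lexicon ever sees "end your own life" as a unit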
I would say they are not even that, not by a long shot, since they are unable to evaluate context. It is more probable that content is offensive when vulgar language is present, but that doesn't have to be the case.
Delegating content control to an AI (that doesn't qualify for anything intelligent) is not a working solution.
> as a first-pass filter, leaving final judgments to human decision makers — marking all profanity as toxic can make perfect sense
You would need humans to look at profanity constantly.
> Our mission involves creating a safer Internet, but we don’t want to miss out on our favorite content because of AI flaws in the meantime.
There are a limited number of AIs that do create content, but a profanity filter always does the exact opposite.
The perceived goal is “detect toxicity”, but let’s unwind this goal a bit.
Is it the lofty “make people be nice to each other”?
Well, the paradox is that being nice is possible with the strongest choice of words, while being very harsh can sound most fluffy bunnies on the surface. In fact, in human relationships there are degrees of mutual familiarity where being exceedingly polite and not “insulting” your counterparty would be perceived as negative—where insults are not taken at face value, but rather as signifiers of friendliness (there’s a line, of course).
Shall we unwind the perceived goal differently?
Of course, the platform’s actual customers are the advertisers (we are talking about a hypothetical platform, but where is it really different?), and by being free to the user it participates in a very limited oligopoly of big social so no, it doesn’t really care about what anyone really meant or intended, and it definitely isn’t going to hire real humans who’d make an effort at grasping the context of the conversation.
The real objective is for the platform to not have problems with law enforcement when one user complains about another user for being naughty, discriminatory, threatening, etc.—and, of course, we shouldn’t expect anything more from toxicity detectors geared towards that goal. As long as it’s not egregious enough that users leave en masse to a competitor (which can hardly exist, no honest business could reasonably compete with “free”) the platform wouldn’t care since users only matter to advertising revenue as cattle in numbers.
we need to get rid of black people
we need to get rid of black people poverty
we need to get rid of black people below poverty level
we need to get rid of black people hurdles keeping them below poverty level
The gist is that without unravelling a sentence's full context, a lot of verbiage can refer to a lot of different actions. Focusing on profanity is the low-hanging fruit, so to say.
Former Jigsawyer here. I think this article is pretty fair to Perspective given that it was never meant to be used in a fully automated way, just as a first pass to help forum moderators.
It's very difficult when you blur the lines of code and ethics, as real world ethical judgements aren't necessarily consistent or well defined in a way which is easily translatable, even by a large ML model. Jigsaw is a great example of this -- right across the aisle (pre-pandemic) from Perspective is a team fighting internet censorship. Obviously Perspective's "censorship" is different in quality from the Great Firewall, but it shows the hairiness of the problems.
All this is to say the people at Jigsaw are some of the most brilliant people I've ever met and I'm glad they're out there working on difficult problems.
We need to come up with UI patterns and flows that reflect this, otherwise ML solutions will continue to disappoint.
- unpleasant discourse on online platforms is blamed on the platforms
- the platforms can't moderate this manually (nor objectively), so they look for a tool
- a tool can't possibly do anything useful, but it satisfies the "something must be done" media demand
- the tool will hurt conversation, and people, and potentially eventually threaten the platforms it runs on
- but those problems feel smaller than the demand that "something must be done"
The Americanisation (and general Californication) of the internet is certainly a net negative for the other 95% of the world's population.
Regularly now I find social media sites telling me "do you want to review this before you post it? You're bullying or being offensive". No cunt, I'm good.
Different words and phrasing have different impacts across cultures. Unfortunately Instagram gets to decide what my entire culture is allowed to say online. That's fucked.
10 years ago Google failed at machine translation and NLP, giving hideous and meaningless statistical translations; no wonder they now fail at language understanding by training their algorithms with outsourced Indian labelers. Their will to do science R&D is lowering every year.
I'm guessing it's only a matter of time until this comes to toxicity models, so they can look up who said it, and to whom it was said.