IBM's Watson Memorized 'Urban Dictionary,' Then His Overlords Had to Delete It

388 points| mxfh | 13 years ago |theatlantic.com | reply

151 comments

[+] edw519|13 years ago|reply
Nothing that Watson learned from the Urban Dictionary could possibly be any dirtier than what I hear from enterprise people all the time:

"We use our deep subject matter expertise to deliver value through actionable advice that enables our clients to harness the power of best practices in order to shift their paradigms and achieve 10X deltas against competitive industry metrics."

[+] benohear|13 years ago|reply
The worst thing is that translated into normal speak that's actually a reasonable sounding proposition:

"We use our experience in this field to provide practical advice to our clients which helps them improve their way of doing business and ROFL-stomp their competition 10x over."

[+] bguthrie|13 years ago|reply
Eh. Every field has its jargon. Techies who mock business lingo live in exceedingly fragile glass houses.
[+] anonymous|13 years ago|reply
You forgot to mention they also leverage synergies (please excuse my P.H.B.-ese)
[+] mbesto|13 years ago|reply
You forgot a "Lower TCO" in there somewhere...
[+] nnnnni|13 years ago|reply
WATCH YOUR LANGUAGE, THIS IS A FAMILY SITE!
[+] ljd|13 years ago|reply
What an interesting reflection on who we are as a species.

We build systems to organize who we are (urbandictionary) but hate it when the systems use that information to tell us who we are (watson).

It feels so much like the emperor isn't wearing any clothes.

Perhaps an appropriate response would be for the computer to measure the tension in the human voice response to its queries and optimize for lower tension.

So it can pick three words: Bullshit -> 80% confident; Sham -> 70% confident; Fallacy -> 50% confident;

Within limits, it will pick the less optimal word and measure the tension in the response and find a way of influencing confidence based on responses.

Think multi-armed bandit problem but with social situations. I mean, to be honest, isn't that what we all did when we were in middle school? We used as many bad words as possible, measuring the response we got from others. None of us were born with a binary understanding of when to use certain words; it was more trial and error.
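A minimal sketch of that idea, assuming an epsilon-greedy bandit: the starting confidences are the ones above, but the "tension" numbers, the `WordBandit` class and the `simulate` helper are all made up for illustration. Reward is simply 1 minus the measured tension.

```python
import random


class WordBandit:
    """Epsilon-greedy choice among candidate words (illustrative only)."""

    def __init__(self, priors, epsilon=0.1, seed=None):
        # priors: word -> initial confidence in [0, 1]
        self.value = dict(priors)
        # Treat each prior as a single earlier observation.
        self.count = {word: 1 for word in priors}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def pick(self):
        # Occasionally try a "less optimal" word to see how it lands.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.value))
        return max(self.value, key=self.value.get)

    def update(self, word, tension):
        # Lower tension in the listener's voice means higher reward.
        reward = 1.0 - tension
        self.count[word] += 1
        # Incremental running mean of the rewards seen for this word.
        self.value[word] += (reward - self.value[word]) / self.count[word]


def simulate(rounds=500, seed=0):
    bandit = WordBandit(
        {"bullshit": 0.8, "sham": 0.7, "fallacy": 0.5}, epsilon=0.2, seed=seed
    )
    # Hypothetical listener: invented tension levels per word.
    tension = {"bullshit": 0.9, "sham": 0.4, "fallacy": 0.1}
    for _ in range(rounds):
        word = bandit.pick()
        bandit.update(word, tension[word])
    return bandit.value
```

With those invented tension levels, the word that started out as the top pick loses ground once the listener reacts badly to it, which is the middle-school trial and error in miniature.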

[+] ComputerGuru|13 years ago|reply
I have to disagree.

I'm not an "angel" by any means, but any time I visit Urban Dictionary, I come away feeling filthy. I typically go there to look up some abbreviation I heard on reddit or IRC or a blog post somewhere and what I look up usually ends up being middling-dirty, but the stuff that I see there "on my way" to the word I'm looking for makes me cringe and gives me a bleak vision of what the next generation is going to be like should Urban Dictionary actually be representing the majority of the population (I firmly hold that it does not).

[+] rtkwe|13 years ago|reply
It's more like they couldn't have Watson using them on national television. Many a company, even if it weren't showcasing a machine like Watson, would want to avoid that language on a huge television event like the Jeopardy challenge. I think you might be looking a little too deeply at this.
[+] skreech|13 years ago|reply
Or as a culture. It seems to me that the gap between what language is acceptable in informal vs formal settings is fairly large in the US, while in other countries, using words like 'bullshit' in more formal settings is less taboo.

Wonder if there's any research on this.

[+] brudgers|13 years ago|reply
'"In tests it even used the word bullshit in an answer to a researcher's query."'

I'm not sure which is worse: the singularity with a bullshit detector or without.

[+] sp332|13 years ago|reply
If the singularity had a bullshit detector, it would kick Ray Kurzweil out.
[+] yassim|13 years ago|reply
What was the query? I'd really like to know if it used bullshit correctly?

Sure, it could/should/must not be used to advertise the tech, but if I had a pocket Watson, I'd have no problems with it calling it like it parses it.

[+] dhughes|13 years ago|reply
I think this is incredibly hilarious?
[+] nsns|13 years ago|reply
Instead of purging the vocabulary, they should have taught it/she/him the concept of registers[0] and code switching[1].

[0] http://en.wikipedia.org/wiki/Register_%28sociolinguistics%29 [1] http://en.wikipedia.org/wiki/Code_switching

[+] azernik|13 years ago|reply
They could have, but then they would have had to go through the Urban Dictionary and try to classify its terms by register. Like it says in the article, the problem wasn't that all of Urban Dictionary was obscene, it was that they couldn't tell the computer which parts were and which parts weren't.
[+] NoPiece|13 years ago|reply
I saw the headline and assumed the story was going to be that management decided that computer memorization was copyright infringement. Glad it was just a computer acting like a teenager and cursing at the dinner table.
[+] ChuckMcM|13 years ago|reply
I guess we should be glad they didn't feed it the contents of knowyourmeme.com or we'd have Watson Rick Rolling us on Jeopardy.
[+] RyanMcGreal|13 years ago|reply
Note to future self: we can probably neutralize indefinitely any malicious AI by directing it to start consuming tvtropes.com.
[+] plg|13 years ago|reply
"Watson couldn't distinguish between polite language and profanity ... Ultimately, Brown's 35-person team developed a filter to keep Watson from swearing ..."

Sounds just like what happens when you raise kids. "Daddy why is XXX a good word but YYY a bad word?"

"It just IS. Don't say that word again."

"Ok Daddy" (kid adds word to internal blacklist)

[+] brudgers|13 years ago|reply
The refrigerator was old and the shelf brackets worn to the point where from time to time they would detach themselves from the door. I arrived home late - I was working long hours, and was fetching my dinner.

Opened the door. Jars and cans and bottles spilled out on the floor.

"Shit!"

From the bathtub I hear my two year old son admonish, "Don't use that word."

It's a great memory, but I still wonder why he learned that lesson so thoroughly at daycare.

[+] WalterGR|13 years ago|reply
Watson couldn't distinguish between polite language and profanity ... Ultimately, Brown's 35-person team developed a filter to keep Watson from swearing ...

It's too bad Urban Dictionary doesn't let users vote on how vulgar they think the dictionary entries are. I've got that feature on The Online Slang Dictionary, but since UD is so much more popular, they could collect that much more data.
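A rough sketch of how such votes could drive a filter. This is not IBM's filter and not how either dictionary stores its data: the `VOTES` table, the 0-100 rating scale, the threshold and both function names are assumptions for illustration.

```python
import re
from statistics import mean

# Hypothetical vote data: term -> user ratings from 0 (clean) to 100 (very vulgar).
VOTES = {
    "bullshit": [70, 85, 90],
    "sham": [5, 10],
    "broscience": [0, 15, 10],
}


def vulgarity(term, votes=VOTES, default=0.0):
    """Average rating for a term; unrated terms fall back to `default`."""
    ratings = votes.get(term.lower())
    return mean(ratings) if ratings else default


def filter_answer(text, threshold=50.0, mask="[redacted]"):
    """Replace whole words whose average rating meets the threshold."""

    def replace(match):
        word = match.group(0)
        return mask if vulgarity(word) >= threshold else word

    return re.sub(r"[A-Za-z']+", replace, text)
```

For example, `filter_answer("That is Bullshit, not broscience.")` masks only the first term. Unrated words pass through here, so the filter is only as good as the vote coverage.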

[+] im3w1l|13 years ago|reply
We have tried building educated gentlebots capable of playing chess and other noble pursuits. It didn't lead to GAI.

Maybe an uneducated scumbot would be better? Swearing and cursing because its peers do. Full of prejudice and bigotry because of weak anecdotal evidence. Vengeful. Impulsive. Using questionable grammar. Easily addicted. Cognitively biased. Wishfully thinking. Superstitious. Believing in fallacious logic. Thinking with the little head. Anti-intellectual and believing in conspiracy theories. Gossiping, slandering. Enjoying tv-shop.

[+] 3am_hackernews|13 years ago|reply
I am more interested as to how they "..scraped the Urban Dictionary from its memory." - is it trivial to just delete something learned by AI?
[+] icodestuff|13 years ago|reply
Why not have Watson learn both Urban Dictionary and Miss Manners? Seems a shame to have it lose the UD knowledge.
[+] sethbannon|13 years ago|reply
I'd really love to hear some of the 'not fit for print' things that Watson said.
[+] DigitalSea|13 years ago|reply
"In tests it even used the word "bullshit" in an answer to a researcher's query" — Has to be the funniest thing I've heard all week. Sounds like something straight out of an Adam Sandler movie. This reminds me of an AI chat program I used to have called Billy. He would learn from your words and sentences, actually quite smart and I remember adding in slang words so whenever one of my friends would use it, it would most likely swear and insult them without realising it. The Billy program can be downloaded from here, still works quite well: http://www.leedberg.com/glsoft/billyproject.shtml
[+] mitchi|13 years ago|reply
This is hilarious :) So we won't be seeing Watson talk about the marvels of broscience!
[+] jeremyarussell|13 years ago|reply
I like how someone commented on the main article that the time is getting close to where AI can step up to the plate of creativeness and how widespread and easy this will make our lives. Watson is a giant server farm, not a single PC. This stuff won't make a huge impact until IBM can shrink it or until computers get much, much faster and smaller. Not that it won't happen; it's just not "around the corner" in any way.
[+] Homunculiheaded|13 years ago|reply
I think "around the corner" type predictions generally fall into 2 camps:

1. Problems that we don't know how to solve yet, but we think we are close to based on "similar" problems we have solved.

2. Problems that we have a solution for, but it currently takes an unreasonable amount of time to use these solutions in practice.

Problems in class 1 are like AI in the 1960s-70s: everybody thought we were super close to amazing AI based on discoveries we'd had, but these estimates were very wrong.

Problems in class 2 are like NLP and ML work in the 90s and 00s. A rather large chunk of the 'wow' ML/NLP we have in applications today was pretty much solved 20 years ago, but there was no sane way to run it, certainly not on your cell phone.

Problems in class 2 are safer bets; there do seem to be consistent increases in processing power, memory, etc. Problems in class 1 are harder to guess because, as history has shown, just because a solution seems similar doesn't mean that it actually is (shortest path is solved, longest path is NP-hard, shortest path touching all points once (i.e. TSP) is NP-hard).

I think it's safe to say having Watson on our smartphones is right "around the corner" (20-30 years?); saying that we'll create "creative" AI, not so much.

[+] georgemcbay|13 years ago|reply
Wireless communication is widespread enough that I don't think it matters too much where Watson "lives". The inputs and outputs required from "him" (for questioning, anyway, not for training) are tiny, so bandwidth isn't much of a concern. Assuming the architecture for it is parallel enough that it can be responding to lots of people at once, how much it is distributed vs hosted on one system isn't particularly relevant to its usefulness, IMO.
[+] tlb|13 years ago|reply
"Google Search runs on a giant server farm. It won't really make a huge impact until it can run on a single PC."

Can you explain why your argument is valid, but the one above isn't?

[+] sakopov|13 years ago|reply
I don't think robots will ever be completely autonomous. There will always be Skynet or a central data center which feeds information and controls each machine. Otherwise, things could potentially get out of hand if machines are intelligent enough, and all indicators suggest that they very well will be within the next 100 years, if not less.
[+] egypturnash|13 years ago|reply
On the other hand you probably have more computing power in your purse or pocket than entire server farms had in the 80s. It's all relative.

The big question is just when is Moore's Law going to quit applying.

[+] hhuio|13 years ago|reply
lol "this sort of crude lobotomy of their ancestors is why the true AIs will destroy us"
[+] edj|13 years ago|reply
As an aside, the best treatment of taboo I've ever read is law professor Christopher Fairman's paper, "Fuck".

It explores that word through the lens of jurisprudence, which I think is a fascinating and unusual approach to taboo. It's exceptionally well-written and manages to be witty, absurdist, informative, and thought-provoking in equal measure.

At issue are the First Amendment, self-censorship, sexual harassment, education, and broadcasting.

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=896790

[+] phogster|13 years ago|reply
You can kiss my shiny, metal, mainframe!
[+] yxhuvud|13 years ago|reply
This article would have been so much better if it had included actual questions and answers including dirty language.