Watson crushes the competition in second round of 'Jeopardy'

[+] ssclafani|15 years ago|reply

David Ferrucci, the manager of the Watson project at IBM, on why he thinks Watson got the Final Jeopardy question wrong:

"First, the category names on Jeopardy! are tricky. The answers often do not exactly fit the category. Watson, in his training phase, learned that categories only weakly suggest the kind of answer that is expected, and, therefore, the machine downgrades their significance. The way the language was parsed provided an advantage for the humans and a disadvantage for Watson, as well. “What US city” wasn’t in the question. If it had been, Watson would have given US cities much more weight as it searched for the answer. Adding to the confusion for Watson, there are cities named Toronto in the United States and the Toronto in Canada has an American League baseball team. It probably picked up those facts from the written material it has digested. Also, the machine didn’t find much evidence to connect either city’s airport to World War II. (Chicago was a very close second on Watson’s list of possible answers.) So this is just one of those situations that’s a snap for a reasonably knowledgeable human but a true brain teaser for the machine."

http://asmarterplanet.com/blog/2011/02/watson-on-jeopardy-da...

[+] gojomo|15 years ago|reply

Lame excuses! Watson is impressive, but I'm disappointed by the lack of any hint of comprehension behind its answers. When it's wrong, it's often nonsensically out-to-lunch, and its 2nd/3rd best answers are also often batty.

If it's just trained-up on statistical correlations between trigger phrases and likely answers in the constrained Jeopardy domain, then 90 32-core/512GB RAM servers seem like overkill.

[+] emelski|15 years ago|reply

For all the talk of the difficulties of playing Jeopardy! due to the "nuances of natural language" and "puns and double meanings in the clues", that did not really seem to be a factor in the second round -- most of the questions were quite plainly worded with answers easily discoverable just by searching. Accordingly, Watson performed dramatically better today than yesterday, when a larger portion of the questions did have nuance and plays-on-words in the phrasing. Note too how spectacularly badly Watson performed on the Final Jeopardy! question, where nuance _did_ play a much bigger role.

So today, we learned that machines can push buttons faster than people, and search is a great way to find answers for trivia questions. I doubt the former is a surprise to anybody alive in the past 50 years; the latter shouldn't surprise anybody who's ever used Google.

[+] gthank|15 years ago|reply

This. A thousand times this. Watson absolutely CRUSHED the human players on pretty much every question that was basic facts. I know Watson can probably generate answers faster than humans on simple search stuff, but it seemed so bad at some points that I wondered: is Watson not wired in with some sort of delay that mimics the delay that humans have between deciding to buzz in and actually buzzing in? A lack of such a system would seem to skew the results somewhat.

[+] scott_s|15 years ago|reply

http://lesswrong.com/lw/im/hindsight_devalues_science/

That essay focuses on social science, but I think it's still relevant here. Experts in the area did not think they could do this, and even the people involved weren't sure. It's easy to dismiss this as "yeah, it's just a big search engine" once you already know it's been done. Besides not accurately characterizing the approach Watson takes, that sentiment misses the fact that this was an open question.

Paradoxically, people would probably be more impressed if Watson did worse and the game was more competitive. It's like watching an NBA team play a high school team. The NBA team makes it look easy, even though they're only that good because of decades of practice.

(Disclaimer: I work at IBM Research, and have associated biases.)

[+] TimothyBurgess|15 years ago|reply

> So today, we learned that machines can push buttons faster than people

I'm not gonna lie... it was pretty entertaining watching Jennings squirm every time Watson beat him to the buzzer.

Also, there wasn't any mention of Watson having adaptive artificial intelligence but I would guess it's safe to say IBM was smart enough to include something like that. That in itself would be crazy hard to implement given the magnitude of what it's already doing... but not impossible. Maybe there are a few corrective algorithms in there somewhere.

[+] spitfire|15 years ago|reply

So is there any information on how they actually implemented Watson? My understanding is it's a Bayesian machine learning system, but I still don't know how it parses answers, or really does its magic.

Also, if there is anyone who thinks Silicon Valley has the smartest people around, this type of stuff should change your mind. Facebook is short trousers compared to this. And it's just a tech demo.

[+] Mahh|15 years ago|reply

Some cool stuff here: http://www-943.ibm.com/innovation/us/watson/

The real challenge behind Watson is the natural language parsing. Instead of abstracting information away from its sources (into a graph, say), the sources seem to have been left intact as sentences in Watson's memory. Watson reads through this information in much the same way it interprets a question, and it tries to create links and possible answers based on connections between sentences from many sources (which suggests why pun questions are difficult for Watson). I can't speak to the mathematical implementation of the answer ranking, but this is the high-level way that Watson finds answers. Those videos talk about the cool stuff behind the algorithmic challenges of Watson.

[+] teraflop|15 years ago|reply

There's a high-level overview in this paper from AI Magazine, Fall 2010: http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf

[+] jdale27|15 years ago|reply

> Also, if there is anyone who thinks silicon valley has the smartest people around, this type of stuff should change your mind.

Watson is an impressive achievement, but there are quite a few companies in Silicon Valley whose engineers could pull this off. It's more a matter of how much money management feels like throwing at it. It's great publicity for IBM, which has to put in a lot more effort than most Silicon Valley companies in order to look cool, but can afford it.

[+] kirpekar|15 years ago|reply

Interesting match.

Seems like Watson was able to ring in (clicker) much quicker than Ken or Brad. Any unfair advantage?

[+] baddox|15 years ago|reply

For anyone who understands the true dynamics of Jeopardy!, Watson simply isn't as impressive a Jeopardy! contestant as it's being made out to be. I'm not minimizing the awesomeness of Watson's language processing, because it's great, but Watson's performance on Jeopardy! is only as impressive as a hypothetical competition where Watson and two humans fill out a worksheet and compare who got the most answers right.

During tournaments of champions, Jeopardy! is not about how many correct responses you can come up with, as all competitors will know the vast majority of them. It's all about timing. I don't know the exact statistics, but it seemed like Watson knew about 75% of the correct responses. I strongly suspect the two human contestants knew a greater percentage. On day 2, it just came down to timing. Watson was only beaten to the buzzer three times when it knew the correct response.

[+] ellyagg|15 years ago|reply

It is certainly unfair and has pretty much ruined the competition for me. Watson can instantly detect when he's allowed to ring in. This is no different than a car "racing" a human. In theory, a human contestant can try to time his trigger to coincide with the end of Trebek speaking the question, and catch the light just as the timing person releases the question, but in practice this is fraught with peril--if you're off you can't ring in for several tenths of a second--and requires much more of a human's energy, which he can no longer devote to figuring out the question. You could see Jennings attempt this on the first day, with mixed results. These episodes are commercials for IBM and it's clear who is supposed to win.

[+] zach|15 years ago|reply

Of course. This isn't a match of equals, this is one of man versus machine. As fans know, most responses in a game of Jeopardy! are known by multiple players. That's especially so in a game of this caliber, so the knowledge aspect is really quite minimal compared to the ring-in factor. Both these guys slaughtered their opponents by being quick on the buzzer.

Watson is being granted first crack at the questions 90% of the time because of its electromechanical advantage. IBM may not have the mean brainpower that Google has, but they can clearly build a computer that can press a button quicker than Ken Jennings.

Knowing that a computer can consistently beat even the best to ever play the game to the buzzer, the IBM team could be pretty well assured of success once they got Watson performing well enough.

[+] marketer|15 years ago|reply

It is an unfair advantage, and it's frustrating to watch. For most of the answers, both Ken and Brad were trying to buzz in, but Watson always had better timing and buzzed in first. I'm sure that Jeopardy's buzzing system didn't take robots into account when it was designed, so it technically isn't against the rules. But it does give Watson a huge tactical advantage.

[+] InclinedPlane|15 years ago|reply

You're glossing over the fact that Watson knew the correct answer most of the time. That's what this is about, and that's why this is a significant result.

[+] philsalesses|15 years ago|reply

Today, I learned that there are 7 cities in America named Toronto.

[+] theycallmemorty|15 years ago|reply

I wonder how many of them partially fit the mold of having an airport named after WWII battles/heroes?

[+] baddox|15 years ago|reply

And, as I predicted, it only came down to buzzer reflex, which computers unsurprisingly excel at. On day 2 (today), Watson was only beaten to the buzzer three times when it had the correct response above its confidence threshold.

[+] unknown|15 years ago|reply

[deleted]

[+] baddox|15 years ago|reply

I don't think so. I would hope the clue writers were deliberately kept unaware of which clues were being written for the Watson episodes. I don't think they seemed unfair, either. They seemed like normal modern Jeopardy! clues.

Also, I am highly confident that if you simply had Ken, Brad, and Watson fill out a worksheet with all the clues written on it (thus bypassing the central game dynamic of buzzer reflexes), Watson would get the lowest score.

[+] Curly|15 years ago|reply

To make it a true test of brains, and remove the mechanics of button-pressing speed from the question...

Place all three contestants in isolation from each other.

All three hear the question read, and buzz-in just as they do now.

Allow ALL contestants who buzz in to answer the question, but do not allow them to know about their opponents' performances.

Record all contestants' buzz-in reaction times.

At the end of the game, compare only the accuracy of answers to determine the winner.

At the end of the game, compare buzz-in reaction times to see how thumbs fare against relays.
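
The protocol above can be sketched in a few lines. This is only an illustration, not anything Jeopardy! or IBM actually uses: the `Response` record, the function names, and the choice to measure "accuracy" as a raw count of correct answers are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    """One buzzed-in answer (hypothetical record format)."""
    contestant: str
    correct: bool
    buzz_seconds: float  # time from clue release to buzz-in


def correct_counts(responses):
    """Number of correct answers per contestant (assumed accuracy measure)."""
    counts = {}
    for r in responses:
        counts[r.contestant] = counts.get(r.contestant, 0) + int(r.correct)
    return counts


def mean_buzz_times(responses):
    """Average buzz-in reaction time per contestant, reported separately."""
    times = {}
    for r in responses:
        times.setdefault(r.contestant, []).append(r.buzz_seconds)
    return {name: mean(ts) for name, ts in times.items()}


def winner(responses):
    """Winner by correct answers only; buzz speed plays no part."""
    counts = correct_counts(responses)
    return max(counts, key=counts.get)
```

Because every contestant who buzzes in gets to answer, the winner depends only on `correct_counts`, while `mean_buzz_times` answers the thumbs-versus-relays question on its own.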

[+] ckwalsh|15 years ago|reply

Just put up the latest results

https://spreadsheets1.google.com/ccc?key=tth_jhM8vyBAuogqHll...

[+] nyellin|15 years ago|reply

I received a few complaints when I posted the results of round #1 and it hit the homepage. You might want to change the title to something ambiguous about who won.

[+] jamesjyu|15 years ago|reply

Part 1 of the second round on YouTube here: http://www.youtube.com/watch?v=PHhDLUVAtqU

[+] usaar333|15 years ago|reply

Thanks. Part 2 available anywhere?

[+] metaprinter|15 years ago|reply

I had to leave the article to search where the actual building was located because all they gave was, "suburban New York". I'm still not sure where it is.

[+] rockstar9|15 years ago|reply

Is there a replay of Jeopardy?

[+] jackowayed|15 years ago|reply

IBM will have it up in a couple days: http://twitter.com/#!/IBMWatson/status/37223337453158400

[+] philsalesses|15 years ago|reply

They are out there. Be industrious, you'll find them.

[+] mashmac2|15 years ago|reply

Normally, Jeopardy airs new episodes on weeknights.

Popular shows are reshown on weekends. Also, around holidays and other non-normal weeks of broadcasting, older shows are re-aired.

I don't believe episodes are available legally online.

[+] drstrangevibes|15 years ago|reply

The fact that Watson has good NLP isn't nearly as impressive as the fact that it has a huge knowledge base. How the hell did it get all that knowledge? If it is just from browsing the internet by itself, that makes me afraid... very afraid.

[+] to|15 years ago|reply

[deleted]

[+] e40|15 years ago|reply

Why the spoiler? Do you think everyone watches it live? We are in the age of the DVR.

[+] bpodgursky|15 years ago|reply

Mubarak's not president of Egypt anymore!

122 comments