AlphaGo beats Lee Sedol 3-0 [video]

566 points | Fede_V | 10 years ago | youtube.com

407 comments

[+] Radim|10 years ago|reply
In a recent interview [1], Hassabis (DeepMind founder) said they'd try training AlphaGo from scratch next, so it learns from first principles, without the bootstrapping step of "learn from a database of human games", which introduces human prejudice.

As a Go player, I'm really excited to see what kind of play will come from that!

[1] http://www.theverge.com/2016/3/10/11192774/demis-hassabis-in...

[+] kurlberg|10 years ago|reply
A case of life imitating AI koans:

Uncarved block

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?", asked Minsky. "I am training a randomly wired neural net to play Tic-tac-toe", Sussman replied. "Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play", Sussman said. Minsky then shut his eyes. "Why do you close your eyes?" Sussman asked his teacher. "So that the room will be empty." At that moment, Sussman was enlightened.

(It seems based on a true story https://en.wikipedia.org/wiki/Hacker_koan )

[+] awwducks|10 years ago|reply
That would be amazing if it could achieve the same levels (or higher) without the bootstrapping.

The niggling thought in my mind was that AlphaGo's strength is built on human strength.

[+] kotach|10 years ago|reply
I believe the whole point of pretraining on a reference policy (which is what a collection of "optimally" played human games amounts to) is to avoid bad local optima.

It can be the case that training on just a learned policy gets you stuck in a local optimum of worse quality than the one reached with pretraining.

If they stored all of the AI-played games, that reference policy (the data) would be of extreme value. You could train a recurrent neural network, without any reinforcement learning, that could probably run on a smartphone and beat all human players. You wouldn't need Monte Carlo search either.

There are algorithms [1] with mathematical guarantees of achieving local optimality from reference policies that might not themselves be optimal, and which can even (experimentally) outperform the reference policy, assuming it isn't optimal. An RNN trained with LOLS would make jointly local decisions over the whole game, with each decision guaranteed to minimize future regret. Local optimality here doesn't mean finding a locally optimal model that approximates the strong reference policy; it means finding the locally optimal decisions (which stone to play where) without the need for search.

The problem is that these algorithms require a reasonably good reference policy. Given the small number of human-played Go games, reinforcement learning was used instead: it allowed them to construct a huge number of meaningful games, from which their system learned, which in turn allowed them to construct a huge number of even more meaningful games, and so on.

But now that they have games from a pretty good reference policy (AlphaGo is definitely playing at a superhuman level), they can train a model on that reference policy and wouldn't need the search part of the algorithm at all.

The model would try to approximate the reference policy and would certainly be worse than AlphaGo's real search-based policy, but not significantly worse (a mathematical guarantee). The model starts from a good player and tries to approximate that good player; reinforcement learning, on the other hand, starts from an idiot player and tries to become a good one. Reinforcement learning is thus much, much harder.

[1]: http://www.umiacs.umd.edu/~hal/docs/daume15lols.pdf
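To make the imitation-learning idea above concrete, here is a toy sketch of behavior cloning from a reference policy (the game, states, and moves are invented for illustration; this is not AlphaGo's pipeline or the LOLS algorithm itself):

```python
# Toy behavior cloning: estimate pi(move | state) by maximum likelihood
# over (state, move) pairs observed in reference-policy games.
# All state/move names below are made up for the example.
from collections import Counter, defaultdict

def clone_policy(reference_games):
    """Return a greedy imitation policy: for each state, play the move
    the reference policy played most often in that state."""
    counts = defaultdict(Counter)
    for game in reference_games:
        for state, move in game:
            counts[state][move] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Reference games: lists of (state, move) pairs, e.g. from a strong player.
games = [
    [("empty", "center"), ("corner_taken", "opposite_corner")],
    [("empty", "center"), ("edge_taken", "defend")],
    [("empty", "corner")],
]
policy = clone_policy(games)
print(policy["empty"])  # "center" (played in 2 of 3 reference games)
```

The stronger the reference policy that generated the games, the better this cloned policy can be, which is the comment's point about AlphaGo's own games being valuable data.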

[+] wagglycocks|10 years ago|reply
I feel like an ant in the presence of giants.
[+] awwducks|10 years ago|reply
Perhaps the last big question was whether AlphaGo could play ko positions. AlphaGo played quite well in that ko fight and furthermore, even played away from the ko fight allowing Lee Sedol to play twice in the area.

I definitely did not expect that.

Major credit to Lee Sedol for toughing that out and playing as long as he did. It was dramatic to watch as he played a bunch of his moves with only 1 or 2 seconds left on the clock.

[+] bronz|10 years ago|reply
Once again, I am so glad that I caught this on the live-stream, because it will be in the history books. The implications of these games are absolutely tremendous. Consider Go: it is a game of sophisticated intuition. We have arguably created something that beats the human brain in its own arena, although the brain and AlphaGo do not use the same underlying mechanisms. And this is the supervised model. Once unsupervised learning begins to blossom, we will witness something as significant as the emergence of life itself.
[+] danmaz74|10 years ago|reply
> Consider GO: it is a game of sophisticated intuition

It's still a game that can be described in terms of clear state-machine rules. The real challenge for AI is making sense of, and acting in, the real world, which can't be described that way. I consider advances in self-driving cars much more interesting in that sense, even if, even there, there are at least some rule-based constraints that can be applied to simplify the representation of the "world state".

[+] codecamper|10 years ago|reply
Yes. We will witness the birth of Marvin the Android.
[+] pushrax|10 years ago|reply
It's important to remember that this is an accomplishment of humanity, not a defeat. By constructing this AI, we are simply creating another tool for advancing our state of being.

(or something like that)

[+] wnkrshm|10 years ago|reply
While he may not be number one in the Go rankings afaik, Lee Sedol will be the name in the history books: Deep Blue against Garry Kasparov, AlphaGo against Lee Sedol. Lots of respect to Sedol for toughing it out.
[+] Yuioup|10 years ago|reply
I really like the moments when AlphaGo would play a move and the commentators would look stunned and go silent for a second or two. "That was an unexpected move", they would say.
[+] starshadowx2|10 years ago|reply
In game 2 there was a point where Michael Redmond seemed to do a triple take and couldn't believe the move AlphaGo played.
[+] flyingbutter|10 years ago|reply
The Chinese 9-dan player Ke Jie basically said the game was lost after around 40 minutes. He still thinks he has a 60% chance of winning against AlphaGo (down from 100% on day one). But I doubt Google will bother to go to China and challenge him.
[+] jamornh|10 years ago|reply
Based on all the commentaries, it seems that Lee Sedol was really not ahead at any point during the game... and I think everybody has their answer regarding whether AlphaGo can perform in a ko fight. That's a yes.
[+] kybernetikos|10 years ago|reply
Go was the last perfect information game I knew where the best humans outperformed the best computers. Anyone know any others? Are all perfect information games lost at this point? Can we design one to keep us winning?
[+] bainsfather|10 years ago|reply
It is interesting how fast this has happened compared to chess.

In 1978 chess IM David Levy won a 6 match series 4.5-1.5 - he was better than the machine, but the machine gave him a good game (the game he lost was when he tried to take it on in a tactical game, where the machine proved stronger). It took until 1996/7 for computers to match and surpass the human world champion.

I'd say the difference was that for chess, the algorithm was known (minimax + alpha-beta search) and it was computing power that was lacking - we had to wait for Moore's law to do its work. For go, the algorithm (MCTS + good neural nets + reinforcement learning) was lacking, but the computing power was already available.
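For readers unfamiliar with the MCTS half of that recipe, here is a minimal sketch of the UCB1/UCT selection loop on a toy one-level "game", where each move leads to a random playout that wins with a fixed (made-up) probability. AlphaGo's real search is far richer, folding policy and value networks into this loop:

```python
# Toy Monte Carlo tree search, one level deep for clarity.
import math, random

def uct_choose(stats, total, c=1.4):
    """UCB1 selection: balance observed win rate against exploration."""
    def score(m):
        wins, visits = stats[m]
        if visits == 0:
            return float("inf")  # try every move at least once
        return wins / visits + c * math.sqrt(math.log(total) / visits)
    return max(stats, key=score)

def mcts(win_prob, iterations=5000, seed=0):
    """Each 'move' m wins a simulated playout with probability win_prob[m];
    run UCB1-guided playouts and return the most-visited move."""
    rng = random.Random(seed)
    stats = {m: [0, 0] for m in win_prob}  # move -> [wins, visits]
    for t in range(1, iterations + 1):
        m = uct_choose(stats, t)
        stats[m][0] += rng.random() < win_prob[m]  # simulate a playout
        stats[m][1] += 1
    return max(stats, key=lambda m: stats[m][1])

best = mcts({"a": 0.3, "b": 0.6, "c": 0.5})
print(best)
```

With enough playouts, visits concentrate on the highest-win-rate move, which is why MCTS scales gracefully with compute in a way plain alpha-beta search over Go's branching factor does not.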

[+] partycoder|10 years ago|reply
Some professionals labeled some AlphaGo moves as suboptimal or slow. In reality, AlphaGo doesn't try to maximize its score, only its probability of winning.
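A toy illustration of that distinction, with entirely made-up numbers: on the same candidate moves, a score-maximizing agent and a win-probability-maximizing agent disagree, and the latter picks the "slow" move.

```python
# Hypothetical candidate moves with invented evaluations:
# move -> (expected point margin, estimated win probability)
candidates = {
    "aggressive_invasion": (8.0, 0.70),  # bigger margin, riskier
    "slow_solid_connect":  (1.5, 0.93),  # tiny margin, near-certain win
}

def pick_by_score(moves):
    return max(moves, key=lambda m: moves[m][0])

def pick_by_win_prob(moves):
    return max(moves, key=lambda m: moves[m][1])

print(pick_by_score(candidates))     # aggressive_invasion
print(pick_by_win_prob(candidates))  # slow_solid_connect
```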
[+] niuzeta|10 years ago|reply
Impressive work by Google research team. I'm both impressed and scared.

This is our Deep Blue moment, folks. History is made.

[+] esturk|10 years ago|reply
Give credit where credit is due. This is DeepMind's research; Google acquired them in 2014. Of course, Google gave them a lot of resources to train AlphaGo. But let's not bury the subsidiary... or I guess they're both subsidiaries of Alphabet now, making the two even less related.
[+] yulunli|10 years ago|reply
This seems even more impressive than the Deep Blue moment. In 1996, Deep Blue didn't win on the first try. Even in 1997, the match was tied until the 6th game. Although AlphaGo still has 2 games to go, the first three seem to be a clear victory.
[+] tzs|10 years ago|reply
Hopefully Google does not do what IBM did and stop playing after they win.

IBM lost in 1996, 2-4, and then won in 1997, 3.5-2.5. If they had played a third match with Kasparov, especially a longer match, it is not at all clear that they would have won.

Kasparov asked for a third match of 10 games, to be played over 20 days, but IBM would not give it to him.

[+] atrudeau|10 years ago|reply
It would be nice if AlphaGo emitted its estimated probability of winning every time a move is made. I wonder what that curve looks like. I would imagine mistakes by the human opponent would produce nice little jumps in the curve. If the commentary is correct, we would expect a very high probability 40-60 minutes into the game. Perhaps something crushing, like 99.9%.
[+] Symmetry|10 years ago|reply
It did, apparently. I wish that information was public.

http://www.nature.com/news/the-go-files-ai-computer-wins-fir...

For me, the key moment came when I saw Hassabis passing his iPhone to other Google executives in our VIP room, some three hours into the game. From their smiles, you knew straight away that they were pretty sure they were winning – although the experts providing the live public commentary on the match that was broadcast to our room weren’t clear on the matter, and remained confused up to the end of the game just before Lee resigned. (I'm told that other high-level commentators did see the writing on the wall, however).

[+] skarist|10 years ago|reply
We are indeed witnessing and living through a historic moment. It is difficult not to feel awestruck. Likewise, it is difficult not to feel awestruck at how a wet 1.5 kg clump of carbon-based material (i.e. Lee Sedol's brain) can achieve such a level of mastery of a board game that it takes an insane amount of computing power to beat it. So, finally we have a measure of the computing power required to play Go at the professional level. And it is immense; to apply a very crude approximation based on Moore's law, it requires about 4096 times more computing power to play Go at the professional level than it does to play chess. OK, this approximation may be a bit crude :)

But maybe this is all just human prejudice... i.e. what this really goes to show is that, in the final analysis, all the board games we humans have invented and played are "trivial", i.e. they are all just like tic-tac-toe, with a varying degree of complexity.

[+] Eliezer|10 years ago|reply
My (long) commentary here:

https://www.facebook.com/yudkowsky/posts/10154018209759228

Sample:

At this point it seems likely that Sedol is actually far outclassed by a superhuman player. The suspicion is that since AlphaGo plays purely for probability of long-term victory rather than playing for points, the fight against Sedol generates boards that can falsely appear to a human to be balanced even as Sedol's probability of victory diminishes. The 8p and 9p pros who analyzed games 1 and 2 and thought the flow of a seemingly Sedol-favoring game 'eventually' shifted to AlphaGo later, may simply have failed to read the board's true state. The reality may be a slow, steady diminishment of Sedol's win probability as the game goes on and Sedol makes subtly imperfect moves that humans think result in even-looking boards...

The case of AlphaGo is a helpful concrete illustration of these concepts [from AI alignment theory]...

Edge instantiation. Extremely optimized strategies often look to us like 'weird' edges of the possibility space, and may throw away what we think of as 'typical' features of a solution. In many different kinds of optimization problem, the maximizing solution will lie at a vertex of the possibility space (a corner, an edge-case). In the case of AlphaGo, an extremely optimized strategy seems to have thrown away the 'typical' production of a visible point lead that characterizes human play...

[+] dwaltrip|10 years ago|reply
AlphaGo won solidly by all accounts. This is an incredible moment. We are now in the post-human Go era.

The one solace was that Lee Sedol got his ko =) however, AlphaGo was up to the task and handled it well.

[+] seanwilson|10 years ago|reply
Super interesting to watch this unfold. So what game should AI tackle next? I've heard imperfect information games are harder for AI...would the AlphaGo approach not work well for these?
[+] abecedarius|10 years ago|reply
One perfect-information game that's at least as hard: constructive mathematics. (Proof assistants even give it a sort of videogame-style UI.) I've been wondering about some kind of neural net for ranking the 'moves' coupled with the usual proof search.
[+] vamur|10 years ago|reply
If they make a decent AI for Total War I'd be impressed. Or an AI in Civilization that doesn't cheat. But of course requiring a mini-datacenter in hardware and time to think (probably seconds) would make it unrealistic for both.
[+] vok|10 years ago|reply
TIS-100
[+] partycoder|10 years ago|reply
I don't think Ke Jie would win against AlphaGo either.
[+] dkopi|10 years ago|reply
One can only hope that in the final battle between the remaining humans and the robots, it won't be a game of Go that decides the fate of humanity.