top | item 20027838

Chess-playing neural network LC0 beats Stockfish in 100-game match

188 points | bonzini | 6 years ago | tcec.chessdom.com

83 comments

[+] ganeshkrishnan|6 years ago|reply
It's really amazing and impressive what LC0 has achieved with almost no money and only community-pooled training matches. They were struggling to pay $600 per month for the cloud server, so they now have a bare-metal server.

Pretty sure it can beat AlphaZero at this point in time.

[+] umanwizard|6 years ago|reply
It's insane. It's basically a hobby project, and it obliterates other engines that are the result of 30+ years of computer chess research.
[+] mda|6 years ago|reply
Pretty sure AlphaZero would destroy Leela.
[+] dragontamer|6 years ago|reply
LC0 and AlphaZero work on the following principles:

* GPU-based compute -- Deep Learning scales very well on GPUs, which are known to have superior parallelism over CPUs.

* MCTS -- "Monte Carlo Tree Search", which is a different tree-search compared to Minimax. MCTS seems closer to "human-like play", with regards to search order.

IMO, it is inelegant to use CNNs / neural nets as the evaluation layer. However, it's difficult to imagine an evaluation function that would scale to GPUs. I'm curious how to use the SIMD-compute model to evaluate chess positions, but it doesn't seem like much research exists on the subject.

GPUs have significantly higher compute power than CPUs. Chess positions can largely be represented by just a few 64-bit integers (!!). (An 8x8 bitboard is 64 bits, one bitboard per piece type and color.) So the compute density of chess evaluation is very, very high; there just aren't any algorithms well suited to the GPU architecture aside from deep learning right now.
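
To make the "few 64-bit integers" claim concrete, here is a minimal sketch in Python. The `square` helper and the a1 = bit 0 convention are hypothetical illustrations, not taken from any real engine:

```python
# Illustrative sketch: one 64-bit integer per piece type and color,
# one bit per square (a1 = bit 0, h8 = bit 63). Names are made up.

def square(file, rank):
    """Map a 0-based file and rank to a bit index."""
    return rank * 8 + file

# White pawns on their starting rank (rank index 1):
white_pawns = 0
for f in range(8):
    white_pawns |= 1 << square(f, 1)

# Twelve such integers (6 piece types x 2 colors) describe the whole
# board: 96 bytes of state, which is why the compute density is so high.
assert white_pawns == 0x000000000000FF00
assert bin(white_pawns).count("1") == 8   # popcount: 8 pawns
```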

-------

I really think MCTS is the superior search tree algorithm. The only issue is the evaluation function. It's certainly cool that Deep Learning / CNNs are being used as a "universal" GPU-function (self-learn / self-train so that a large CNN can "deeply" use a ton of GPU resources to make up its own evaluation function). But as far as I'm concerned, it's a crutch.

----------

EDIT: If I had oceans of free time... I'd love to see what "translating" Stockfish's algorithms (ex: backwards pawn check. Bishop moves check. Paired Bishop advantage check. Etc. etc.) to a GPU would be like.

A GPU probably could evaluate a million boards for "backwards pawns". Then evaluate the million boards for "paired bishops". Then evaluate the million boards for "knight outpost". Etc. etc. In effect: calculate the millions of positions in parallel, by lining them up and batching them together.

Stockfish's parallelism is CPU-based, with lots of if-statements. Batching these checks into individual units would remove the if-statements and potentially make them amenable to SIMD compute.

The tree-evaluation would try to evaluate ~1-million boards at a time, to keep the GPU busy. Maybe ~1-million boards per CPU-thread (somehow self-tuning the evaluation process for an optimal mix of CPU + GPU time used).
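
A CPU-side sketch of that batching idea, with NumPy's vectorized uint64 operations standing in for GPU SIMD lanes. The "doubled pawns" feature and all names here are simplified illustrations, not Stockfish's actual checks:

```python
import numpy as np

# One feature ("doubled pawns") evaluated branch-free across a whole
# batch of pawn bitboards at once -- the "line them up and batch them
# together" idea, without per-board if-statements.

def popcount64(x):
    # SWAR popcount over uint64 arrays; relies on wrapping arithmetic.
    c1 = np.uint64(0x5555555555555555)
    c2 = np.uint64(0x3333333333333333)
    c4 = np.uint64(0x0F0F0F0F0F0F0F0F)
    x = x - ((x >> np.uint64(1)) & c1)
    x = (x & c2) + ((x >> np.uint64(2)) & c2)
    x = (x + (x >> np.uint64(4))) & c4
    return (x * np.uint64(0x0101010101010101)) >> np.uint64(56)

FILE_A = np.uint64(0x0101010101010101)  # one bit on each rank of the a-file

def doubled_pawn_count(pawn_bbs):
    """For each board in the batch, count files holding 2+ pawns."""
    doubled = np.zeros(pawn_bbs.shape, dtype=np.uint64)
    for f in range(8):
        on_file = popcount64(pawn_bbs & (FILE_A << np.uint64(f)))
        doubled += (on_file >= 2)
    return doubled

# Two toy boards: start-position pawns (no doubled files) and pawns
# stacked on a2 + a3 (one doubled file).
batch = np.array([0x000000000000FF00, (1 << 8) | (1 << 16)], dtype=np.uint64)
assert list(doubled_pawn_count(batch)) == [0, 1]
```

The same loop structure would run over a million-element batch unchanged, which is the point: the per-board control flow disappears into data-parallel bit operations.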

Again: if I had mountains of free time, lol.

[+] stuartriffle|6 years ago|reply
I’m working on a brute force MCTS engine right now. It scales across SIMD (to 8x) and GPU (only CUDA so far).

https://github.com/StuartRiffle/Jaglavak

My GTX 780 can play about 1M games to completion per sec. Still under construction, but any advice is appreciated!

[+] currymj|6 years ago|reply
CNN for Go at least seems justifiable. The use of a convolution captures the fact that a configuration of stones is mostly the same configuration of stones whether or not you shift it around the board a little bit.

This is much less true for chess, although it might still be true enough.
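
The shift property ("translation equivariance") can be seen in a toy NumPy example; this is a generic illustration of convolution, nothing Go-specific:

```python
import numpy as np

# A convolution's response map moves with the stone pattern, so "the
# same shape, shifted" looks the same to the filter.

def conv_valid(board, kernel):
    """Plain valid-mode 2D cross-correlation."""
    h, w = kernel.shape
    H, W = board.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(board[i:i + h, j:j + w] * kernel)
    return out

kernel = np.arange(9.0).reshape(3, 3)          # arbitrary 3x3 filter

board = np.zeros((9, 9))
board[2:4, 2:4] = 1.0                          # a little clump of stones
shifted = np.roll(board, (3, 3), axis=(0, 1))  # same clump, moved

a = conv_valid(board, kernel)
b = conv_valid(shifted, kernel)
assert np.allclose(b[3:, 3:], a[:-3, :-3])     # response shifts with input
```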

[+] barrkel|6 years ago|reply
IMO it makes most sense to use NN for the evaluation layer. Human heuristics are unlikely to be as well tuned as a NN, and may encode unknown biases. A NN can cover the search space more dispassionately and discover relationships that don't occur to humans, and with a delicate relative balance of tradeoffs.

The evaluation of a position is a guess at whether it is likely to lead to a winning final outcome. The other way to determine this is to look further ahead, deeper into the search tree; the latter has significantly more information content than a heuristic. IMO a more expensive heuristic would almost certainly be a poor tradeoff of compute: it needs to be run over each possible move at each level of the search tree, so halving the cost of your heuristic is like doubling the amount of compute you have for tree evaluation. And you're talking about running heuristics over millions of boards. That's a lot of compute to leave on the table.

And I think NN are very good for boiling down large computations into a few serial matrix operations, efficiently packing in dependencies and associations. I don't think a rule-based heuristic could compete, algorithmically.

Don't forget that one of the great mysteries of human intelligence is that it works as well as it does while using neurons that operate incredibly slowly by comparison to digital computation. They can do it because NNs bake in cached knowledge that permits good decisions to be made on very little data. Exactly what you want for evaluation.

[+] nwallin|6 years ago|reply
> I'm curious how to use the SIMD-compute model to evaluate chess positions, but it doesn't seem like very much research exists on the subject.

I disagree very strongly with this statement. Evaluation heuristics have been studied extensively, but the overwhelming consensus is that the simpler the heuristic, the better. If you make your heuristic worse, but fast enough to give you +1 on your search depth, you will almost always be better. So our evaluation heuristics are almost trivial because thirty years of research has demonstrated that the trivial heuristics are the best ones.

SIMD doesn't help us for a few reasons. First, bitboards are extremely sparse. You'll have at most 8 pawns in each pawn bitboard, exactly 1 king in each king bitboard, and only under exceedingly rare circumstances will you have more than 2 pieces in any of your other 8 bitboards. Under no circumstances will you have more than 32 set bits across your twelve bitboards, a density of at most one set bit per 24 bits.

Second, you have a lot of loops with depths determined at runtime. You won't know how many squares your bishop can move until you're evaluating the position. SIMD isn't very good at that.
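
The runtime-trip-count problem looks like this in a toy mobility routine (illustrative Python, not engine code; captures are ignored for simplicity, so any piece simply blocks the slide):

```python
# A bishop's mobility depends on what it runs into, so the inner loop's
# trip count is only known at run time -- exactly the shape of code that
# SIMD handles poorly. Board is a hypothetical 8x8 grid, 0 = empty.

def bishop_mobility(board, file, rank):
    """Count squares a bishop on (file, rank) can slide to."""
    moves = 0
    for df, dr in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        f, r = file + df, rank + dr
        while 0 <= f < 8 and 0 <= r < 8 and board[r][f] == 0:
            moves += 1            # slide until blocked or off-board
            f, r = f + df, r + dr
    return moves

empty = [[0] * 8 for _ in range(8)]
assert bishop_mobility(empty, 3, 3) == 13   # bishop on d4, empty board
assert bishop_mobility(empty, 0, 0) == 7    # bishop on a1, one diagonal
```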

> I really think MCTS is the superior search tree algorithm. The only issue is the evaluation function.

The issue is move ordering. In minimax + alpha-beta pruning + PVS + hash table move cache, you don't need a near-perfect heuristic to tell you which order to evaluate moves. If there are 20 legal moves, it's fine if the best move is the 4th-8th move you search, although alpha-beta is faster if you search the best move first, which is why PVS+hash table is such a strong improvement.

MCTS requires a good heuristic. If you have 20 legal moves, MCTS will evaluate maybe 2 of them, and if they're the 4-8th best moves, your AI will always be terrible, always. PVS+hash tables can't help you, because the only thing they can tell you was what move was good in the past.
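
That ordering sensitivity is easy to demonstrate on a synthetic game tree. The sketch below is illustrative Python, with random leaf scores standing in for evaluations; it compares alpha-beta leaf counts with and without an oracle best-first ordering:

```python
import random

# Alpha-beta tolerates imperfect move ordering, but better ordering
# prunes far more of the tree. "tree" builds a synthetic full game tree.

def minimax(node, maximizing):
    if not isinstance(node, list):
        return node
    vals = [minimax(c, not maximizing) for c in node]
    return max(vals) if maximizing else min(vals)

def alphabeta(node, alpha, beta, maximizing, stats):
    if not isinstance(node, list):
        stats[0] += 1                      # count evaluated leaves
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        v = alphabeta(child, alpha, beta, not maximizing, stats)
        if maximizing:
            best, alpha = max(best, v), max(alpha, v)
        else:
            best, beta = min(best, v), min(beta, v)
        if beta <= alpha:
            break                          # cutoff: skip remaining siblings
    return best

def tree(depth, branch, rng):
    if depth == 0:
        return rng.random()
    return [tree(depth - 1, branch, rng) for _ in range(branch)]

def best_first(node, maximizing):
    # Oracle ordering: sort each node's children by true value. Real
    # engines can only approximate this (PVS, hash-table move, history).
    if not isinstance(node, list):
        return node
    kids = [best_first(c, not maximizing) for c in node]
    kids.sort(key=lambda c: minimax(c, not maximizing), reverse=maximizing)
    return kids

t = tree(4, 4, random.Random(0))
raw, sorted_ = [0], [0]
v1 = alphabeta(t, float("-inf"), float("inf"), True, raw)
v2 = alphabeta(best_first(t, True), float("-inf"), float("inf"), True, sorted_)
assert v1 == v2 == minimax(t, True)   # ordering never changes the value
assert sorted_[0] <= raw[0]           # good ordering prunes at least as much
```

MCTS has no analogous safety net: a bad prior doesn't just slow the search down, it steers the rollouts away from the best moves entirely.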

There isn't a good way to sit down and get a human to write a good heuristic. NNs are really good at it though. IMHO, if you want MCTS, you either need NNs or something like it. There's no way to jury-rig classical evaluation functions into it. (by "classical evaluation functions" I mean everything from Deep Blue to Stockfish) Replacing NNs in this context will likely mean replacing NNs in every other context. This would be a good thing, a great thing even, but it would require a new paradigm in the AI/ML world, not just something unique to chess.

I actually agree with you about the elegance of MCTS and bitboards, and the inelegance of NNs and minimax. (and all of minimax's accompanying optimizations) But there's not really a good way forward.

[+] mot524|6 years ago|reply
The problem is that tree searching is largely serial, in the sense of, you don't know the next position you want to search until you finish searching the previous position. So you can't just send a million positions to a GPU to evaluate in parallel because you have no idea which positions you'll want to evaluate.

So GPUs are pretty worthless for computer chess unless your algorithm evaluates positions with a deep CNN (which do run well on GPUs). It's not a question of how much time people have spent to investigate running chess algorithms on GPUs, it's a question of what kind of algorithms are better suited to CPUs vs. GPUs.

[+] dagw|6 years ago|reply
> I'd love to see what "translating" Stockfish's algorithms (ex: backwards pawn check. Bishop moves check. Paired Bishop advantage check. Etc. etc.)

Do you know if all these checks are documented somewhere (other than the source code)?

[+] mrob|6 years ago|reply
FIDE Candidate Master Kingscrusher (Tryfon Gavriel) has been posting analysis videos for some of these games on Youtube. LC0's win as black against the Trompowsky Attack was particularly impressive:

https://www.youtube.com/watch?v=rz51_wFgNrQ

[+] ganeshkrishnan|6 years ago|reply
I have been following kingscrusher for years and I never knew his real name.
[+] joe_the_user|6 years ago|reply
I cannot figure out what exactly is happening from the site. I see a table showing "Gull 3" (presumably using the LC0 engine) with five wins and a few other engines with 1-2 losses.

I can't see draws on the table. A lot of engines show no wins or losses.

All this matters somewhat, considering there was talk in the AlphaZero match about whether the win percentage was realistic and whether Stockfish was given its full opening book. The thing is, by the time you're at 3000+ Elo, chess tends to be pretty drawish. One program might be better than another, but it can take a while to get anything but a draw.

[+] ngcc_hk|6 years ago|reply
Some of the discussion seems to forget that LC0 uses a neural network. There is essentially no hand-written logic, and no MCTS in the core part. Just play.

And it is called "zero" as it does not read any human games, like AlphaGo Zero.

It just plays and plays to train the neural network... you might say it tries to find, by itself, a function that plays the game. And that function is in the weights.

[+] tim333|6 years ago|reply
Anyone know how the LC0 vs Stockfish hardware compares? I believe a criticism of the Stockfish vs AlphaZero games was that AlphaZero had fancier hardware.
[+] mot524|6 years ago|reply
I believe LC0 was being run on dual RTX 2080 Tis and Stockfish was being run on "43 cores" which I take to mean a workstation with dual Intel 22-core processors. So you're talking about two $1200 graphics cards vs. two $3000+ CPUs. So you could make an argument that LC0 was running on inferior hardware, at least dollar-wise.
[+] bonzini|6 years ago|reply
53-46, one game to go
[+] Someone|6 years ago|reply
Last one is a draw. Weird ending, with one of them refusing to take a pawn because the resulting end game is in a table base and is 100% known to be a draw, and the computer evaluates the position where it doesn’t take the pawn as a 0.x% win and a 99.y% draw.