Grandmaster-Level Chess Without Search

[+] paulddraper|2 years ago|reply

There is rampant misunderstanding of some parts of this article; allow me to help :)

The "no-search" chess engine uses search (Stockfish) in in two ways:

1. To score positions in the training data. This is only training data, no search is performed when actually playing.

2. To play moves when the position has many options with a 99% win rate. This is to prevent pathological behavior in already won positions, and is not "meaningful" in the objective of grandmaster-level play.

Thus, "without search" is a valid description.

[+] janalsncm|2 years ago|reply

In one sense, I can understand why they would choose to use Stockfish in mate-in-N positions. The fact that the model can't distinguish between mate in 5 and mate in 3 is an implementation detail. Since the vast majority of positions are not known to be wins or draws, it's still an interesting finding.

However, in reality all positions are actually wins (for black or white) or draws. One reason they gave for why stockfish is needed to finish the game is because their evaluation function is imperfect, which is also an notable result.

[+] joe_the_user|2 years ago|reply

Sort of but it seems a bit of a cheat.

Neural networks are universal approximators. Choose a function and get enough data from it and you can approximate it very closely and maybe exactly. If initially creating function F required algorithm Y ("search" or whatever), you can do your approximation to F and then say "Look F without Y" and for all we know, the approximation might be doing things internally that are actually nearly identical to the initial F.

[+] zniturah|2 years ago|reply

"Aready won position" or "99% win rate" is statistics given by Stockfish (or professional chess player). It is weird to assume that the same statement is true for the trained LLM since we are assessing the LLM itself. If it is using during the game then it is searching, thus the title doesn't reflect the actual work.

[+] YeGoblynQueenne|2 years ago|reply

>> 1. To score positions in the training data. This is only training data, no search is performed when actually playing.

That's like saying you can have eggs without chickens, because when you make an omelette you don't add chickens. It's completely meaningless and a big fat lie to boot.

The truth is that the system created by DeepMind consists of two components: a search-based system used to annotate a dataset of moves and a neural-net based system that generates moves similar to the ones in the dataset. DeepMind arbitrarily draw the boundary of the system around the neural net component and pretend that because the search is external to the neural net, the neural net doesn't need the search.

And yet, without the search there is no dataset, and without the dataset there is no model. They didn't train their system by self-play and they certainly didn't hire an army of low-paid workers to annotate moves for them. They generated training moves with a search-based system and learned to reproduce them. They used chickens to make eggs.

Their approach depends entirely on there being a powerful chess search engine and they wouldn't be able to create their system without it as a main component. Their "without search" claim is just a marketing term.

[+] ad8e|2 years ago|reply

A quote from discord: "apparently alpha-zero has been replicated in open source as leela-zero, and then leela-zero got a bunch of improvements so it's far ahead of alpha-zero. but leela-zero was barely mentioned at all in the paper; it was only dismissed in the introduction and not compared in the benchmarks. in the stockfish discord they are saying that leela zero can already do everything in this paper including using the transformer architecture."

[+] verteu|2 years ago|reply

How much of this "grandmaster-level" play is an artifact of low time controls? I notice they only achieve GM ELO in Blitz against humans, achieve significantly worse ELO against bots, and do not provide the "Lichess Blitz ELO" of any of their benchmark approaches.

[+] bjourne|2 years ago|reply

All of it? Inference speed is proportional to the number of parameters which can't vary depending on the time control. For longer time controls you'd need a larger network. For those who don't know chess, the quality of high-level play in five-minutes-each games is extremely much lower than in 90-minutes-each games.

[+] actionfromafar|2 years ago|reply

I wonder if one could make a neural net play human-like, at various levels, by for instance training smaller or larger nets. And by human-like, I don't mean ELO level, but more like the Turing tests - "does this feel like playing against a human?"

I wonder how many time-annotated chess play logs are out there. (Between humans, I mean.)

[+] tromp|2 years ago|reply

While its performance against humans is very impressive indeed, its performance against engines is somewhat less so:

> Our agent’s aggressive style is highly successful against human opponents and achieves a grandmasterlevel Lichess Elo of 2895. However, we ran another instance of the bot and allowed other engines to play it. Its estimated Elo was far lower, i.e., 2299. Its aggressive playing style does not work as well against engines that are adept at tactical calculations, particularly when there is a tactical refutation to a suboptimal move.

[+] chongli|2 years ago|reply

I like this even more that I’ve read that. That sounds like it makes this agent a much more human-like player than the perfect calculator traditional chess engines. It may end up being more fun for humans to play against if it’s strong but has holes in its play.

[+] aexl|2 years ago|reply

This sounds a lot like Mikhail Tal!

[+] famouswaffles|2 years ago|reply

Well without explicit search would probably be more accurate.

They note that though in the paper:

>Since transformers may learn to roll out iterative computation (which arises in search) across layers, deeper networks may hold the potential for deeper unrolls.

[+] janalsncm|2 years ago|reply

We don’t know if it’s using implicit search either. While it would be interesting if the network was doing some internal search, it’s also possible it has just memorized the evaluations from 10M games and is performing some function of the similarity of the input to those previously seen.

[+] fiforpg|2 years ago|reply

Given that they used position evaluation from (a search chess engine[1]) Stockfish, how is this "without search"?

Edit: looking further than the abstract, this is rather an exploration of scale necessary for a strong engine. Could go without "without search" in the title I guess.

[1]: IIRC, it also uses a Leela-inspired NN for evaluation.

[+] throwaway81523|2 years ago|reply

Leela without search supposedly plays around expert level, but I thought the no-search Leela approach ran out of gas around there. Without search there means evaluating 1 board position per move. The engine in the paper (per the abstract) use a big LLM instead of a Leela style DCNN.

[+] paulddraper|2 years ago|reply

Training uses search, but it plays without search.

ChatGPT isn't human, but it was trained with humans.

[+] unknown|2 years ago|reply

[deleted]

[+] porphyra|2 years ago|reply

Does Stockfish really use a Leela-inspired NN? I thought the NNUE was independently developed and completely different (it's a very tiny network that runs on the CPU).

[+] IIAOPSW|2 years ago|reply

Slightly off topic but am I the only one that approaches strategy games by making a "zeroth order approximation". Eg find the shortest path to victory under the (obviously faulty) assumption that my opponent does nothing and the board is unchanging except for my moves. Now find my opponents shortest path to victory under the same assumption. Then evaluate, if we both just ignore each other and try to bum rush the victory condition, who gets there first?

For most games, if you can see a way to an end state within 3-5 steps under these idealized conditions, there's only so much that an actual opponent can do to make the board deviate from the initial static board state that you used in your assumption. The optimal strategy will always be just a few minor corrections of edit distance from this dumb no-theory-of-mind strategy. You can always be sure that whoever has the longer path to victory has to do something to interfere with the shorter path of their opponent, and there's only ever so many pieces which can interact with that shorter path. Meaning whatever path to victory is currently shortest short circuits the search for potential moves.

[+] dayjaby|2 years ago|reply

On beginner level this might work, but if people are more competitive they begin to realize the benefit of not only playing the own game, but reading the enemies plan (e.g. scouting in Starcraft/AoE2) to counteract it as much as possible.

Chess against humen is different. Usually, there is no path to victory, only to remis. People just follow strategic plans that people told them would be slightly beneficial later on. Along following that strategic plan people mess up and the first one to realize that the opponent messed up usually wins. Like having a piece advantage of 2-3 is usually considered a win already.

[+] AndrewPGameDev|2 years ago|reply

If you programmed this as a chess strategy, it would probably result in an engine that played the Scholar's mate every game. This is actually close to what low Elo players do in chess, but as you get closer to 800-ish ELO the probability of attempted scholar mates drop dramatically (likely due to it being an opening that isn't that good).

[+] themoonisachees|2 years ago|reply

This is one of the canonical ways people learn chess. It's not that bad of a way to play because it emphasizes thinking about good moves, and it efficiently finds mates in N (when done by a human)

In higher level play it usually loses to opponents that are aware they're not playing alone, at least that's the case with bots that do in fact stay unaware of their opponent.

[+] bionsystem|2 years ago|reply

A lot of decision in chess would be like "this square would be nice for that piece, how can I get there ?" and then analyze what your opponent can do to prevent you to do that, or what counterplay that gives him. So what you are doing makes a lot of sense.

[+] isaacfrond|2 years ago|reply

It's called the Null-move heuristic [1]. The assumption being that if a move can't win even if the opponent doesn't respond to it, then probably it is a pretty bad move. It's pretty standard in traditional chess search engine. Though I don't think AlphaZero's monte carlo search uses it.

[1]: https://en.wikipedia.org/wiki/Null-move_heuristic

[+] BlindEyeHalo|2 years ago|reply

This would never work in chess because there is almost always a checkmate in a few moves under the assumption that the opponent doesn't defend it. So this engine would just play way too aggressively and over-extent their pieces.

Take scholars mate for example. You can win in 4 moves from the initial position and it is an easy win if the opponent doesn't defend it but playing against someone that knows chess it is a horrible opening because it is easy to defend and leaves you in a weak position.

[+] xbpx|2 years ago|reply

That's why white has a higher statistical win rate ya?

[+] sebzim4500|2 years ago|reply

Can't you almost always win the game in a few moves if you plan for your opponent to make really stupid responses?

[+] jncfhnb|2 years ago|reply

Ehh idk. Sounds like it’s prone to the beginner strategy of assuming your opponent will occasionally do something really dumb.

[+] iamcreasy|2 years ago|reply

I guess this current method is not applicable to have the model explain why a given move was played as it is not planning more than one more ahead. Very cool, nonetheless.

[+] salamo|2 years ago|reply

I think this is an interesting finding from a practical perspective. A function which can reliably approximate stockfish at a certain depth could replace it, basically "compressing" search to a set depth. And unlike NNUE which is optimized for CPU, a neural network is highly parallelizable on GPU meaning you could send all possible future positions (at depth N) through the network and use the results for a primitive tree search.

[+] sp332|2 years ago|reply

The Stockfish installer is ~45 MB. At 16 bits per parameter, the 270B model would be over 500 MB. The 9B model would be smaller than Stockfish, but you could probably find a smaller chess engine that achieves 2000 ELO.

[+] zniturah|2 years ago|reply

They do use Stockfish for playing thought …

“To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps.”

[+] n2d4|2 years ago|reply

The context of that sentence:

> Indecisiveness in the face of overwhelming victory

> If Stockfish detects a mate-in-k (e.g., 3 or 5) it outputs k and not a centipawn score. We map all such outputs to the maximal value bin (i.e., a win percentage of 100%). Similarly, in a very strong position, several actions may end up in the maximum value bin. Thus, across time-steps this can lead to our agent playing somewhat randomly, rather than committing to one plan that finishes the game quickly (the agent has no knowledge of its past moves). This creates the paradoxical situation that our bot, despite being in a position of overwhelming win percentage, fails to take the (virtually) guaranteed win and might draw or even end up losing since small chances of a mistake accumulate with longer games (see Figure 4). To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps.

[+] paulddraper|2 years ago|reply

But only to complete a winning position.

[+] bjourne|2 years ago|reply

That must mean they found some similarity metric for high-level chess which is very impressive. In chess one pawn moving one square can be the difference between a won and a lost position. But knowing that usually requires lots of calculation.

[+] paradite|2 years ago|reply

Nice. I build AI-driven puzzle games where you can pick algorithms, tune parameters, or train ML models to play the game.

Chess is on my priority list for the next game. Now I know that ML strategy is on par with search algorithms.

[+] im3w1l|2 years ago|reply

> I know that ML strategy is on par with search algorithms

It's on par with very strong humans, but not as good as a normal engine.

[+] mk_stjames|2 years ago|reply

270M parameters at I guess FP/BF16?

I'd be interested to see how well the ELO holds up when the model is quantized.

At INT8 a small transformer like this could have a pretty amazing speed and efficiency on an Edge TPU or other very low power accelerator chip. The question becomes then is it faster / more efficient than Stockfish 16 on a similarly powered CPU. As we've seen with LLM's, they can be extremely speedy when quantized and all the stops pulled out on hardware to efficiently infer them compared to the raw FP16 and naive implementations.

[+] asah|2 years ago|reply

imho fascinating experiment, even if it didn't produce world class Elo.

For one thing, statelessness deprives it of easy solutions to repetition and endgame decisiveness.

[+] xianshou|2 years ago|reply

The path to AGI:

0. Have model A.

1. Use Monte Carlo with A to get supervised data.

2. Train model B with data from A.

3. Use Monte Carlo with B to get supervised data.

4. Train model C with data from B...

[+] rprenger|2 years ago|reply

This is pretty close to how AlphaZero works.

https://medium.com/applied-data-science/alphago-zero-explain...

[+] jedberg|2 years ago|reply

That's basically how OpenAI is working. They use generated training sets from one model to train the next model (plus other stuff with it).

But the "other stuff" is pretty important. That is what pulls it away from just constantly re-amplifying the bias in the initial training data.

[+] spencerchubb|2 years ago|reply

That is an awesome idea. I wish the authors would open source the code and weights so this can be tried.

[+] lettergram|2 years ago|reply

I immediately saw this and knew it was BS. It’s a search problem, the human brain even does a search. Model internally is scanning each position and determining the next probably position. That is a predictive search, you can’t just restructure the problem.

Now arguably it’s doing it differently, maybe? But still a search

[+] penjelly|2 years ago|reply

i dont follow... even if its trained anc doesnt use search isnt the act of it deciding the next move a sortof search anyway based off its training? Ive heard people describe LLMs as extremely broad search, basically attempting to build world model and then predicting the next world based on that. Is this fundamentally different from search? Am i wrong in my assumptions here?

[+] janalsncm|2 years ago|reply

We know the model is approximating the results of a search. We don’t know whether it is actually searching.

At the most basic level, the model is just giving probabilities for the next moves, or in the case of value approximation, guessing which bucket the value falls into.

[+] EGreg|2 years ago|reply

Next up, claims of ...

  Text generation without human writers
  Image generation without human artists
  Video generation without production crews

The rub is that all the training data has been done by lots and lots of search, human writing, human artists, and human production crews. At every step, they curated the results and generated a beautiful latent space that the model was then trained on.

The impressive part about AlphaZero is that it did the Monte-Carlo Tree Search itself, without being trained on human decision-making.

While this ... well, this is just taking credit for all the search and work that was done by Stockfish, or humans, at a massive scale over centuries and then posted nearly for free online. It's then using all that data and generating similar stuff. Whoop de doo. Oh, and it's even used cheap labor for the last mile, too:

https://time.com/6247678/openai-chatgpt-kenya-workers/

It's not the same thing as actual search to, for example, automatically derive scientific laws (e.g. Kepler's laws of motion) from raw data fed to it (e.g. of star movements). AI doing that can actually model the real space, not the latent space. It can go out and learn without humans or stockfish massively bootstrapping its knowledge.

I mean, don't get me wrong ... learning a lot about the latent space is what students strive to do in schools and universities, and the AIs are like a very smart student. In fact, they can be huge polymaths and polyglots and therefore uncover a lot of interesting connections and logical deductions from the latent space. They can do so at a huge scale... and I have often said that swarms of AIs will be unstoppable. So at the end of the day, although this isn't very impressive when it comes to the credit of who did the search and curation, AI is going to be extremely impressive with what it can do with the results on the next N levels.

[+] mtlmtlmtlmtl|2 years ago|reply

Grandmaster level chess for a computer is a massive, massive regression. Big whoop.

[+] ivanjermakov|2 years ago|reply

I thought clickbaity titles were discouraged in whitepaper world.

130 comments