item 14536610

Can Neural Networks Crack Sudoku?

141 points | kyubyong | 8 years ago | github.com

105 comments

[+] timdellinger|8 years ago|reply
To me, this answers the question "How well can neural networks crack sudoku if we forget all of our domain knowledge, and just try to brute force our way through with insufficient training data and a neural network that's poorly sized?".

I suggest an alternate approach that would use a training set consisting of real-world data of sudoku puzzles that are in the process of being solved: before a box is filled in, and then after it's filled in. This would teach the network to solve the puzzle one step at a time, instead of all at once.
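The before/after idea above can be sketched as a data-generation step: from each solved puzzle plus its givens, emit one (before, after) training pair per filled-in cell. A minimal Python sketch, assuming 9x9 grids as lists of lists with 0 for blanks (all names here are made up for illustration, not from the repo):

```python
import random

def one_step_pairs(solved, givens):
    """Generate (before, after) training pairs from a solved 9x9 grid.

    `solved` is the completed grid; `givens` is the puzzle with 0 for
    blanks. Each pair fills exactly one blank cell with its true value,
    so a network trained on the pairs learns single-step moves rather
    than whole-grid completion.
    """
    blanks = [(r, c) for r in range(9) for c in range(9) if givens[r][c] == 0]
    random.shuffle(blanks)                 # fill blanks in a random order
    grid = [row[:] for row in givens]
    pairs = []
    for r, c in blanks:
        before = [row[:] for row in grid]  # snapshot before the move
        grid[r][c] = solved[r][c]          # make one correct move
        after = [row[:] for row in grid]   # snapshot after the move
        pairs.append((before, after))
    return pairs
```

The last `after` grid is the full solution, so one solved puzzle yields as many training examples as it has blanks.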

If you insist on only using the as-received puzzle and the solution for training, my intuition is that you'll need more layers than 10 in the NN, and much much more data.

A taxonomy of the complex solving strategies that need to be learned already exists: see e.g. http://www.sudokuwiki.org/y_wing_strategy and the categories "basic", "tough", "diabolical", and "extreme". My guess is that the NN (using the original author's strategy, but with more layers and lots more data) will eventually do well on "basic" and "tough", but will start to miss on some of the harder categories depending on the training set and how the network is set up.

[+] theptip|8 years ago|reply
To me, it's an interesting question to ask how well NNs can perform without hand-holding, i.e. can they figure out all of the complex strategies that humans have figured out, starting from a blank slate?

C.f. AlphaGo, which invented new strategies that proved interesting to human players.

Of course, if you're training your NN to solve a real-world problem, you wouldn't choose to train it in this way, you would do as you suggested and feed it as much relevant training data as possible, to give it a head-start.

(Your points re: sizing and quantity of training data sound reasonable and are out of my expertise, so nothing to add there.)

[+] nojvek|8 years ago|reply
I just feel bad about this "my intuition is that you'll need more layers than 10 in the NN, and much much more data."

I still see current AI as basically a filter: image to text, speech to text, etc.

A smart AI would be able to figure out the rules of sudoku from a very small sample of games, then figure out a rudimentary backtracking algorithm. It would then sleep over it, figure out more patterns like naked singles etc., and build a really fast sudoku solver with 100% accuracy. All this happening in a matter of seconds.

[+] iaw|8 years ago|reply
I suspect there may be a method by which one could convert solved sudokus into numerous candidate puzzles.

I imagine strategically removing numbers from solved puzzles so as to reinforce the neural connections for the solution and filter out possible 'noise'.
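A rough sketch of that idea in Python, assuming 9x9 grids with 0 for blanks. A real generator would also verify that each candidate puzzle still has a unique solution; that check is omitted here, and the names are illustrative, not from the repo:

```python
import random

def puzzles_from_solution(solved, n_remove=40, n_puzzles=5, seed=0):
    """Turn one solved grid into several candidate training puzzles by
    blanking `n_remove` randomly chosen cells (0 = blank).

    Uniqueness of each puzzle's solution is NOT checked here; a real
    generator would reject removals that admit multiple solutions.
    """
    rng = random.Random(seed)  # local RNG keeps results reproducible
    cells = [(r, c) for r in range(9) for c in range(9)]
    puzzles = []
    for _ in range(n_puzzles):
        grid = [row[:] for row in solved]
        for r, c in rng.sample(cells, n_remove):
            grid[r][c] = 0
        puzzles.append(grid)
    return puzzles
```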

[+] dgfgfdagasdfgfa|8 years ago|reply
Well, sudoku is a solved problem. This would amount to teaching a neural net graph coloring.
[+] kortex|8 years ago|reply
So much missing-the-point in this thread. The point isn't "can we solve sudoku with a completely over-wrought solution", it's "can NN be applied to X class of problems with no human-added domain specific knowledge". And that I think is quite cool.
[+] ska|8 years ago|reply

   "can NN be applied to X class of problems with no human-added domain specific knowledge"
The universal approximation theorem suggests that for a significant class of problems, the answer to this is "obviously" [1]. The problem that remains is how effective the learning is.

I'm not convinced that applying NNs to areas like this one, where there are clearly much better approaches, teaches us anything useful for more interesting cases, but maybe it does.

[1] NB: it's not immediately clear that the OP's problem is in that class.

[+] petters|8 years ago|reply
I agree. I was a bit surprised that this approach worked as well as it did.

(I still think it would fail spectacularly on "super-human" sudokus.)

[+] jcoffland|8 years ago|reply
> The point isn't "can we solve sudoku with a completely over-wrought solution", it's "can NN be applied to X class of problems with no human-added domain specific knowledge"

But why would you care about solving a class of problems with NNs when that class of problems already has much better solutions? Too many new programmers are running to NNs out of pure laziness. NNs are like magic. They solve the problem for you so you don't have to learn how to solve it yourself. Or at least that's the enticing promise.

[+] majewsky|8 years ago|reply
Evidence that DNNs are somewhere between phase 2 and 3 of https://en.wikipedia.org/wiki/Hype_cycle
[+] dahart|8 years ago|reply
I'm sometimes cynical about NNs too, but even if I were to take the hype cycle theory at face value, I have to admit it's possible that NNs are reaching maturity ("plateau of productivity") now, given that there were large hype cycles over them in the 1960s and 1980s, and given that they are starting to really work and are being deployed in large scale applications. But, this hype cycle theory isn't scientific, so all supporting evidence is anecdotal and subjective, right?
[+] wiredfool|8 years ago|reply
Sudoku is trivially and directly solvable by propagating constraints, for puzzles that are not in the hard/ultra-hard range.

For cases where that's not possible, iterative approaches can be very effective -- with not particularly optimized Python, the worst solution time I ever saw was 1 second on a first-gen Eee PC netbook. That one needed about 6 levels of iterative backtracking before it came up with the solution.
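The approach described above -- propagate constraints to fill forced cells, then fall back to backtracking -- can be sketched in unoptimized Python. This is an illustration of the technique, not any commenter's actual code:

```python
def solve(grid):
    """Solve a 9x9 Sudoku (lists of lists, 0 = blank).

    Strategy: repeatedly fill cells with exactly one candidate
    (constraint propagation); when stuck, branch on the most
    constrained blank cell (backtracking). Returns a solved copy,
    or None if the grid is contradictory.
    """
    grid = [row[:] for row in grid]

    def candidates(r, c):
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[br + i][bc + j] for i in range(3) for j in range(3)}
        return [v for v in range(1, 10) if v not in used]

    # Propagation: fill forced cells until nothing changes.
    changed = True
    while changed:
        changed = False
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    cand = candidates(r, c)
                    if not cand:
                        return None          # contradiction: dead end
                    if len(cand) == 1:
                        grid[r][c] = cand[0]
                        changed = True

    # Backtracking: branch on the blank cell with fewest candidates.
    blanks = [(len(candidates(r, c)), r, c)
              for r in range(9) for c in range(9) if grid[r][c] == 0]
    if not blanks:
        return grid                          # no blanks left: solved
    _, r, c = min(blanks)
    for v in candidates(r, c):
        grid[r][c] = v
        result = solve(grid)
        if result is not None:
            return result
        grid[r][c] = 0
    return None
```

On easy puzzles the propagation loop alone finishes the grid; harder puzzles trigger the branching step, which is where the "levels of backtracking" come from.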

[+] jcoffland|8 years ago|reply
I wrote a Sudoku solver in Python while waiting in the airport on a long layover; it solves even the most difficult puzzles in a few seconds on a now-old laptop. It requires some backtracking, but it's really not that hard.

Edit: I decided to publish my implementation on GitHub. https://github.com/jcoffland/fsudoku

[+] brian_herman|8 years ago|reply
You can solve sudoku with a SAT solver; you don't need neural networks. This was an assignment in CS 251 at UIC.
[+] zmonx|8 years ago|reply
Indeed! Here is Sudoku formulated as a SAT problem. You can run it with, for example, SICStus Prolog:

    %% Assumed context: sat/1 and card come from library(clpb);
    %% transpose/2 and element/3 from the lists/clpfd libraries.
    :- use_module(library(clpb)).

    sudoku(Rows) :-
            length(Rows, 9), maplist(same_length(Rows), Rows),
            maplist(row_booleans, Rows, BRows),
            maplist(booleans_distinct, BRows),
            transpose(BRows, BColumns),
            maplist(booleans_distinct, BColumns),
            BRows = [As,Bs,Cs,Ds,Es,Fs,Gs,Hs,Is],
            blocks(As, Bs, Cs), blocks(Ds, Es, Fs), blocks(Gs, Hs, Is).

    blocks([], [], []).
    blocks([N1,N2,N3|Ns1], [N4,N5,N6|Ns2], [N7,N8,N9|Ns3]) :-
            booleans_distinct([N1,N2,N3,N4,N5,N6,N7,N8,N9]),
            blocks(Ns1, Ns2, Ns3).

    booleans_distinct(Bs) :-
            transpose(Bs, Ts),
            maplist(card1, Ts).

    card1(Bs) :- sat(card([1],Bs)).

    row_booleans(Row, Bs) :-
            same_length(Row, Bs),
            maplist(cell_boolean, Row, Bs).

    cell_boolean(Num, Bs) :-
            length(Bs, 9),
            sat(card([1],Bs)),
            element(Num, Bs, 1).
This uses the sat/1 constraint to denote Boolean satisfiability.

Here is an example Sudoku problem, taken from Donald Knuth's The Art of Computer Programming:

    knuth(b, [[1,_,3,_,5,6,_,8,9],
              [5,9,7,3,8,_,6,1,_],
              [6,8,_,1,_,9,3,_,5],
              [9,5,6,_,3,1,8,_,7],
              [_,3,1,5,_,8,9,6,_],
              [2,_,8,9,6,_,1,5,3],
              [8,_,9,6,_,5,_,3,1],
              [_,6,5,_,1,3,2,9,8],
              [3,1,_,8,9,_,5,_,6]]).
Sample query and result:

    ?- knuth(b, Rows),
       sudoku(Rows),
       maplist(portray_clause, Rows).
    [1, 2, 3, 4, 5, 6, 7, 8, 9].
    [5, 9, 7, 3, 8, 2, 6, 1, 4].
    [6, 8, 4, 1, 7, 9, 3, 2, 5].
    [9, 5, 6, 2, 3, 1, 8, 4, 7].
    [7, 3, 1, 5, 4, 8, 9, 6, 2].
    [2, 4, 8, 9, 6, 7, 1, 5, 3].
    [8, 7, 9, 6, 2, 5, 4, 3, 1].
    [4, 6, 5, 7, 1, 3, 2, 9, 8].
    [3, 1, 2, 8, 9, 4, 5, 7, 6].
[+] tchalla|8 years ago|reply
SAT - Satisfiability

https://en.wikipedia.org/wiki/Boolean_satisfiability_problem

Unexpanded acronyms are a pet peeve of mine. I know what SAT is, since I've spent more than half my life in computer science, but many people wouldn't understand the acronym. So it would be great if, in general, one spelled out what SAT refers to rather than just using the acronym: mention the long form the first time, then use the short form afterwards.

Sorry if this came across as an affront; it's just a personal observation. I saw this recently with GRUs in another thread too.

[+] stefs|8 years ago|reply
Yes, an NN isn't the best tool for the job. I guess it's just an experiment to see whether one can do this with a reasonable rate of success.
[+] likelynew|8 years ago|reply
There is no easy way to solve SAT instances the size of sudoku's. Modern-day SAT solvers are very large and embody years of experience, containing tens of heuristics and complex data structures. If we can simplify that using a neural network, I think it's a great step.
[+] sarabande|8 years ago|reply
I find it interesting that when the NN fails to complete the whole puzzle, it seems to fail spectacularly (github.com/Kyubyong/sudoku#results) -- that is, there aren't a lot of good partial attempts (90% or above). Does anyone know why this might be the case? Does the first wrong placement of a number essentially ruin the rest of the guesses?
[+] suddensleep|8 years ago|reply
My intuition says yes; if a wrong number is placed in a square and then taken as ground truth for solving the rest of the puzzle, it certainly seems like the error would propagate.

As a concrete example, say the NN misplaces a '2' somewhere in a puzzle. Obviously, that cell is incorrect. But depending on what the NN has learned, it may also believe the row (resp. column, 3x3 box) already has its '2', and so try to fill the '2''s correct spot with another number -- which then leads the column and/or 3x3 box of that cell to pick up an incorrect value, starting the process over again.

This same phenomenon can be seen in the game KenKen: depending on the strategies you use at any given point in the game, one mistake can propagate outward pretty quickly and spoil large sections of the puzzle.

[+] dahart|8 years ago|reply
I'm not certain, but based on the description by the author, this solver is sometimes doing probabilistic guessing. You're more likely to make a mistake early when you do this -- there are fewer chances to guess incorrectly near the end of the game as inferences grow stronger, and more chances in the beginning and middle to guess wrong which will then snowball into a more wrong answer. So statistically, it makes sense that failures tend to be large.

I thought it was interesting that the "hard" category seemed to have more wrong answers than the 2 harder categories. Maybe there just aren't enough samples, but in the data shown in the table, it seems like a big difference.

[+] bitwize|8 years ago|reply
Reminds me of Watson's Jeopardy! exhibition match. Watson was right far more often than the humans, but when it was wrong it was way off base.
[+] CJefferson|8 years ago|reply
One big problem is the network is never trained on incorrect Sudokus, so at that point it's effectively into random guessing.
[+] LanceH|8 years ago|reply
Once you get a few cells right, the solution space closes very fast. A valid sudoku puzzle has a unique solution, so if you had 90% of the grid correct, there would be only one value to choose for each of the remaining cells.
[+] Undertow_|8 years ago|reply
Probably an early-onset snowballing of what you said. When one wrong placement is made, it's pretty hard to recover for a full puzzle.
[+] benp84|8 years ago|reply
As a combinatorial optimization problem, integer programming is the best tool for solving Sudoku. Fast, accurate, and guarantees the optimality of the solution. I'm impressed at how well this neural network did anyway though!
[+] nathell|8 years ago|reply
> 1M games were generated using generate_sudoku.py for training. I've uploaded them on the Kaggle dataset storage.

No need to do that. It would suffice to just include the random seed in the script so that its results are reproducible.
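The seeding idea can be illustrated with a toy generator: drive all randomness from one seeded RNG and the output becomes reproducible. This is a hypothetical stand-in, not the repo's generate_sudoku.py:

```python
import random

SEED = 42  # hypothetical value; the real script would hard-code its own

def generate_games(n, seed=SEED):
    """Deterministic stand-in for a puzzle generator.

    All randomness comes from one seeded random.Random instance, so the
    same (n, seed) always yields the same list of 'games' (here, flat
    81-cell lists of digits as placeholders for real puzzles).
    """
    rng = random.Random(seed)  # local RNG; global random state untouched
    return [[rng.randint(1, 9) for _ in range(81)] for _ in range(n)]
```

With the seed fixed, re-running the script reproduces the same dataset; publishing a few sample games alongside it makes the reproduction easy to verify.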

[+] jononor|8 years ago|reply
One should do both: scripts and their dependencies have bugs, potentially leading to a different sequence of games. At the very least, one should publish a couple of the games so others can check against them.
[+] trextrex|8 years ago|reply
That looks interesting. Sudoku is a fairly useful constraint satisfaction problem/testbed for neural network architectures, since the most general version of sudoku is NP-complete.

There was some interesting work a while ago that solved sudoku using spiking neural networks too [1] (Fig. 5) with some nice associated theory.

[1] http://journals.plos.org/ploscompbiol/article?id=10.1371/jou...

[+] bhaavan|8 years ago|reply
I thought Sudoku could be solved in constant time. It would be O(9! * 9!) or something, which is just O(1), isn't it?
[+] zeckalpha|8 years ago|reply
Sure, but so can decision trees.
[+] xor0110|8 years ago|reply
Coming up next: CNNs applied to training CNNs
[+] dimatura|8 years ago|reply
That actually happened a long time ago.
[+] andreyk|8 years ago|reply
Using 10 convolutional layers seems questionable - the precise values and location of each number is essential for getting the right answer, so it seems like the model design is poorly suited to the task. The obvious approach to me would be to train a neural net to be a heuristic to order exploration in a constraint satisfaction solver, mirroring how AlphaGo combines monte carlo search trees with neural nets. Might net a nice speed up.
[+] tinyrick2|8 years ago|reply
A more interesting problem would be to train on n x n sudokus and test the model on m x m sudokus where m < n, or even m > n. Does that make sense?

edit: formatting

[+] bluetwo|8 years ago|reply
Funny, I taught my AI engine to solve easy, medium, and hard NYT sudoku last week.

It needed only 150 tries to solve the hard puzzle.

[+] faragon|8 years ago|reply
I would like to know whether it can be proven that NNs, for problems involving permutations, can be as efficient as a SAT solver, or less or more efficient (e.g. whether using the training merely as an accelerator to avoid exploring futile permutations saves time, or whether the effect is not noticeable).