There is no 16-Clue Minimum Sudoku

The story is over. When you searched "Minumum Sudoku" in Google you could read again and again that "The 16-clue Minimum sudoku" question is still open, but the conjecture would be,  that there is no 16-clue Sudoku.

As of 2012-01-01 we know, what we before only strongly suggested. McGuire and his team proved it by full enumeration of all possible grids.

For more than 5 years Gordon Royle was collecting 17er puzzle with a special sudoku generator for 17-clue minimum Sudoku. He was something like the last instance for such type of Sudoku. His famous and free collection of 17-clue minimum Sudoku included end of 2010 almost 50.000 Sudoku. Many and I have used his database to test their own sudoku solvers.

A brief excursus on "How to generate 17- clue Minimum Sudoko?" and "Maximum Hardness"

With an unspecific Sudoku generator program, you have to wait long, very long, when you hope to create a 17 -Clue  Sudoku. It may even happen, that you find none after several days. You have to find a more specific approach to attack the problem:

Blocked candidates 1 in a 9*9 Sudoku by a clue in 1 in cell D5

Each clue or given you insert in a dedicated cell of a Sudoku matrix, is blocking a specific number of the same digit in other cells of the same Sudoku matrix (8 + 2*6 = 20) and 8 other digits in the same cell, as the other 8 digits can not any longer appear in the same cell. So we have in total 28 restrictions for candidates by one clue.

It's important to assure, that the generator is inserting the restrictions (clues) in a way, that you have least overlappings (!) with other clues or digits in the neighborhood of the same clue.

To illustrate the problem: when the first 3 rows are completely filled with digits, the rest of the Sudoku is in no means filled out and decided. On the other site when clues are widespread in a regular pattern over the whole grid it is no problem at all to create a Sudoku with a unique solution by 27 clues. 

A specific generator to create 17-clue Sudoku therefore is sorting out templates with a high degree of overlapping for single digits and numbers in an early stage of the calaculation. This is in order to save computing time. Another approach to create minimum Sudoku is to take existing one and modify those slightly and hope to find new ones.

How does hardness and the property "minimal number of clues" relate? To express it in simple words: 99% of all known 17-clue sudoku have a rating under SE 3.0 and the very few hard one do not exceed the level of  SE 9.0. This is not completely surprising as the above mentioned rule to have overlappings (!) is in contradiction to a heuristic for the creation of hard sudoku - the number of other clues each clue is seeing should be minimal.

My own solver likes 17ers - as he could easily solve them by logic. Unfortunately he is failling solving hard ones.

End of Excursus

Gordon's database (something like 47051 17-clue Sudoku) is checked against duplicates by bringing them to the Canonical form. They were found very fast initially, but then as people lost interest (since 2008) and the search became harder due to fewer new ones left to find, the rate of discovery dropped to daily, then weekly, then monthly. 2011-01-03 Gordon outlined i his blogg:

"A good Sudoku solver can work a lot faster than a puzzle a second, so there’s no problem in checking the existing 17s to see if they contain a 16. The problem is that we have no way of knowing that the database is complete – because of the way they were constructed they are (in some sense) “close” to each other, and so they form a sort of large clump of puzzles in “puzzle space”. It is conceivable, though I believe unlikely, that there is a completely separate clump of puzzles that might contain 16s…

Doing the numbers suggests that something clever will be needed to solve this; even projected computer advances won’t be enough to resolve it in my lifetime."

Gordon's idea was to search all possible 17-clue Sudoku with exact one solution and just to reduce each to every possible 16-clue Sudoku (81-17) and to see whether it would have had still exactly one solution.

Let me give you three figures at this stage

Upper bound of 17-Clue Sudoku with one or multiple solution  (5,47 * 10^9)*((81!/17!)/(81-17)!) = 7,02*10^26
Upper bound of 16-Clue Sudoku with multiple solution           (5,47 * 10^9)*((81!/16!)/(81-16)!) = 2,24*10^26
Upper bound of 12-Clue Sudoku with multiple solution            (5,47 * 10^9)*((81!/12!)/(81-12)!) = 4,71*10^23

This is an upper bound for the number of valid grids and solutions and takes into account that there are isomorph Sudoku puzzles. For each Sudoku there are 2*9!*68 = 1218998108160 isomorph Sudoku.So in total there are about 5,47 * 10^9 different Sudoku grid. When a pseudo generator generates (top down) a Sudoku puzzle from a valid grid, by eleminating clues by random, the result is always a valid puzzle with one or multiple solutions.

Why the hell should there only be 50.000 unique 17ers in a puzzle space of something like 7,02*10^26 possible non isomorph 17ers? O.k. still there are a lot of duplicates in this figure because 2 identical puzzles could be created from different grids. Gordon writes: "It is conceivable, though I believe unlikely, that there is a completely separate clump of puzzles that might contain 16s…".
Well, when you use a certain method to create 17ers - as desrcibed above. But why not thinking about a completly other method? I leave it as it is. Gordon's idea, as we know today, was a dead end.

McGuire could start his calculations, there was a reasonable progress made in 2010  by Hung-Hsuan Lin and I-Chen Wu. They published their article "Solving the Minimum Sudoku Poblem" in November 2010.

"It is known that solving the minimum Sudoku problem can be done by checking 5,472,730,538 essentially different Sudoku grids, ... the program Checker, written by McGuire, requires about 311 thousand years on one-core CPU to check these grids completely, according to our experimental analysis. ", wrote Lin and Wu  and continued: "This paper proposes a new algorithm, named a disjoint minimal unavoidable set (DMUS) algorithm, to help solve the minimum Sudoku problem. ... In our experiment, the performance was greatly improved by a factor of 128.67. Hence, the improved program by us requires about 2417.4 years only."

They started a repsective project on the BOINC platform in August 2010 to run all these 5,472,730,538 essentially different Sudoku grids. Currently. On 2012-01-21 the percentage of the checked grids to the total was 31.48%.
Last year when I read about  their approach first, I wondered how long it would take them to get that problem  - 16-Clue Sudoku - solved. My assumption was that it would take another 5 -10 years. I think my guess was realistic, when you take in consideration that this problem is not one progressing mankind.

However one should never underestimate peoples motivation. Mallory was attacking Mount Everest 3 times in the early 20th of the last century. He died during the 3rd attack, but got famous for his answer, why he would try to climb this mountain. His answer was: “Because it is there.”

George Mallroy (1886 - 1924)

Having spent two years testing the algorithm, McGuire and his team used about 7 million CPU hours at the Irish Centre for High-End Computing in Dublin, searching through possible grids with the hitting-set algorithm.

“The only realistic way to do it was the brute force approach,” explained Gordon Royle, mathematician from University of Western Australian in Perth. He had been working on the problem of counting 17 clue puzzles using different algorithms.


17er Sudoku sample

"I have only skimmed the paper so far", Royle congratulated and continued: "but the basic method is clear enough – to examine each of the possible completed grids in turn and exhaustively search the grid to demonstrate that no 16-clue subgrid is uniquely completable. One way of doing this that does not involve actually solving every possible 16-clue subpuzzle is to observe that certain subsets of the cells must necessarily be 'hit' by one of the 16 clues or they could simply be rearranged in the final solution. Finding so many subsets of this type that no collection of 16 cells can hit them all is then a (computer) proof that this particular grid contains no 16-clue uniquely completable subgrid."

Royle's comment sounds like disjoint minimal unavoidable set (DMUS) algorithm invented by Hung-Hsuan Lin and I-Chen Wu in 2010. Latest since their paper was published, it was only a question of time to decide about the 16er question and two teams seem to have started the race like Amundsen and Scott to the South pole 100 years ago.

I may amend that the energy consumed, could be sufficient to supply a medium sized town close to the Sahel zone with light and cooling for the same amount time. I think somehow people must feel like Mallroy to attack a target with such an intensity (and such an amount of engergy and money). The risk to loose their lives is  negligible however in our days.

Similar to Hales proof of the Kepler conjecture in the late 1990 this is a typical proof of the 21st century based on computers and full enumeration. To my understanding - we only know little more than, that there is no 16-Clue Sudoku. E.g. we do know nothing more about the distribution of invalid (no solution), one solution, one solution minimum and multiple solution 17-Clue Sudoku.

Just to mention that we still have no confirmation that the prove is correct. Such kind of computer proof may have faults and miscalculations just by computer failliure and it's hard and costly to reproduce 7 mio core computer hours.