Welcome to MSDN Blogs Sign in | Join | Help

Microsoft Excel

The team blog for Microsoft Excel and Excel Services.
Building a Basic, Understandable Sudoku Solver Using Excel Iterative Calculation - Part 2/2

Today's author, Charlie Ellis, continues discussing the spreadsheet he built to solve Sudoku puzzles.

In my previous post, I walked through a number of formulas for setting up the valid values and solution board. In this post we'll cover using iteration and other formula tricks to help solve the puzzle.

Using valid values to drive solutions

Looking at cells P22:R24 below, we can clearly see that the only solution here is 7:

image

Let's reflect this in the solution board.

We're going to modify the "else" argument of the IF function here to make it check to see if the val_bigcell_from_sol has only a single answer, and if so, return that. First we need to make val_bigcell_from_sol:

=INDEX(val_board, ROW(Main!A1)*3-2, COLUMN(Main!A1)*3-2):INDEX(val_board, ROW(Main!A1)*3, COLUMN(Main!A1)*3)

Making sure to enter this from the first cell in the solution board (cell D16).  Next we use this to make the else clause:

=IF(in_cell_from_sol,in_cell_from_sol,IF(COUNT(val_bigcell_from_sol)=1, SUM(val_bigcell_from_sol), ""))

Note the trick here to get a value from a 3x3 big cell; we can't just get its value, but since we know there's only once cell with a value, we can just use SUM to get the value of the one cell within the big cell that has a value.

And then we enter it and. whoops, we get a big scary warning:

image

Click Cancel, and then go to the tools options and make sure that "Enable iterative calculation" is checked:

image

Now, when you go back to the spreadsheet, it's worked, and we've completely solved this (really easy) Sudoku:

image

To see what it's doing, you can dial down the maximum iterations to 1 and add a reset to both the solution board and the valid values board. The way I like to do this is with a cell that tracks state, and either has a string that either says "reset" or "solve". Adding this to a cell (I chose D21) and naming it state, you can modify the valid values board formula and the solution board formula to be:

=IF(state="reset",onetonine,IF(sol_cell_from_val<>"",IF(sol_cell_from_val=onetonine, onetonine,""), IF(solution_in_rcb, "",onetonine)))

And

=IF(in_cell_from_sol, in_cell_from_sol, IF(state="Reset", "", IF(COUNT(val_bigcell_from_sol)=1, SUM(val_bigcell_from_sol), "")))

Respectively.

Finding the only cell in a row that can take a given value

Let's take a slightly tougher puzzle now, and see how we can extend this very rudimentary solver.

image

When you try to solve this puzzle with our existing solver, it gets some answers, but then it bogs down. To solve this puzzle, we need to make use of another rule of solving Sudoku puzzles which is that there must be at least one of each number in every row, column and big box. This rule, which is almost the first rule flipped around helps you get to a solution when there is only one possible place in a whole row, column or big box that a particular number can go.

Before we get to how to actually check for this, let's talk about the right place to put this logic. Because it is helping us to find a solution as opposed to eliminate a possibility, it should go in the solution board. And because it covers a case that is the non-reset-state case where a solution can't be found (and the empty string is returned), the right place is where that normally would be. Furthermore, we have to both find a solution and return that solution, so it makes sense to collapse both of those into one name instead of testing for a solution then doing the same work we did to find the solution in order to return it. That means we're going to change our formula to something like:

=IF(in_cell_from_sol, in_cell_from_sol, IF(state="Reset", "",IF(COUNT(val_bigcell_from_sol)=1, SUM(val_bigcell_from_sol), single_sol_in_rcb)))

Where single_sol_in_rcb is either a solution, if one exists, or the empty string.

To implement this, we'll need val_bigcell_from_sol, but also val_bigrow_from_sol, val_bigcol_from_sol, and val_bigbox_from_sol. These are all fairly straightforward extensions on the concepts from sol_row_from_val and val_bigrow_from_sol, so I won't go into detail here, but here's the formula for val_bigbox_from_sol as entered/viewed from cell D16:

=INDEX(val_board, INT((ROW(Main!A1)- 1) / 3) * 9 + 1, INT((COLUMN(Main!A1)- 1 ) / 3)*9+1):INDEX(val_board, INT((ROW(Main!A1)-1) / 3) * 9 + 9, INT((COLUMN(Main!A1)-1) / 3) * 9 + 9 )

For now, we're going to start out just checking the rows for cases where there's only one cell that can contain a given value. Thus we'll make single_sol_in_rcb just be:

=single_sol_in_row

And then eventually make it so that we can detect a single solution in any of rows, columns, or boxes. BTW, it's fine to create names that make use of these as-yet-undefined names, but if you put these into the sheet, Excel annoyingly resizes your columns to accommodate the #NAME errors that inevitably result.

Here's what this looks like on the valid value board:

image

Here the five in the big cell highlighted in red is the only five in the entire row, so the corresponding solution cell must be five. We can break this down into two pieces:

  1. Checking that a value exists within the valid values board big cell corresponding to the current solution cell
  2. Checking that there's only one of that value in the valid value board big row corresponding to the current solution cell

The most efficient way I've gotten to work, is to do these checks and comparisons by creating an array of the values in the current big cell and multiplying that array by which of those values only occur once in a given row. Here's the formula for this:

=MAX((COUNTIF(val_bigrow_from_sol,array1to9)=1)*(COUNTIF(val_bigcell_from_sol,array1to9)=1)*array1to9)

Where array1to9 is the array {1;2;3;4;5;6;7;8;9}, which is easiest to get using =ROW($A$1:$A$9). The first COUNTIF and equality operator check whether, for each number 1-9, there's only one of the number in the row. In the above example, this becomes {0;1;0;0;1;1;0;0;0} because each of 2, 5, and 6 occur only once in the highlighted row. The second COUNTIF and equality operator check which values are present in the current cell. In the case of the highlighted cell, this is {0;0;0;1;1;0;0;0;0}. Multiplying them together gets the intersection of those two conditions ({0;0;0;0;1;0;0;0;0}), and multiplying this by array1to9 gives you {0;0;0;0;5;0;0;0;0}, which can be reduced to just the desired value by taking the MAX.

Then, in single_sol_in_rcb, in order to not have to call single_sol_in_row more than once to both return an answer and turn the 0 condition into an empty string, you can use the CHOOSE function like so:

=CHOOSE(single_sol_in_row+1, "", 1, 2, 3, 4, 5, 6, 7, 8, 9)

This is enough to solve the second sample puzzle, but extending this to columns and boxes is actually really easy at this point. All you have to do is make a single_sol_in_col and single_sol_in_box, which are identical to single_sol_in_row except for the first named range. Then you have to adjust single_sol_in_rcb to include the solutions for column and box like so:

=CHOOSE(MAX(single_sol_in_row, single_sol_in_col, single_sol_in_box)+1, "", 1, 2, 3, 4, 5, 6, 7, 8, 9)

Well, that's it. The actual workbook itself is attached to my original post. I think it's neat in that there are only really three different formulas on the actual sheet and another 18 in the names. If you think about it as lines of code, there aren't a lot of places outside of obfuscated code contests where you get 21 lines of code to do as much complicated stuff. I think you'll have to be the judge of how clear the formulas are, but by using the commenting on names and further cleaning up naming conventions, I think this could be very understandable. Certainly there's a lot more I could walk through in a future post - primarily how to add in more complicated strategies that I've gotten to work in this framework such as pairs, triples, hidden pairs, hidden triples, and box-and-line, but also debugging strategies, pitfalls for performance or maintainability, etc. - but that's the basic idea. I'd love to hear thoughts on this approach, other approaches, suggestions for new strategies or additional tricks to make existing stuff simpler or faster. Hope you found it interesting.

Posted: Wednesday, October 01, 2008 11:25 PM by Joseph Chirilov

Comments

Phylyp said:

This is a very nice post - Excel is quite geared towards Sudoku due to its row-column nature (unlike chess where diagonals can get Excel rather tangled!)

# October 3, 2008 6:55 AM

Huan said:

Hi,excel is great..but i wonder how many spreadsheet i can create in one excel file..thx

# October 6, 2008 5:55 AM

Charlie Ellis said:

@ Phylyp - Good point.  Checking diagonals would be a pain since nearly all of the functions don't work on discontiguous ranges (which is the only way to do non-rectangular ranges, such as all the cells on a diagonal).  I think you'd either be reduced to checking against each cell on a diagonal separately or creating a copy of the entire board that eliminates things not on a diagonal and then doing tests based on that.  Both of these approaches has promise, though I think the latter is more workable.

For example, I think you could reasonably easily model the positions on a board in check by a queen at any spot on the board as an array of 1s and 0s.  Then you could use this array to see which pieces are in check by multiplying it by the array representing the board.

I think the bigger trick would be representing all of the eventualities and consequences of each move to any depth.  That's a problem that doesn't reduce itself to representation as a series of cell-based representations of various aspects of the board (e.g. what pieces are where, what's in check how many times, what is or can check the king, etc.) nearly as well as Sudoku.  Sudoku is much more like a state machine, whereas you need basically a stack of machines similar to what we built for Sudoku to represent chess.

# October 7, 2008 10:02 PM

Shahed Khan (MVP C#) said:

196 Microsoft Team blogs searched, 85 blogs have new articles in the past 7 days. 194 new articles found...

# October 28, 2008 4:49 AM
New Comments to this post are disabled
Page view tracker