Solving Sudoku Puzzles with OpenCV

For my next project I wanted to build something that I have been planning to do for a long time.

I love solving sudoku and I was really good at it. Yup, I was good. That was three years ago. By the time I got good at solving sudoku, it started to get boring for me and I decided to write my first sudoku solver – goSudoku.

CLI Solution

goSudoku is written in Go. The goal was to make a fast sudoku solver by improving on the basic back-tracking algorithm. (More on that below)

The grid is represented as a string of numbers between 0 and 9. Zero being used to represent blank spaces. The string representation of the grid was generated by going over the grid from left-to-right, row-by-row. This string representation of the grid was the input to the program.

For example, the grid on the left gets represented as – 206000190349800600800906543100000070090030000760010005020500009005042700683097452

If we give this as an input to goSudoku will give us the following result.

$ ./goSudoku -i 206000190349800600800906543100000070090030000760010005020500009005042700683097452
Time taken :: 752.792µs
Solution :: 256473198349851627871926543132685974598734261764219835427568319915342786683197452

The solution by the program is also the right solution. 😀

You can find the code here @jerrymannel/go_sudoku.

Here are the things that I did to improve on the basic back-tracking algorithm.

The solution doesn’t use the back-tracking algorithm right away. It first tries to figure out as much as it can through multiple passes of the grid.
If the grid stayed the same between the previous and current, then it switches to backtracking.
It also identifies a set of candidate numbers of each cell. While backtracking it only picks from that list of candidate numbers for each cell and not every number between 1 and 9.

I wrote the code in 2021, so I think all of this is in there somewhere. 🙂

An automated solution

Even though goSudoku can find a solution fast, it lacked the ability to “read” the grid and enter the solution. It was not fully automated. This still required me to do the following for each grid,

Create the input string
Run it against goSudoku
Enter in the numbers back again

Even though this helped me improve my overall timings, the manual part in the beginning and the end made it feel like it was incomplete.

I had to find a UI based solution that could do all of this on its own. The algorithm looks simple, at least on paper it does.

On the screen find where the top left corner of the grid is.
Scan the values
Find solution using goSudoku
Input the values back in
Start a new game
Repeat.

Finding the grid

My first approach was to “show” the program where the grid started. Using a library called pynput, we can find the (x,y) coordinates on the screen where you clicked.

from pynput.mouse import Listener

def on_click(x, y, button, pressed):
  if pressed:
    print('Mouse clicked at ({0}, {1}) with {2}'.format(x, y, button))

with Listener(on_click=on_click) as listener:
  listener.join()

This gives the output as,

Mouse clicked at (883.96875, 445.6328125) with Button.left

Once we have the starting location, the rest of the values can be calculated.

This is a good segue to discuss about the dimensions that we have to deal with on the screen.

Dimensions

On a large screen, from the top-left of the grid the image below shows the dimensions. The grid and cells stay the same size for most window sizes. But if you resize the window and make it small beyond a point, then the grid shrinks and these dimensions won’t work.

Grid is 500×500 px. Cells are 50×50 px

The New Game button is conveniently big. So from the bottom-right corner of the grid we can just calculate a point which would be comfortably within the boundaries of that button.

But even now the program is not truly autonomous. I still have to “show” the program where the grid starts. Wouldn’t it be better if it could figure out all by itself. If I could make the program do that, then I can run it and it will go figure it out all by itself. The question is – How?

One way to approach this is if we could find where the text Difficulty: is on the screen. Once we know that then the top-left corner of the grid is a few pixels down.

OpenCV has the ability to match smaller images within a large image. I tried using that and with a bit of trial and error I was able to get consistent results. It starts with creating a template image which we will use to search within a larger image. The OpenCV method we used is matchTemplate.

result = cv2.matchTemplate(<imageToSeachOn>, <imageToMatch>, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

ℹ️ Please refer to the link above to understand what min_val, max_val, min_loc, and max_loc values are.

We create a reference Difficulty: image to search

When the program starts, we take a screenshot, convert the screenshot into an image which only has two colours black and white. This improves our chances of finding a match.

result = cv2.matchTemplate(imageOfScreen, imageOfDifficulty, cv2.TM_CCOEFF_NORMED)
locationOfDifficulty = cv2.minMaxLoc(result)[3] // max_loc

This was a milestone as I no longer had to “show” my program where to start. It could “look” and “find” where the grid was.

Scanning the grid

The approach to scanning the grid and identifying the number in each cell also uses the OpenCV matchTemplate method that I used above for finding the grid.

Before we scan the grid we have to do two more things.

Cell midpoints: We have to calculate the coordinates of each of the cells. What we are interested in is the midpoint of the cell. Knowing the mid-point of each cell helps us to navigate to that cell either to scan it or to enter a value.
Number reference images: We also need a set of reference images of numbers from 1 to 9. These reference images will be used to match the value of a cell.

Cell midpoints

Since the dimensions of the grid and the cells are fixed, it’s easy to compute the approximate mid-point of each cell. We just have to factor in the extra 2-4 pixels of the cell-borders into our calculation.

As soon as we know the top-left coordinates of the grid, we are going to calculate the midpoints of each cell. This never changes while running the program and it is much more efficient this way.

cellStartOffset = 30
offset = 50
startingPosition = (topLeft[0] + cellStartOffset, topLeft[1] + cellStartOffset)
offsetMatrix = {
	"x": [],
	"y": []
}
for i in range(0, 9):
  calculatedOffset = (offset * i) + (i*5)
  offsetMatrix["x"].append(startingPosition[0] + calculatedOffset)
  offsetMatrix["y"].append(startingPosition[1] + calculatedOffset)

Here’s how sample values of x and y coordinates looks like,

X -> [73, 128, 183, 238, 293, 348, 403, 458, 513]
Y -> [297, 352, 407, 462, 517, 572, 627, 682, 737]

Reference images

We need reference images of how numbers from 1 to 9 looks like in the grid. The reference images are not generated as part of the main program. I wrote a separate program that captured these and I manually labelled them. Here are the set of reference images that i generated.

Scanning the grid

With the reference images and the midpoints we can now parse the grid with the following logic,

At each mid-point we look at a square region of the cell.
Compare the cell image against each of the reference image of the numbers

numbers = [//..list of reference number images...//]
result = cv2.matchTemplate(cell, numbers[i], cv2.TM_CCOEFF_NORMED)
max_val = cv2.minMaxLoc(result)[1]

From the match result, we are only interested in the max_value as it gives a number that denotes the probability/confidence that the cell matched a reference number. The output looks similar to what’s given below. We just have to find the largest value to find the matching number.

1 0.07049579918384552
2 0.2200271040201187   ->  largest value
3 0.12006765604019165
4 0.09173955768346786
5 0.1461307406425476
6 0.17742568254470825
7 0.07899981737136841
8 0.13580825924873352
9 0.12301860749721527

If none of the reference number images were a match, then it is treated as 0. We also store the coordinates of the cells which are blank. Once we have the solution, we can iterate though this list to input the numbers.

Bringing it all together

We now have everything to make our program fully autonomous. The scanned output of the grid is in a format that goSudoku expects. We can call that from within the program,

process = Popen(["./goSudoku", "-i", values], stdout=PIPE)
// 256473198349851627871926543132685974598734261764219835427568319915342786683197452

Once we have the solution we compare this with the unsolved string, find the places where a 0 is present, do a look up to find the coordinates of the blank cell and input the number on the screen.

Done! 🎉

Check out the video below to see it in action!

CLI Solution#

An automated solution#

Finding the grid#

Dimensions#

Scanning the grid#

Cell midpoints#

Reference images#

Scanning the grid#

Bringing it all together#

Links#

CLI Solution

An automated solution

Finding the grid

Dimensions

Scanning the grid

Cell midpoints

Reference images

Scanning the grid

Bringing it all together

Links