Recognizing Chess Game State from an Image

Image Processing and Deep Learning -> Magic

Introduction

In summer 2022, I took courses in Image Processing and Deep Learning and needed a problem to solve for the course projects. This one caught my interest.

Given a photo of a chessboard with pieces on it, the goal is to digitize the game state. This requires localizing the board, recognizing the pieces, and assigning them to squares.

There are some projects on this subject (1) (2) (3), but they seem unnecessarily complex and do not generalize well. I think my approach is much simpler, yet effective.

Method

For ease of reading, let's divide the method into three main parts.

  • Localizing chessboard
  • Detecting chess pieces
  • Assigning pieces to squares

Localizing chessboard

For this part, I used only image processing techniques. Using corner detectors may seem tempting, but remember that there are pieces on the board, which block the corners as well as most of the edges. So we need a more robust approach. Here is the pipeline I designed:

pipeline.png

Read the image and convert it to grayscale. Then apply a bilateral filter to reduce noise while preserving edges, run Canny edge detection to find the edges, and apply the Hough line transform to extract lines from those edges.

1.png

Now, there is a fact I should explain. Picture a chessboard: it has two sets of lines. In real life, the lines within a set are parallel to each other and perpendicular to the lines of the other set, but this does not hold in images. Because of perspective, the two sets do not have to be perpendicular, and the lines within a set can either stay parallel or converge at a point called the vanishing point.

pers_wsource.png

So we need to find all clusters of parallel lines and all clusters of intersecting lines, collect them as candidates, and finally select the correct pair. To do that, first find all intersections between the lines.
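
As a minimal sketch of that intersection step, assume each line is given in the Hough normal form (rho, theta), where a point (x, y) on the line satisfies x*cos(theta) + y*sin(theta) = rho. The function name `hough_intersection` and the `eps` tolerance are illustrative choices, not taken from the project's source code.

```python
import math

def hough_intersection(line_a, line_b, eps=1e-9):
    """Intersect two lines given in Hough normal form (rho, theta).

    Returns (x, y), or None when the lines are parallel.
    """
    rho1, theta1 = line_a
    rho2, theta2 = line_b
    # Determinant of the 2x2 linear system; zero means parallel normals.
    det = math.cos(theta1) * math.sin(theta2) - math.sin(theta1) * math.cos(theta2)
    if abs(det) < eps:
        return None
    # Cramer's rule for the two line equations.
    x = (rho1 * math.sin(theta2) - rho2 * math.sin(theta1)) / det
    y = (rho2 * math.cos(theta1) - rho1 * math.cos(theta2)) / det
    return x, y

# The vertical line x = 2 and the horizontal line y = 3 meet near (2, 3).
print(hough_intersection((2.0, 0.0), (3.0, math.pi / 2)))
```

A `None` result marks a pair of parallel lines, which is the signal needed for building the parallel line clusters.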

Finding parallel line clusters is easy: look for lines that do not intersect. To find intersecting line clusters, apply DBSCAN to all intersection points; the resulting DBSCAN clusters are the points crowded around potential vanishing points. From each point cluster, we can form a line cluster out of the lines that produced those intersections.
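
The idea can be sketched with a small, dependency-free DBSCAN. In practice a library implementation such as scikit-learn's `DBSCAN` would normally be used; the version below is only an illustration. The `eps` and `min_samples` values and the `pairs` bookkeeping (which two line indices produced each intersection point) are assumptions made for the example.

```python
import math

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force region query; the point itself is included.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
        cluster += 1
    return labels

def lines_per_cluster(labels, pairs):
    """Collect the indices of the lines behind the points of each cluster."""
    clusters = {}
    for label, (a, b) in zip(labels, pairs):
        if label != -1:
            clusters.setdefault(label, set()).update((a, b))
    return clusters

# Two tight groups of intersection points plus one stray point.
points = [(0, 0), (0.5, 0), (0, 0.5), (100, 100), (100.4, 100), (100, 100.3), (50, 50)]
pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (0, 3)]
labels = dbscan(points, eps=1.0, min_samples=3)
print(lines_per_cluster(labels, pairs))
```

Each resulting set of line indices is one candidate cluster of lines converging toward the same vanishing point.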

Figure_1.png

After finding all of those clusters, we need to determine which pair of clusters forms the actual chessboard. Calculate a score for each pair based on the angle difference between the mean lines of the two clusters and the cardinality of both clusters. The highest-scoring pair is then chosen as the pair of clusters that forms the chessboard.
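
The exact scoring formula is not given here, so the following is only a hypothetical sketch of how such a score could combine the two ingredients. It assumes lines in (rho, theta) form, rewards mean angles that are close to perpendicular, and multiplies by the size of the smaller cluster. The function names and the formula itself are assumptions, not the project's actual implementation.

```python
import math
from itertools import combinations

def mean_angle(cluster):
    """Mean theta of the (rho, theta) lines in a cluster (no wrap-around handling)."""
    return sum(theta for _, theta in cluster) / len(cluster)

def pair_score(cluster_a, cluster_b):
    """Hypothetical score: high for large clusters that are nearly perpendicular."""
    diff = abs(mean_angle(cluster_a) - mean_angle(cluster_b))
    perpendicularity = abs(math.sin(diff))  # 1 at 90 degrees, 0 when parallel
    return perpendicularity * min(len(cluster_a), len(cluster_b))

def best_pair(clusters):
    """Return the two clusters with the highest pair score."""
    return max(combinations(clusters, 2), key=lambda pair: pair_score(*pair))

horizontal = [(float(r), math.pi / 2) for r in range(0, 90, 10)]  # 9 lines
vertical = [(float(r), 0.0) for r in range(0, 90, 10)]            # 9 lines
diagonal = [(10.0, math.pi / 4), (20.0, math.pi / 4)]             # 2 stray lines
first, second = best_pair([horizontal, vertical, diagonal])
print(len(first), len(second))
```

With this particular formula, a small stray cluster loses against the two large, nearly perpendicular clusters that make up the board.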

5_best_pair.png

Now we have two line clusters forming the chessboard. Since the Hough line transform isn't perfect, most of the time it detects several near-duplicate lines instead of a single line. To eliminate the duplicates, apply DBSCAN to the lines within each cluster and replace each resulting group with its mean line.
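
To illustrate the merge-by-mean idea without a clustering library, the sketch below swaps DBSCAN for a simpler stand-in: it sorts the (rho, theta) lines by rho and starts a new group whenever the gap to the previous line exceeds a threshold. The `rho_gap` value is an assumption, and the plain averaging of theta assumes the angles within a group do not wrap around.

```python
def merge_duplicate_lines(lines, rho_gap=10.0):
    """Merge near-duplicate (rho, theta) lines by averaging each group."""
    if not lines:
        return []
    ordered = sorted(lines)
    groups = [[ordered[0]]]
    for line in ordered[1:]:
        # Stay in the current group while consecutive rho values are close.
        if line[0] - groups[-1][-1][0] <= rho_gap:
            groups[-1].append(line)
        else:
            groups.append([line])
    return [
        (sum(r for r, _ in g) / len(g), sum(t for _, t in g) / len(g))
        for g in groups
    ]

detected = [(100, 0.10), (102, 0.12), (150, 0.10), (151, 0.10), (149, 0.10), (200, 0.11)]
print(merge_duplicate_lines(detected))  # three merged lines
```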

6_best_pair_no_duplicate.png

At this stage, the clusters have no duplicates, but they may still contain more lines than needed: we only need 9 lines from each. So we select 9 lines from each cluster by fitting a polynomial to every possible set of 9 lines and calculating the MSE within the set. Finally, the 9-line set with the lowest MSE is selected from each cluster.
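
A simplified sketch of that exhaustive search follows. It assumes each line is reduced to a single position (for example its rho value, or where it crosses a reference line) and that the polynomial has degree 1, so the fit is an ordinary least-squares straight line of position against line index. Both the reduction to one number and the degree are assumptions; a higher degree would tolerate perspective foreshortening better.

```python
from itertools import combinations

def linear_fit_mse(values):
    """MSE of a least-squares straight line fitted to values vs. their index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((intercept + slope * i - v) ** 2 for i, v in enumerate(values)) / n

def select_grid_lines(positions, count=9):
    """Pick the `count` positions whose spacing is most regular."""
    ordered = sorted(positions)
    # Try every subset of size `count` and keep the one with the lowest MSE.
    return list(min(combinations(ordered, count), key=linear_fit_mse))

# Nine evenly spaced board lines plus two spurious detections (37 and 533).
candidates = [37, 100, 150, 200, 250, 300, 350, 400, 450, 500, 533]
print(select_grid_lines(candidates))
```

The number of subsets grows quickly with the number of candidate lines: 11 candidates give 55 subsets, while 20 candidates already give 167,960, so reducing the line count in the earlier steps matters a lot for speed.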

7_chessboard_lines_found.png

At the final stage, we have 2 clusters forming the chessboard, each with 9 lines, as intended. With those lines, all intersection points within the chessboard are found, and thus so is the chessboard itself.

8_final_result.png

This pipeline works well on most images containing a chessboard, without any constraint on angle, color, etc. The algorithms have more depth than I explained, but I didn't want to bore you, and the main idea is given. If you need more detail, the source code is always there.

Detecting chess pieces

Classical image processing techniques aren't sufficient for a problem of this scale. This is the part where deep learning joins the party. Convolutional Neural Networks are very strong at visual tasks; I used the YOLOv5 architecture, which is very accurate and fast thanks to its fully convolutional structure.

For this part, we'll focus on a dataset called Dataset of Rendered Chess Game State Images. This dataset contains chess game images rendered from Magnus Carlsen's games. An example test image:

aex.jpeg

First of all, I wrote a script to convert the old labels and directory structure to a YOLO-compatible format. Then I started training a YOLOv5m model with pretrained weights.
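
For reference, a YOLO label file holds one row per object in the form `class x_center y_center width height`, with all four box values normalized to the image size. The sketch below shows the core of such a conversion. It assumes the source annotation provides a pixel box as (x_min, y_min, x_max, y_max), which may differ from the dataset's actual label layout, and the function name is illustrative.

```python
def to_yolo_label(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to one YOLO label row."""
    x_min, y_min, x_max, y_max = box
    # YOLO stores the box centre and size, normalized to the range 0..1.
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

print(to_yolo_label(3, (100, 200, 300, 600), 1000, 800))
# 3 0.200000 0.500000 0.200000 0.500000
```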

image5.png

After 50 epochs (~5 hours on an RTX 3060), I took the weights and loaded them into the project using PyTorch. Now inference was available for testing.

label.jpeg

As can be seen, it detects the pieces very well.

Assigning pieces to squares

Now, for the last step, we need to merge what we have gathered. To visualize what we have, consider the figure below.

label_wlines.jpeg

Apply this simple approach: take the coordinates of the bottom center of each bounding box and assign the piece to the square in which that point lies (with some vertical error margin).

After those steps, we have finally achieved our goal. Consider the example input-output photos below.
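
A minimal sketch of this assignment, assuming the board is available as a 9x9 grid of intersection points `grid[row][col]` and each detection is a pixel box (x_min, y_min, x_max, y_max). Because of perspective, each square is treated as a convex quadrilateral. The vertical error margin is interpreted here as shifting the anchor point upward by `margin` pixels, which is an assumption about how the margin is applied.

```python
def point_in_quad(point, quad):
    """True if point lies inside (or on the border of) a convex quadrilateral."""
    px, py = point
    crosses = []
    for i in range(4):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % 4]
        # Sign of the cross product tells on which side of the edge the point is.
        crosses.append((x2 - x1) * (py - y1) - (y2 - y1) * (px - x1))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)

def assign_to_square(box, grid, margin=0.0):
    """Return (row, col) of the square holding the box's bottom centre, or None."""
    x_min, y_min, x_max, y_max = box
    # Shift the anchor up by `margin` pixels so it falls on the piece's base.
    anchor = ((x_min + x_max) / 2, y_max - margin)
    for row in range(8):
        for col in range(8):
            quad = (grid[row][col], grid[row][col + 1],
                    grid[row + 1][col + 1], grid[row + 1][col])
            if point_in_quad(anchor, quad):
                return row, col
    return None

# Axis-aligned toy board with 100-pixel squares.
grid = [[(col * 100, row * 100) for col in range(9)] for row in range(9)]
print(assign_to_square((210, 250, 290, 460), grid, margin=10))  # (4, 2)
```

Using the bottom of the box rather than its center matters because tall pieces seen at an angle extend over the squares behind them, while their base stays on the square they occupy.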

0480_m.jpg

0551_m.jpg

Adapting other datasets

Cool, but does it work with other datasets? Well, if you train it again, yes.

Let's try with the Roboflow Chess Pieces Dataset. I took the weights we trained on the previous dataset and applied transfer learning, which works by freezing the backbone layers and retraining the rest. Since the number of trainable parameters decreases a lot, training is much faster.

image20.png

After 200 epochs (~30 minutes on an RTX 3060), I took the weights again and tested with this new dataset. Results are pretty good, again.

other_dataset_m.jpg

Speed

Running on the whole test set of the Dataset of Rendered Chess Game State Images, the project processed all 342 test images in 187 seconds, including the corner detection module. That means each test image took about 0.55 seconds (187 / 342) to pass through the whole pipeline.

What can be improved?

  • The method of selecting 9 lines is not satisfactory. The algorithm can produce incorrect outcomes, and if the previous steps fail to reduce the number of lines, it is quite slow. A more accurate and faster solution is needed.

  • A method is needed to adapt to new chess sets without fine-tuning. Training could be done with a bigger dataset, or few-shot transfer learning with photos of the initial chess position for new sets could be tried.

Ending

All the source code, pre-trained weights, and test images are available in my GitHub repository.

You can contact me for questions or suggestions. Thank you for your time (: