Est. time to complete: 2 hours

This tutorial wasn’t supposed to exist. After the exercise at the end of tutorial 8, where you build an MCTS algorithm, you were supposed to move on to Convolutional Neural Networks (CNNs). CNNs are the fundamental technology behind computer vision, which allows machine learning algorithms to interpret images.

However, the difficulties I ran into when implementing MCTS myself convinced me it was necessary to guide you through the algorithm step-by-step, so you can build it yourself.

Richard Feynman famously left these words written on his blackboard when he died:

“What I cannot create, I do not understand.”

MCTS is the focus of this week and a core component of AlphaGo, so it deserves the time and effort it takes to develop a proper understanding. We’ll briefly cover Convolutional Neural Networks at the start of next week, as we start to look at AlphaGo.

This tutorial will be very practically focused, with a different flow from the usual theory-in-Notion, exercises-in-Replit format. We’ll take you through building an MCTS algorithm for Tic-Tac-Toe step-by-step in a single Repl, with this Notion page giving explanations and details. We’ll really be getting into the nitty-gritty. There are tests for each component we build, so we can check each one is correct before moving on.

If at any point you’re stuck on a section, hit ‘submit’ in the top right and take a look at that part of the solution code.

Recap of MCTS

1. Two-Player MCTS

In two-player, perfect-information games, we can modify the MCTS algorithm to make it much more effective. We do this by focusing the search on the opponent’s moves as well as our own. Instead of treating the opponent’s moves as part of the state transition, we model the opponent as an agent in their own right. To reflect this, we use the tree policy to guide our search toward parts of the tree where the opponent’s actions perform better for them.

This idea is fleshed out in the section on the selection step. Notably, if we select the opponent’s actions as though they are maximising their own chances of winning, the search converges to the minimax algorithm in the limit of infinite simulations. So you can think of MCTS with a limited number of simulations as approximate minimax.
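To make this concrete, here’s a minimal sketch (in Python) of the negamax-style convention most two-player MCTS implementations use; the `Node` class and names like `value_sum` are our own illustrations, not the actual code from the Repl. Each node stores its value from the perspective of the player whose move led to it, and backpropagation flips the reward’s sign at each level, so a single argmax in selection models both players picking the move that’s best for them.

```python
import math

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []    # filled in during the expansion step
        self.visits = 0
        self.value_sum = 0.0  # total reward, seen from the perspective of
                              # the player whose move led to this node

def uct_score(child, parent_visits, c=math.sqrt(2)):
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def select(node):
    # Because values are stored from the moving player's perspective, the
    # same argmax models both players choosing the move that is best for
    # *them*, which is what makes the search adversarial.
    while node.children:
        node = max(node.children, key=lambda ch: uct_score(ch, node.visits))
    return node

def backpropagate(node, reward):
    # `reward` starts as +1/0/-1 from the perspective of the player who made
    # the last move, and flips sign at every level, so each node accumulates
    # value from its own player's point of view.
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        reward = -reward
        node = node.parent
```

With this sign convention, both players are effectively maximising their own chances of winning, which is exactly why the search approaches minimax as the number of simulations grows.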