Est time to complete: 45 mins

https://embed.notionlytics.com/wt/ZXlKd1lXZGxTV1FpT2lKak5qZG1aak0wWlRKbFlUZzBNbVZtT0RKaE5tWXpZalk1TmpjeU5qVXlaaUlzSW5kdmNtdHpjR0ZqWlZSeVlXTnJaWEpKWkNJNklsRjBaRGt4TVRWNGJVVk9aVlJaYm5BMWIxUkhJbjA9

Untitled

Admin: Replit

1. Reinforcement Learning for Large-Scale Problems

So far we've only tackled small-scale problems. The helicopter routing exercises had $100$ possible states. In Tic-Tac-Toe there are $3$ possible things that can fill each of the $9$ grid squares (”X”, "O" or “ “), so there is an upper bound of $3^9=19683$ possible states (this is an upper bound because some of these states are invalid). We call the set of all possible states in an MDP the State Space.

In recent years we've seen RL tackle complex problems with extraordinarily large state spaces. For example, Chess has an estimated state space size of $10^{46}$ and the game of Go $10^{170}$. This is a lot - it’s is estimated that there are $10^{80}$ atoms in the observable universe.

This also doesn't consider continuous state spaces, where a state value can take any number in some range$^*$. These have hypothetically infinite state space sizes (since there are infinitely many numbers that could be chosen between 11 and 22, for example).

This is also a major issue stopping RL from being deployed in the real world. Many real-world environments are most accurately represented by continuous variables - e.g. a robot’s location and orientation.

How is it the case that state spaces can grow to be so large?

$^*$Continuous example: if the helicopter example from the last tutorial used a continuous state space, the grid squares would disappear. The state could be a tuple of the distance from the start point in x and y, which could take any value between $0$ and $9$, instead of only taking the whole numbers. E.g. $0.76$, $0.786$, $4.8$ are all valid x or y distances.

The Curse of Dimensionality

To explain why state spaces grow so large, let's take the example of a normal $3\times3$ Tic-Tac-Toe, which has  $3^9=19,683$ possible states.

Suppose we want to make it a little harder, so make the grid $4\times4$. The number of grid squares has just less than doubled, from $9$ to $16$. However, the number of possible states has exploded from $19,683$ to $3^{16}=43,046,721$ possible states. In this case, doubling the number of grid squares multiplies the state space by $2187$.

The diagram below shows the magnitude of exponential growth as the exponent (small number on the right) grows. The right graph's $y$-axis is multiplied by $10^{21}$.