Est. time to complete: 1 hour 30 mins


Parts of this tutorial have been adapted from **Reinforcement Learning: an Introduction**

## Tutorial 1 Terminology Recap Quiz

https://docs.google.com/forms/d/e/1FAIpQLSdfViYc_OEILgtFzMCRn40IpnYy1hCaDfbTOzdyHlvrpuh-sA/viewform

In the previous tutorial, we saw how reinforcement learning algorithms learn a policy. The algorithm's aim is to find the **optimal policy**: the policy that selects actions so as to maximize the sum of future rewards received.

In this tutorial, we start by defining more precisely the goal of learning the optimal policy. We then introduce the key concept (value functions) and the key equation (the Bellman equation) that allow us to build our first reinforcement learning algorithm in Tutorial 3!

**1. Return** $G_t$

In Tutorial 1 we discussed, informally, the objective of reinforcement learning algorithms. We said that the goal of a reinforcement learning algorithm is to **maximize** the **cumulative reward** it receives in the **long run**.

We define this as the return, denoted $G_t$.
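The idea can be sketched in a few lines of code. The sketch below assumes a simple episodic setting with a hypothetical list of rewards, and computes the undiscounted return from a given time step (i.e. the plain sum of rewards received from that step onward):

```python
# Hypothetical rewards received over one episode (illustrative values only).
rewards = [1.0, 0.0, 2.0, 3.0]

def return_from(rewards, t):
    """Compute G_t: the sum of rewards received from time step t onward.

    This is the undiscounted return, matching the informal definition of
    cumulative reward in the long run.
    """
    return sum(rewards[t:])

# The return from the start of the episode counts every reward;
# the return from a later step counts only the remaining rewards.
print(return_from(rewards, 0))  # 6.0
print(return_from(rewards, 2))  # 5.0
```

Note that the return depends on the time step $t$: later steps have fewer rewards left to accumulate, which is why we write $G_t$ rather than a single number for the whole episode.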