Decision means the agent decides which action to take in each state. Every action ($$a_t$$) leads to the next state ($$s_{t+1}$$). The first important property of this process is the Markov property: the next state depends only on the current state and the current action, not on the earlier history.
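Formally, in standard MDP notation, the Markov property reads:

$$P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_0, a_0, s_1, a_1, \ldots, s_t, a_t)$$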
In this chapter, we will focus more on the expected return. The state value function is a function for the return expected from now on; it evaluates how good it is to be in a given state while following a policy.
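As a standard definition, using this document's convention that the return starts from $$R_t$$:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid s_t = s \right] = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k} \,\middle|\, s_t = s \right]$$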
The Bellman equation is a method of representing the state value function and the action value function recursively. By applying the Bellman equation, we can express the state value function of the current state in terms of the value of the next state.
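In standard form (with reward $$R_t$$ received at time $$t$$, matching the return equation used later), the two Bellman equations read:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[ R_t + \gamma V^{\pi}(s_{t+1}) \mid s_t = s \right]$$

$$Q^{\pi}(s, a) = \mathbb{E}\left[ R_t + \gamma Q^{\pi}(s_{t+1}, a_{t+1}) \mid s_t = s,\ a_t = a \right], \quad a_{t+1} \sim \pi(\cdot \mid s_{t+1})$$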
As we have learned, the optimal policy is the policy that maximizes the state value function. The state value function focuses on maximizing the reward accumulated over time, i.e., the expected return.
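In symbols (standard definitions), the optimal value function and the optimal policy are:

$$V^*(s) = \max_{\pi} V^{\pi}(s), \qquad \pi^* = \arg\max_{\pi} V^{\pi}(s)$$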
We have learned how to calculate the maximum value of the state value function via the optimal policy. If $$Q^*$$ is given, what we need to do is just find the action that maximizes it in each state: $$\pi^*(s) = \arg\max_a Q^*(s, a)$$.
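As a minimal sketch (the tabular setup and array shapes here are illustrative assumptions, not from the lecture), extracting the greedy policy from a given $$Q^*$$ is a single argmax per state:

```python
import numpy as np

# Hypothetical tabular setup: 5 states, 3 actions (stand-in for a learned Q*).
n_states, n_actions = 5, 3
q_star = np.random.rand(n_states, n_actions)

def greedy_policy(q_table):
    """pi*(s) = argmax_a Q*(s, a), computed for every state at once."""
    return np.argmax(q_table, axis=1)

print(greedy_policy(q_star))  # one greedy action index per state
```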
Recall the Bellman equation:
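$$Q^*(s, a) = \mathbb{E}\left[ R_t + \gamma \max_{a'} Q^*(s_{t+1}, a') \,\middle|\, s_t = s,\ a_t = a \right]$$

(restated here in its optimality form for $$Q^*$$, the fixed point that the sample-based updates below aim at)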
Temporal difference learning has a problem: the update bootstraps from the next state and action produced by the policy the agent is actually following, so the learned values are tied to the exploratory behavior rather than to the policy we ultimately want to evaluate.
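For reference, the standard TD(0) update for the state value function is:

$$V(s_t) \leftarrow V(s_t) + \alpha \left( R_t + \gamma V(s_{t+1}) - V(s_t) \right)$$

The sample $$s_{t+1}$$ is generated by whatever policy the agent actually follows, which is exactly the coupling described above.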
Relationship between on-policy & off-policy
We use off-policy learning in Q-learning. We apply the greedy action for the target policy, and the $$\epsilon$$-greedy action for the behavior policy. Since we use the greedy action for the target policy, the learned values approximate $$Q^*$$ even while the behavior policy keeps exploring.
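A minimal tabular sketch of this scheme in Python; the environment interface (`reset()` returning a state index, `step(a)` returning `(next_state, reward, done)`) is an assumption for illustration:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: epsilon-greedy behavior, greedy (max) target."""
    q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Behavior policy: epsilon-greedy over the current Q estimate.
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(q[s]))
            s_next, r, done = env.step(a)
            # Target policy: greedy, hence the max over next-state actions.
            target = r + gamma * np.max(q[s_next]) * (not done)
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
    return q
```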
Let’s compare the two methods with the following example:
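Assuming the two methods being compared are on-policy TD control (SARSA) and off-policy Q-learning, they differ only in the update target (standard forms):

$$\text{on-policy (SARSA): } R_t + \gamma Q(s_{t+1}, a_{t+1}) \qquad \text{off-policy (Q-learning): } R_t + \gamma \max_{a'} Q(s_{t+1}, a')$$

Here $$a_{t+1}$$ is the action actually taken by the $$\epsilon$$-greedy behavior policy, while the $$\max$$ ignores exploration entirely.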
Recall the equation for $$G_t$$ (the discounted return):

$$\begin{aligned}
G_t &= R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \cdots \\
&= R_t + \gamma R_{t+1} + \gamma^2 G_{t+2} \\
&= R_t + \gamma G_{t+1}
\end{aligned}$$
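As a quick numerical check (with a hypothetical reward sequence and $$\gamma = 0.9$$), the recursion $$G_t = R_t + \gamma G_{t+1}$$ computes every return in one backward pass:

```python
def discounted_returns(rewards, gamma=0.9):
    """Compute G_t = R_t + gamma * G_{t+1} by sweeping backward."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # the recursion from the equation above
        returns[t] = g
    return returns

print(discounted_returns([1.0, 0.0, 2.0]))  # [2.62, 1.8, 2.0]
```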
Let’s review Q-learning from the previous lecture. The action value function is expressed in terms of the target policy and the transition pdf.
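Written out in its standard expectation form (with $$P(s' \mid s, a)$$ denoting the transition pdf and $$\pi$$ the target policy):

$$Q^{\pi}(s, a) = \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\left[ R_t + \gamma \sum_{a'} \pi(a' \mid s') \, Q^{\pi}(s', a') \right]$$

With Q-learning’s greedy target policy, the inner sum collapses to $$\max_{a'} Q(s', a')$$, which is exactly the target used in the update above.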