
Basics of RL: Markov process (MP), Markov reward process (MRP), Markov decision process (MDP)

Value function and action-value function, with an example

Bellman equation in matrix form, and the Bellman optimality equation

About policy evaluation

Policy Evaluation Example

Finding the optimal policy using policy improvement

Policy iteration: alternating policy evaluation and policy improvement to find the optimal value function and policy

Deep dive on value iteration
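
As a rough illustration of the policy iteration item above, here is a minimal sketch of the evaluate-then-improve loop. The two-state MDP, its rewards, the discount factor, and all function names are assumptions made up for this example and do not come from the notes themselves.

```python
# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(policy, tol=1e-10):
    """Iteratively apply the Bellman expectation backup until V stops changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update
        if delta < tol:
            return V


def policy_improvement(V):
    """Return the policy that is greedy with respect to V."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


def policy_iteration():
    """Alternate evaluation and improvement until the policy is stable."""
    policy = {s: 0 for s in P}  # arbitrary starting policy
    while True:
        V = policy_evaluation(policy)
        new_policy = policy_improvement(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration()
print(policy, V)
```

In this toy MDP the loop settles on taking action 1 in both states, with values close to 19 and 20. Value iteration would fold the two steps into a single backup that takes the max over actions directly.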