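A Closer Look at Reinforcement Learning
Premises of Reinforcement Learning
Finding the Optimal Solution by Iterating Sub-methods
State Value and Action Value
In Search of the Optimal Policy and the Optimal Value Function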
Asynchronous Dynamic Programming
Estimating the Answer Through Randomness
Visits in Monte Carlo
Proving the Convergence of Monte Carlo
Don't Wait, Use Estimates
TD: Prediction
TD: Control
Between MC and TD(0)
Plan and Train Based on Tables
Estimating Value Functions as Supervised Learning
The Objective for On-policy Prediction
Semi-Gradient & State Aggregation
Coarse Coding & State Aggregation
Softmax
For Continuing Tasks
Softmax
Information theory is about how ideas are shared. Each medium differs in how long it takes to consume and in how much content it can carry. For example, sending by letter takes