
A Close Look at Reinforcement Learning

The Premises of Reinforcement Learning

Finding the Optimal Solution by Iterating Sub-methods

State Value and Action Value

In Search of the Optimal Policy and Optimal Value Function

Asynchronous Dynamic Programming

Estimating the Answer Through Randomness

Visits in Monte Carlo

Proof of Convergence for Monte Carlo

Don't Wait, Use Estimates

TD: Prediction

TD: Control

Between MC and TD(0)

Plan and Train Based on a Table

Estimating Value Functions as Supervised Learning

The Objective for On-policy Prediction

Semi-Gradient & State Aggregation

Coarse coding & State Aggregation

Softmax

For Continuing Tasks

Softmax

Information theory is a way to share ideas. Each medium differs in how long its content takes to consume and in how much content it can carry. For example, sending by letter takes