State Representation Learning for model-based RL

우병주 · April 15, 2026

0. info

1. Model-Based Reinforcement Learning (MBRL)

  • Reinforcement learning is usually formalized as a Markov Decision Process (MDP)
    $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$, where
    $\mathcal{S}$: state space
    $\mathcal{A}$: action space
    $P(s'|s,a)$: transition probability function (i.e., the dynamics)
    $R(s,a)$: reward function
    $\gamma$: discount factor
  • The goal of reinforcement learning is to find a policy $\pi$ that maximizes the expected discounted cumulative reward:
    $\max_\pi \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r_t \right]$
  • Model-free RL: no explicit model of $P(s'|s,a)$; directly learns $\pi(a|s)$ (or $Q(s,a)$, $V(s)$)

    Model-free reinforcement learning methods learn a policy or value function directly from experience, without explicitly learning the environment’s transition dynamics $P(s'|s,a)$ or a reward model.

  • Model-based RL: uses a model $\hat{P}(s'|s,a)$ (and/or $\hat{R}(s,a)$)

    Model-based reinforcement learning methods explicitly learn or use a model of the environment dynamics (and possibly rewards), i.e., $P(s'|s,a)$ and/or $R(s,a)$, to improve decision making.

  • Model-free RL emphasizes learning, whereas model-based RL emphasizes planning.
  • A world model is a component commonly used within MBRL:
    $\text{world model} \subset \text{MBRL}$
    $\text{MBRL} = \text{world model} + \text{planning/policy learning}$
  • MBRL covers both the model that predicts the world and the process of using that model to improve the policy.
  • World Model:

    A world model is a learned model that predicts future states (and optionally rewards) from current states and actions.
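The following is a minimal Python sketch of the decomposition "MBRL = world model + planning". It rests on assumptions of my own, none of which come from the post:

- `env_step` is a hypothetical 3-state chain environment.
- `TabularWorldModel` is a count-based world model. It estimates $\hat{P}(s'|s,a)$ and $\hat{R}(s,a)$ from transitions collected by a random policy.
- `plan_value_iteration` is the planning step. It runs value iteration on the learned model only, never on the real environment.
- `discounted_return` computes $\sum_t \gamma^t r_t$ for one finite episode, i.e. the quantity inside the objective.

Practical MBRL methods usually use learned neural world models and other planners or policy learners. This tabular version only illustrates the structure.

```python
import random
from collections import defaultdict

GAMMA = 0.9  # discount factor gamma (value chosen only for illustration)


def discounted_return(rewards, gamma=GAMMA):
    """Sum_t gamma^t * r_t for one finite episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical toy environment (not from the post): states 0..2 on a line.
# Action 1 tries to move right, action 0 tries to move left; entering state 2 gives reward 1.
N_STATES, N_ACTIONS = 3, 2


def env_step(s, a, rng):
    """True dynamics P(s'|s,a) and reward; the agent never reads this directly."""
    if rng.random() < 0.8:  # the intended move succeeds with prob. 0.8
        s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    else:  # otherwise the agent stays where it is
        s_next = s
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r


class TabularWorldModel:
    """Count-based world model: estimates P_hat(s'|s,a) and R_hat(s,a) from data."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sum = defaultdict(float)                 # (s, a) -> sum of rewards
        self.visits = defaultdict(int)                       # (s, a) -> number of samples

    def update(self, s, a, r, s_next):
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def p_hat(self, s, a):
        n = self.visits[(s, a)]
        if n == 0:  # assumption: unseen (s, a) pairs are treated as absorbing
            return {s: 1.0}
        return {s2: c / n for s2, c in self.counts[(s, a)].items()}

    def r_hat(self, s, a):
        n = self.visits[(s, a)]
        return self.reward_sum[(s, a)] / n if n else 0.0  # mean observed reward


def plan_value_iteration(model, gamma=GAMMA, iters=200):
    """Planning: value iteration using only the learned model."""
    V = [0.0] * N_STATES

    def q(s, a):
        return model.r_hat(s, a) + gamma * sum(p * V[s2] for s2, p in model.p_hat(s, a).items())

    for _ in range(iters):
        V = [max(q(s, a) for a in range(N_ACTIONS)) for s in range(N_STATES)]
    policy = [max(range(N_ACTIONS), key=lambda a: q(s, a)) for s in range(N_STATES)]
    return V, policy


# 1) Collect real transitions with a uniformly random policy and 2) fit the world model.
rng = random.Random(0)
model = TabularWorldModel()
s = 0
for _ in range(2000):
    a = rng.randrange(N_ACTIONS)
    s_next, r = env_step(s, a, rng)
    model.update(s, a, r, s_next)
    s = 0 if s_next == N_STATES - 1 else s_next  # restart the episode after reaching the goal

# 3) Plan with the learned model to obtain a greedy policy.
V, policy = plan_value_iteration(model)
print("V_hat:", [round(v, 3) for v in V], "policy:", policy)
```

With these settings the planned policy should choose action 1 (move right) in states 0 and 1. State 2 is never acted in during data collection, so the model treats it as absorbing with zero reward. That choice is an assumption of this sketch, not part of MBRL in general.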

2. TBA
