State Representation Learning for model-based RL

우병주 · April 15, 2026

0. info

This post covers the following two papers, in an order of my own choosing rather than the order of the papers themselves.
A Survey of State Representation Learning for Deep Reinforcement Learning (TMLR'25)
The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based Reinforcement Learning (NeurIPS'24)
Separately, the ToBo and CroBo papers also address state representation learning methods.

1. Model-Based Reinforcement Learning (MBRL)

Reinforcement learning is usually formalized as a Markov Decision Process (MDP)
$(\mathcal{S}, \mathcal{A}, P, R, \gamma)$ , where
$\mathcal{S}$ : state space
$\mathcal{A}$ : action space
$P(s'|s,a)$ : transition probability function (i.e., dynamics)
$R(s,a)$ : reward function
$\gamma$ : discount factor
The goal of reinforcement learning is to find a policy $\pi$ that maximizes the expected cumulative discounted reward
$\max_\pi \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r_t \right]$
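To make the objective concrete, here is a minimal sketch of the quantity inside the expectation for a single finite episode. The function name `discounted_return` and the example rewards are illustrative assumptions, not something taken from the papers.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma^t * r_t for a finite list of rewards (one episode)."""
    total = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three steps of reward 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

The expectation in the objective would then be approximated by averaging this value over many episodes sampled under the policy $\pi$.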
Model-Free RL: no explicit model of $P(s'|s,a)$; directly learns $\pi(a|s)$ (or $Q(s,a)$, $V(s)$)

Model-free reinforcement learning methods learn a policy or value function directly from experience without explicitly learning the environment’s transition dynamics $P(s'|s,a)$ or reward model.
Model-Based RL: uses a model $\hat{P}(s'|s,a)$ (and/or $\hat{R}(s,a)$)

Model-based reinforcement learning methods explicitly learn or use a model of the environment dynamics (and possibly rewards), i.e., $P(s'|s,a)$ and/or $R(s,a)$ , to improve decision making.
Roughly speaking, model-free RL puts the emphasis on learning from experience, while model-based RL puts it on planning with a model.
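The contrast can be sketched in a small tabular setting. The snippet below is only an illustration under simplifying assumptions (discrete states and actions, a hand-written list of transitions): `q_learning_update` stands in for the model-free side, and `estimate_dynamics` for the model-learning part of the model-based side. Both function names and the toy data are hypothetical.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Model-free: update Q(s, a) from one transition; no model of P is stored."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def estimate_dynamics(transitions):
    """Model-based: estimate P_hat(s' | s, a) from empirical transition counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, _r, s_next in transitions:
        counts[(s, a)][s_next] += 1
    P_hat = {}
    for sa, nexts in counts.items():
        total = sum(nexts.values())
        P_hat[sa] = {s_next: n / total for s_next, n in nexts.items()}
    return P_hat


# Toy experience as (state, action, reward, next_state) tuples.
transitions = [
    (0, "right", 0.0, 1),
    (0, "right", 0.0, 1),
    (0, "right", 0.0, 0),
    (1, "right", 1.0, 1),
]

Q = defaultdict(float)
for s, a, r, s_next in transitions:
    q_learning_update(Q, s, a, r, s_next, actions=["right"])

P_hat = estimate_dynamics(transitions)
```

Here the same experience is consumed in two ways: the Q-table is nudged directly toward better value estimates, whereas `P_hat` only describes the environment and still needs a planner on top of it before it yields a policy.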
A world model is a component that is frequently used inside MBRL.
$\text{world model} \subset \text{MBRL}$
$\text{MBRL} = \text{world model} + \text{planning/policy learning}$
In other words, MBRL covers both the model that predicts the world and the process of using that model to improve the policy.
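As a rough illustration of the "planning/policy learning" half, the sketch below runs value iteration on a small tabular model and reads off a greedy policy. Value iteration is just one possible planner, chosen here for brevity. The two-state model (`P_hat`, `R_hat`), the action names, and the function `value_iteration` are all assumptions made up for this example.

```python
def value_iteration(states, actions, P_hat, R_hat, gamma=0.9, iters=200):
    """Plan with a tabular model.

    P_hat[(s, a)] maps next states to probabilities; R_hat[(s, a)] is a reward.
    """
    V = {s: 0.0 for s in states}

    def q_value(s, a):
        expected_next = sum(p * V[s2] for s2, p in P_hat[(s, a)].items())
        return R_hat[(s, a)] + gamma * expected_next

    for _ in range(iters):
        # Synchronous backup: the new V is built entirely from the old V.
        V = {s: max(q_value(s, a) for a in actions) for s in states}

    policy = {s: max(actions, key=lambda a: q_value(s, a)) for s in states}
    return V, policy


# Hypothetical two-state model: staying in state 1 pays reward 1.
states = [0, 1]
actions = ["stay", "go"]
P_hat = {
    (0, "stay"): {0: 1.0}, (0, "go"): {1: 1.0},
    (1, "stay"): {1: 1.0}, (1, "go"): {0: 1.0},
}
R_hat = {(0, "stay"): 0.0, (0, "go"): 0.0, (1, "stay"): 1.0, (1, "go"): 0.0}

V, policy = value_iteration(states, actions, P_hat, R_hat)
print(policy)  # {0: 'go', 1: 'stay'}
```

No real environment interaction happens inside `value_iteration`; the policy is improved purely by querying the model, which is the sense in which the model and the planner together make up MBRL.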
World Model:

A world model is a learned model that predicts future states (and optionally rewards) from current states and actions.
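The definition above can be sketched with a deliberately simple tabular model. This is a toy under strong assumptions (deterministic, discrete environment; the model just memorizes the last observed outcome), not the architecture used in either paper. The class `TabularWorldModel` and its methods `fit`, `predict`, and `imagine` are hypothetical names.

```python
class TabularWorldModel:
    """Toy world model: memorizes the last observed outcome of each (state, action)."""

    def __init__(self):
        self.next_state = {}
        self.reward = {}

    def fit(self, transitions):
        """Learn from (state, action, reward, next_state) tuples."""
        for s, a, r, s_next in transitions:
            self.next_state[(s, a)] = s_next
            self.reward[(s, a)] = r

    def predict(self, s, a):
        """Predict the next state and reward for one step."""
        return self.next_state[(s, a)], self.reward[(s, a)]

    def imagine(self, s, action_seq):
        """Roll the model forward without touching the real environment."""
        trajectory = []
        for a in action_seq:
            s, r = self.predict(s, a)
            trajectory.append((s, r))
        return trajectory


model = TabularWorldModel()
model.fit([(0, "go", 0.0, 1), (1, "go", 1.0, 2), (2, "go", 0.0, 2)])
print(model.imagine(0, ["go", "go", "go"]))  # [(1, 0.0), (2, 1.0), (2, 0.0)]
```

In practice the lookup tables would be replaced by a learned function approximator, and the states by a learned representation, but the interface (predict one step, chain predictions into an imagined rollout) stays the same.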

2. TBA
