Reinforcement-Learning

1.Policy Improvement Theorem

2.Finding a Bet-Size Policy with DP

3.Finding a Driving Policy with Monte Carlo Methods

4.n-Step Temporal-Difference Learning (Bootstrapping)

5.Reinforcement Learning: On-policy, Off-policy

6.Reinforcement Learning: The Concept of a Model, a Maze Example with a Changing Environment

7.Reinforcement Learning: Approximate Solution Methods

8.Reinforcement Learning for Gomoku

9.Reinforcement Learning: Interest, Emphasis

10.Reinforcement Learning: Average Reward

11.Eligibility Traces, TD(λ)

12.Actor-Critic

13.Cart Pole Actor-Critic

14.GAE

15.TRPO

16.Comparing REINFORCE and TRPO

17.PPO

18.Mujoco Hopper PPO

19.Atari DQN

20.SAC

21.DDPG

22.[Imitation Learning] Overview of Imitation Learning, Distribution Shift

23.[Imitation Learning] Algorithms for Inverse Reinforcement Learning

24.[Imitation Learning] GAIL

25.Overview of Meta-RL

26.[Offline RL] MOPO

27.[Offline RL] RARL, RAMBO-RL

28.RLHF
