Fundamentals of Reinforcement Learning - Week 2

HO SEUNG YOON · April 9, 2024

Coursera KMOOC reinforcement learning


Markov Decision Processes

Lesson 1: Introduction to Markov Decision Processes

Markov Decision Processes

Rabbit example (broccoli, rabbit, carrot, tiger): the situation the agent is in is the state

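Transition dynamics function p
- p(s', r | s, a) is the probability of moving to state s' and receiving reward r, given the current state s and action a
Markov property
- the future state and reward depend only on the current state and action
- the present state is sufficient; remembering earlier states would not improve the prediction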
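
To make the dynamics function concrete, here is a minimal sketch of p stored as a lookup table. The two states ("low", "high"), the two actions ("search", "wait"), and all probabilities and rewards are made up for illustration and are not from the course. Note that `step` looks only at the current state and action, which is the Markov property in code form.

```python
import random

# Hypothetical two-state MDP; every name and number here is an assumption.
# DYNAMICS[(state, action)] -> list of (probability, next_state, reward)
DYNAMICS = {
    ("low", "search"): [(0.7, "low", 1.0), (0.3, "high", 3.0)],
    ("low", "wait"): [(1.0, "low", 0.0)],
    ("high", "search"): [(0.6, "high", 3.0), (0.4, "low", 1.0)],
    ("high", "wait"): [(1.0, "high", 0.0)],
}


def p(next_state, reward, state, action):
    """Return p(s', r | s, a) by looking it up in the table."""
    return sum(
        prob
        for prob, s2, r in DYNAMICS[(state, action)]
        if s2 == next_state and r == reward
    )


def step(state, action, rng):
    """Sample (next_state, reward) from the current state and action only."""
    outcomes = DYNAMICS[(state, action)]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


# For every (s, a), the probabilities over all (s', r) pairs sum to 1.
for outcomes in DYNAMICS.values():
    assert abs(sum(prob for prob, _, _ in outcomes) - 1.0) < 1e-12
```

No history of earlier states is passed to `step`, so nothing earlier than the current state can influence the sampled outcome.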

Examples of MDPs

The return is a random variable because
- the dynamics of the MDP can be stochastic
For the return to be well defined, the sum of rewards must be finite
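
As a rough illustration of why the return is random, the sketch below sums the rewards of one episode, G = R_1 + R_2 + ... + R_T, in a made-up task. The task itself (reward 1.0 per step, termination with probability 0.5 after each step) and the names `sample_return`, `p_terminate`, and `max_steps` are assumptions for this example only.

```python
import random


def sample_return(rng, p_terminate=0.5, max_steps=1000):
    """Sum the rewards of one episode: G = R_1 + R_2 + ... + R_T.

    Hypothetical task: every step gives reward 1.0, and the episode
    terminates after each step with probability p_terminate.
    """
    total = 0.0
    for _ in range(max_steps):  # hard cap guarantees a finite sum
        total += 1.0
        if rng.random() < p_terminate:
            break
    return total


rng = random.Random(0)
returns = [sample_return(rng) for _ in range(100)]
print(min(returns), max(returns))
```

The same task produces different returns from episode to episode because termination is stochastic, and each return is finite because every episode ends.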

When the interaction ends
- the interaction breaks into chunks called episodes
- each episode begins independently of how the previous one ended
- at termination the agent is reset to a start state
- every episode has a terminal state (final state)
We call these tasks episodic tasks
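
The episode structure described above can be sketched as a simple loop. The corridor environment here (states 0 to 3, a move that succeeds with probability 0.8, and a reward of -1 per step) is a hypothetical example, not one from the course.

```python
import random

START, TERMINAL = 0, 3  # hypothetical corridor with states 0..3


def run_episode(rng):
    """Run one episode from the start state until the terminal state."""
    state = START  # every episode is reset to the same start state
    rewards = []
    while state != TERMINAL:
        # Assumed dynamics: the move to the right succeeds with probability 0.8.
        if rng.random() < 0.8:
            state += 1
        rewards.append(-1.0)  # assumed reward of -1 per step
    return state, rewards


rng = random.Random(0)
episodes = [run_episode(rng) for _ in range(5)]
for final_state, rewards in episodes:
    print(final_state, len(rewards), sum(rewards))
```

Each call to `run_episode` starts from `START` regardless of how the previous episode went, and each one ends in the terminal state.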

Lesson 2: Goal of Reinforcement Learning

Michael Littman: The Reward Hypothesis

Even if we accept the reward hypothesis, we still need to define the right reward
- the reward should point the agent in the right direction

Lesson 3: Continuing Tasks
