Deep RL Course

HO SEUNG YOON · March 19, 2024

huggingface reinforcement learning

Reinforcement Learning

Unit 1

Framework

The RL Process

The agent-environment loop produces a sequence of state, action, reward, next state.
Reward hypothesis: every goal can be described as the maximization of the expected cumulative reward.
Markov Property
- The RL process is called an MDP (Markov Decision Process).
- The agent needs only the current state to decide which action to take, independent of the history of past states and actions.
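
To make the loop concrete, here is a minimal sketch in plain Python. The corridor environment, the reward values, and the names `step`, `run_episode`, `always_right`, and `random_policy` are all made up for illustration and are not from the course. Note that `step` looks only at the current state and action, which is the Markov property in practice.

```python
import random

# Hypothetical 1-D corridor: states 0..4, the agent starts at 0, the goal is state 4.
GOAL = 4
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Return (next_state, reward, done) using only the current state and action."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else -0.1  # illustrative values: small cost per step, bonus at the goal
    return next_state, reward, done


def run_episode(policy, max_steps=50, seed=0):
    """Roll out one episode; return the trajectory and its cumulative reward."""
    rng = random.Random(seed)
    state, total_reward, trajectory = 0, 0.0, []
    for _ in range(max_steps):
        action = policy(state, rng)
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
        if done:
            break
    return trajectory, total_reward


def always_right(state, rng):
    """A fixed policy that always moves toward the goal."""
    return +1


def random_policy(state, rng):
    """A policy that ignores the state and picks an action uniformly at random."""
    return rng.choice(ACTIONS)


trajectory, total = run_episode(always_right)
for state, action, reward, next_state in trajectory:
    print(f"state={state} action={action:+d} reward={reward:+.1f} next_state={next_state}")
print("cumulative reward:", round(total, 2))
```

Each printed row is one (state, action, reward, next state) tuple. Swapping `always_right` for `random_policy` should give a cumulative reward that is never higher, which is the quantity the reward hypothesis says the agent is trying to maximize.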

Two main approaches for solving RL problems

The Policy $\pi$ : the agent's brain
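
As a rough illustration of what "the agent's brain" means, a policy can be sketched as a mapping from a state to an action (deterministic) or to a probability distribution over actions (stochastic). The state names, action names, and probabilities below are invented for the example.

```python
import random

ACTIONS = ("left", "right")

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"s0": "right", "s1": "right", "s2": "left"}

# Stochastic policy: each state maps to a probability distribution over actions.
stochastic_policy = {
    "s0": {"left": 0.1, "right": 0.9},
    "s1": {"left": 0.5, "right": 0.5},
    "s2": {"left": 0.8, "right": 0.2},
}


def act_deterministic(state):
    """Look up the single action assigned to this state."""
    return deterministic_policy[state]


def act_stochastic(state, rng):
    """Sample an action according to the state's action probabilities."""
    probs = stochastic_policy[state]
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


rng = random.Random(0)
print(act_deterministic("s0"))
print([act_stochastic("s1", rng) for _ in range(5)])
```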

Unit 1: Train your first Deep Reinforcement Learning Agent

!apt install swig cmake
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt
!sudo apt-get update
!sudo apt-get install -y python3-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay

gymnasium[box2d]: Contains the LunarLander-v2 environment 🌛
stable-baselines3[extra]: The deep reinforcement learning library.
huggingface_sb3: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.
- All of these are included in the requirements file.
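
With these packages installed, training an agent might look roughly like the sketch below. The choice of PPO with `MlpPolicy`, the timestep budget, and the helper name `train_lunar_lander` are assumptions for illustration, not something recorded in these notes. The environment id `LunarLander-v2` matches the packages listed above; newer Gymnasium releases may expect a different version suffix.

```python
def train_lunar_lander(total_timesteps=100_000, env_id="LunarLander-v2"):
    """Train a PPO agent and return (model, mean_reward, std_reward).

    Hypothetical helper: the algorithm and hyperparameters are illustrative defaults.
    """
    # Imported here so the heavy dependencies are only needed when training actually runs.
    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.monitor import Monitor

    # Create the environment and a PPO agent with a small MLP policy network.
    env = gym.make(env_id)
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=total_timesteps)

    # Evaluate on a separate, monitored copy of the environment.
    eval_env = Monitor(gym.make(env_id))
    mean_reward, std_reward = evaluate_policy(
        model, eval_env, n_eval_episodes=10, deterministic=True
    )
    return model, mean_reward, std_reward
```

Calling `model, mean_reward, std_reward = train_lunar_lander()` in the notebook would start training; the default budget is only a starting point and will likely need to be raised for a good landing.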

My LunarLander-v2
