Deep RL Course

HO SEUNG YOON · March 19, 2024

Reinforcement Learning

Unit 1

Framework

  • The RL Process

    A sequence of state, action, reward, and next state.

  • Reward hypothesis: every goal can be described as the maximization of the expected cumulative reward.

  • Markov Property

    • The RL process is called an MDP (Markov Decision Process).
    • The agent needs only the current state to decide which action to take, independent of the past states and actions.
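
The three ideas above can be sketched with a toy example. This is only an illustration, not code from the course: the corridor environment, its rewards, and the names `step`, `run_episode`, and `estimate_expected_return` are all hypothetical. It shows the (state, action, reward, next state) sequence, a `step` function that depends only on the current state and action (the Markov property), and a sample average as a rough estimate of the expected cumulative reward.

```python
import random

# Hypothetical toy MDP (not from the course): a corridor with cells 0..4.
# The agent starts in cell 0 and the episode ends when it reaches cell 4.
GOAL = 4

def step(state, action):
    """Markov property: the outcome depends only on (state, action)."""
    next_state = max(0, min(GOAL, state + action))  # action is -1 or +1
    done = next_state == GOAL
    reward = 1.0 if done else -0.1  # made-up reward values
    return reward, next_state, done

def run_episode(policy, max_steps=50):
    """Collect the (state, action, reward, next_state) sequence of one episode."""
    state, trajectory = 0, []
    for _ in range(max_steps):
        action = policy(state)
        reward, next_state, done = step(state, action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory

def cumulative_reward(trajectory):
    """Undiscounted sum of the rewards collected in one episode."""
    return sum(reward for _, _, reward, _ in trajectory)

def estimate_expected_return(policy, episodes=200):
    """Average the cumulative reward over several episodes."""
    returns = [cumulative_reward(run_episode(policy)) for _ in range(episodes)]
    return sum(returns) / len(returns)

def always_right(state):
    return +1

rng = random.Random(0)

def random_policy(state):
    return rng.choice([-1, +1])

print(run_episode(always_right))
print(estimate_expected_return(always_right))
print(estimate_expected_return(random_policy))
```

In this toy setting the agent that always moves right reaches the goal in the fewest steps, so its cumulative reward is an upper bound on what the random policy collects.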

Two main approaches for solving RL problems

  • The policy π: the agent's brain
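
As a rough sketch of what "the agent's brain" means in code, a policy can be written as a function from a state to an action. The example below is hypothetical (the states, actions, and probabilities are made up): one policy always returns the same action for a given state, the other samples an action from a state-dependent distribution.

```python
import random

# Hypothetical two-action example; states and probabilities are made up.
ACTIONS = ["left", "right"]

def deterministic_policy(state):
    """Maps each state to exactly one action."""
    return "right" if state < 3 else "left"

def stochastic_policy(state, rng):
    """Samples an action from a probability distribution that depends on the state."""
    p_right = 0.9 if state < 3 else 0.2
    return rng.choices(ACTIONS, weights=[1 - p_right, p_right])[0]

rng = random.Random(0)
print(deterministic_policy(1))

# Sampling many times shows the distribution behind the stochastic policy.
samples = [stochastic_policy(1, rng) for _ in range(1000)]
print(samples.count("right") / len(samples))
```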

Unit 1: Train your first Deep Reinforcement Learning Agent

!apt install swig cmake
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt
!sudo apt-get update
!sudo apt-get install -y python3-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay
  • gymnasium[box2d]: Contains the LunarLander-v2 environment 🌛

  • stable-baselines3[extra]: The deep reinforcement learning library.

  • huggingface_sb3: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.

    • Included in the requirements file.
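
After running the install commands, a quick check that the three packages can be imported may save some debugging later. This is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the import names (`gymnasium`, `stable_baselines3`, `huggingface_sb3`) are the module names these packages are normally imported under.

```python
import importlib.util

def missing_packages(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["gymnasium", "stable_baselines3", "huggingface_sb3"]
missing = missing_packages(required)

if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are available.")
```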

My LunarLander-v2
