The RL Process
A sequence of state, action, reward, and next state.
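The loop above can be sketched in plain Python. This is a minimal illustration, not the course's code: `CorridorEnv`, `run_episode`, and the always-move-right policy are hypothetical names invented here so the example runs without any RL library.

```python
class CorridorEnv:
    """Hypothetical toy environment: start in cell 0, reach the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, anything else moves left; stay inside the corridor
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + move))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Collect (state, action, reward, next_state) tuples for one episode."""
    transitions = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)                       # agent picks an action
        next_state, reward, done = env.step(action)  # environment responds
        transitions.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return transitions


# Assumed policy for illustration: always move right.
trajectory = run_episode(CorridorEnv(length=5), lambda state: 1)
for state, action, reward, next_state in trajectory:
    print(state, action, reward, next_state)
```

Each printed row is one step of the process: the state the agent saw, the action it took, the reward it received, and the state it ended up in.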
Reward hypothesis: every goal can be described as the maximization of the expected cumulative reward.
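As a rough sketch of what "expected cumulative reward" means numerically: sum the rewards of each episode, then average over episodes. The function names and the discount factor `gamma` are assumptions added for illustration (the notes above do not mention discounting); with the default `gamma=1.0` this is a plain sum.

```python
def cumulative_reward(rewards, gamma=1.0):
    """Sum of rewards for one episode; gamma < 1 discounts later rewards (assumption)."""
    total = 0.0
    # Work backwards so each step adds its reward plus the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def expected_cumulative_reward(episodes, gamma=1.0):
    """Monte Carlo estimate: average cumulative reward over sampled episodes."""
    returns = [cumulative_reward(rewards, gamma) for rewards in episodes]
    return sum(returns) / len(returns)


# Made-up reward sequences from three episodes, for illustration only.
episodes = [[0.0, 0.0, 1.0], [0.0, 1.0], [1.0, 1.0, 1.0]]
print(expected_cumulative_reward(episodes))
```

The agent's objective under the reward hypothesis is to choose actions that make this average as large as possible.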
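Markov Property: the agent needs only the current state to decide which action to take, not the history of all previous states and actions.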
!apt install swig cmake
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt
!sudo apt-get update
!sudo apt-get install -y python3-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay
gymnasium[box2d]: Contains the LunarLander-v2 environment 🌛
stable-baselines3[extra]: The deep reinforcement learning library.
huggingface_sb3: Additional code for Stable-Baselines3 to load and upload models from the Hugging Face 🤗 Hub.
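A minimal sketch of how the first two packages fit together, assuming the dependencies above are installed. The function name `train_lunar_lander`, the choice of PPO with `MlpPolicy`, and the step and episode counts are illustrative assumptions, not settings taken from these notes. Imports are kept inside the function so that defining it does not require the libraries.

```python
def train_lunar_lander(total_timesteps=10_000, n_eval_episodes=5):
    """Train a PPO agent on LunarLander-v2 and return (mean_reward, std_reward)."""
    # Imported here so the function can be defined without the packages present.
    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # LunarLander-v2 is provided by gymnasium[box2d].
    env = gym.make("LunarLander-v2")

    # PPO with a small MLP policy is an assumed choice for illustration.
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=total_timesteps)

    # Evaluate on a fresh environment with deterministic actions.
    eval_env = gym.make("LunarLander-v2")
    mean_reward, std_reward = evaluate_policy(
        model, eval_env, n_eval_episodes=n_eval_episodes, deterministic=True
    )
    env.close()
    eval_env.close()
    return mean_reward, std_reward
```

Calling `train_lunar_lander()` in a notebook cell after the installs should train briefly and report the evaluation score. A budget this small is unlikely to produce a good lander, so treat the numbers as a smoke test only.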