deep-reinforcement-learning

1. InstructGPT, ChatGPT

2. Reinforcement Learning from Human Feedback (RLHF)

3. HuggingFace Deep RL Course - 1. Deep Reinforcement Learning

4. HuggingFace Deep RL Course - 2. Q-Learning

5. HuggingFace Deep RL Course - 3. Deep Q-Learning

6. HuggingFace Deep RL Course - 4. Policy Gradient

7. HuggingFace Deep RL Course - 5. Unity ML-Agents

8. HuggingFace Deep RL Course - 6. Advantage Actor-Critic (A2C)

9. HuggingFace Deep RL Course - 7. Multi-Agents and AI vs. AI

10. HuggingFace Deep RL Course - 8. Proximal Policy Optimization (PPO)

11. On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
