
[Translation] InstructGPT and ChatGPT (OpenAI)

Reinforcement Learning from Human Feedback, the Foundation of InstructGPT and ChatGPT

Deep Reinforcement Learning - Lecture 1: Summary

Deep Reinforcement Learning - Lecture 2: Q-learning

Deep Reinforcement Learning - Lecture 3: Deep Q-Learning

Deep Reinforcement Learning - Lecture 4: Policy Gradient

Deep Reinforcement Learning - Lecture 5: Unity ML-Agent

Deep Reinforcement Learning - Lecture 6: Actor-Critic

Deep Reinforcement Learning - Lecture 7: Multi-Agent Reinforcement Learning (MARL)

Deep Reinforcement Learning - Lecture 8: Proximal Policy Optimization (PPO)

On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting, NeurIPS 2022