[Translation] InstructGPT and ChatGPT (OpenAI)
Reinforcement Learning from Human Feedback, the Foundation of InstructGPT and ChatGPT
Deep Reinforcement Learning - Lecture 1: Summary
Deep Reinforcement Learning - Lecture 2: Q-learning
Deep Reinforcement Learning - Lecture 3: Deep Q-Learning
Deep Reinforcement Learning - Lecture 4: Policy Gradient
Deep Reinforcement Learning - Lecture 5: Unity ML-Agent
Deep Reinforcement Learning - Lecture 6: Actor-Critic
Deep Reinforcement Learning - Lecture 7: Multi-Agent Reinforcement Learning (MARL)
Deep Reinforcement Learning - Lecture 8: Proximal Policy Optimization (PPO)
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting, NeurIPS 2022