Introduction to Reinforcement Learning
Spring 2022
This page presents lecture materials for CS 4789/5789: Introduction to Reinforcement Learning, taught by Sarah Dean at Cornell University in spring 2022. For the most recent materials, look here. This course was first taught by Wen Sun in spring 2021.
Schedule
| Date | No. | Lecture Title | Materials |
| 1/24 | 1 | Introduction to RL | Lecture Notes, Slides, Live Notes, Video |
| 1/26 | 2 | MDPs and Bellman Equations | Lecture Notes, Slides, Live Notes, Video |
| 1/31 | 3 | MDPs, Optimal Policies, and Value Iteration | Lecture Notes, Slides, Live Notes, Video |
| 2/2 | 4 | Policy Iteration and Dynamic Programming | Lecture Notes, Slides, Live Notes, Video |
| 2/7 | 5 | Continuous Control | Lecture Notes, Slides, Live Notes, Video |
| 2/9 | 6 | Linear Quadratic Regulation | Lecture Notes, Slides, Live Notes, Video |
| 2/14 | 7 | Nonlinear Control | Lecture Notes, Slides, Live Notes, Video |
| 2/16 | 8 | Limitations in Control and Observation | Lecture Notes, Slides, Live Notes, Video |
| 2/21 | 9 | Prediction and Estimation | Lecture Notes, Slides, Live Notes, Video |
| 2/23 | 10 | Model-based RL | Lecture Notes, Slides, Live Notes, Video |
| 2/28 | | February Break | |
| 3/2 | 11 | Approximate and Conservative Policy Iteration | Lecture Notes, Slides, Live Notes, Video |
| 3/7 | 12 | Supervision via Bellman | Lecture Notes, Slides, Live Notes, Video |
| 3/9 | 13 | Optimization Background | Lecture Notes, Slides, Live Notes, Video |
| 3/14 | 14 | Policy Optimization: Random Search and Policy Gradient | Lecture Notes, Slides, Live Notes, Video |
| 3/16 | 15 | Policy Optimization: Trust Region and Natural PG | Lecture Notes, Slides, Live Notes, Video |
| 3/21 | 16 | Prelim Review | Slides, Video |
| 3/23 | 17 | Exploration: Multi-Armed Bandits | Lecture Notes, Slides, Live Notes, Video, Code, Notebook |
| 3/28 | 18 | Upper Confidence Bound Algorithm | Lecture Notes, Slides, Live Notes, Video |
| 3/30 | 19 | Contextual Bandits | Lecture Notes, Slides, Live Notes, Video |
| 4/4 | | Spring Break | |
| 4/6 | | Spring Break | |
| 4/11 | 20 | Linear Contextual Bandits | Lecture Notes, Slides, Live Notes, Video, Code, Notebook |
| 4/13 | 21 | Exploration in MDPs | Lecture Notes, Slides, Live Notes, Video |
| 4/18 | 22 | Imitation Learning with BC | Lecture Notes, Slides, Live Notes, Video |
| 4/21 | 23 | Interactive Imitation Learning | Lecture Notes, Slides, Live Notes, Video |
| 4/25 | 24 | Inverse RL | Lecture Notes, Slides, Live Notes, Video |
| 4/27 | 25 | Max Entropy IRL | Lecture Notes, Slides, Live Notes, Video |
| 5/2 | 26 | Specification and Societal Implications | Slides, Video |
| 5/4 | 27 | AlphaGo Case Study | Slides, Video |
| 5/9 | 28 | Review | Slides, Video |