Duke University | AIPI 590

Introduction to Modern Reinforcement Learning

Course Description

This course provides a comprehensive, hands-on introduction to reinforcement learning (RL), bridging foundational theory with state-of-the-art methods used in modern AI systems. Students will study how agents learn from interaction through the lens of Markov Decision Processes, value functions, and policy optimization. Beginning with classical tabular methods, the course progresses through deep RL architectures (DQN, PPO), human-in-the-loop learning (RLHF), and advanced topics such as offline, model-based, hierarchical, and inverse RL.

Weekly labs emphasize implementation and experimentation, and four open-ended challenges integrate concepts such as safety, generalization, and alignment. By the end of the semester, students will be equipped to design, train, and critically evaluate RL agents.

Quick Links

GitHub Repository

Course materials and code

Textbook

Reinforcement Learning: An Introduction
Sutton and Barto (free online)

Discord

A place for discussion

Canvas

For Duke students

Schedule

Week	Date	Topic	Lecture	Lab	Challenges (due)
1	Jan 13	From AlphaZero to RLHF	Lecture 1	Lab 1	—
2	Jan 20	Reinforcement Learning Foundations: Agent-environment loop	Lecture 2	Lab 2	—
3	Jan 27	Reward Design	—	Lab 3	Challenge 0
4	Feb 3	Deep Reinforcement Learning: Value-based Agents	Lecture 3	Lab 4	—
5	Feb 10	Deep Reinforcement Learning: Policy Gradients and PPO	Lecture 4	Lab 5	—
6	Feb 17	Safety, Generalization, and Exploration	Lecture 5	—	Challenge 1
7	Feb 24	Human in the Loop RL	Lecture 6	Lab 6	—
8	Mar 3	RLHF Pipeline	Lecture 7	Lab 7	—
9	Mar 17	Offline RL	Lecture 8	—	Challenge 2
10	Mar 24	Model-based RL and World Models	Lecture 9	Lab 8	—
11	Mar 31	Hierarchical RL	Lecture 10	—	Challenge 3
12	Apr 7	Inverse RL and Reward Inference	Lecture 11	Lab 9	—
13	Apr 14	Reinforcement Learning Engineering	Lecture 12	Lab 10	—
14	Apr 21	Frontiers in Aligned RL	Lecture 13	—	Challenge 4

Course Content

From AlphaZero to RLHF

Reinforcement Learning Foundations: Agent-environment loop

Reward Design

Deep Reinforcement Learning: Value-based Agents

Deep Reinforcement Learning: Policy Gradients and PPO

Safety, Generalization, and Exploration

Human in the Loop RL

RLHF Pipeline

Offline RL

Model-based RL and World Models

Hierarchical RL

Inverse RL and Reward Inference

Reinforcement Learning Engineering

Frontiers in Aligned RL