Debabrota Basu

Lectures on Sequential Decision Making

Course: Masters of Data Science, Université de Lille and Centrale Lille

Lecturers: Debabrota Basu, Riad Akrour, Ayoub Ajarra

Lecture Plan

Lecture 1 (05/11): Multi-armed Bandits (Lecturer: DB)

Topics: Problem setup, Definition of regret, Lower and upper bounds (why we prove them and what they signify), UCB-type and Thompson Sampling-type algorithms

Lecture 2 (07/11): Contextual bandits (Lecturer: DB)

Topics: Motivation from recommender systems, Problem setup (stochastic and adversarial), Upper and lower bounds (why we prove them and what they mean), UCB-type and TS-type algorithms

Practical 1 (12/11) (Lecturers: AA and DB)

Assignment 1

Lecture 3 (14/11): Markov Decision Processes (Lecturer: DB)

Topics: Problem setup, Dynamic programming, Bellman optimality

Lecture 4 (19/11): Exact RL (Lecturer: DB)

Topics: Value iteration, Policy iteration, Q-learning, Temporal-difference learning

Practical 2 (21/11) (Lecturer: AA)

Assignment 2

Lecture 5 (26/11): Approximate RL (Lecturer: DB)

Topics: Function approximation in MDPs, Projected Bellman operator, LSVI, LSPI, Fitted Q-iteration

Practical 3 (28/11) (Lecturer: AA)

Lecture 6: Deep RL (Lecturer: RA)

Topics: RL with deep function approximation, Deep Q-networks

Practical 4 (Lecturer: RA)

Assignment 3

Lecture 7: Advanced Deep RL Techniques (Lecturer: RA)

Topics: Policy gradient, Actor-critic methods, Entropy regularisation

Practical 5 (Lecturer: RA)

References

Bandit Algorithms. Tor Lattimore and Csaba Szepesvári (2019).
Reinforcement Learning: An Introduction. Richard Sutton and Andrew Barto (2018 edition).
Markov Decision Processes: Discrete Stochastic Dynamic Programming. Martin Puterman (1994).
Lecture slides of Emilie Kaufmann for the previous edition of this course.
Materials from the Reinforcement Learning Summer School (RLSS), 2019.