Course: Masters of Data Science, Université de Lille and Centrale Lille
Lecturers: Debabrota Basu, Riad Akrour, Ayoub Ajarra
Topics: Problem setup, Definition of regret, Lower and upper bounds (why we prove them and what they signify), UCB-type and Thompson Sampling-type algorithms
Topics: Motivation from recommender systems, Problem setup (stochastic and adversarial), Upper and lower bounds (why we prove them and what they mean), UCB-type and TS-type algorithms
Topics: Problem setup, Dynamic programming, Bellman optimality
Topics: Value iteration, Policy iteration, Q-learning, Temporal difference-based learning
Topics: Function approximation in MDPs, Projected Bellman operator, LSVI, LSPI, Fitted Q-iteration
Topics: RL with deep function approximation, Deep Q-networks
Topics: Policy gradient, Actor-critic methods, Entropy regularisation
Bandit Algorithms. Tor Lattimore and Csaba Szepesvári (2019).
Reinforcement Learning. Richard Sutton and Andrew Barto (2018 edition).
Markov Decision Processes. Martin Puterman (1994).
Lecture slides of Emilie Kaufmann for the previous edition of this course.
Materials from the Reinforcement Learning Summer School (RLSS), 2019.