Course: Masters of Data Science, Université de Lille and Centrale Lille
Lecturers: Debabrota Basu, Riad Akrour, Ayoub Ajarra
Topics: Problem setup, Definition of regret, Lower and upper bounds (why we prove them and their significance), UCB-type and Thompson Sampling-type algorithms
Topics: Motivation from recommender systems, Problem setup (stochastic and adversarial), Upper and lower bounds (why we prove them and what they mean), UCB-type and TS-type algorithms
Topics: Problem setup, Dynamic programming, Bellman optimality
Topics: Value iteration, Policy iteration, Q-learning, Temporal difference-based learning
Topics: Function approximation in MDPs, Projected Bellman operator, LSVI, LSPI, Fitted Q-iteration
Topics: RL with deep function approximation, Deep Q-networks
Topics: Policy gradient, Actor-critic methods, Entropy regularisation