Lecture Plan

  • Lecture 1 (05/11): Multi-armed Bandits (Lecturer: DB)

    Topics: Problem setup, Definition of regret, Lower and upper bounds (why we prove them and their significance), UCB-type and Thompson Sampling-type algorithms

  • Lecture 2 (07/11): Contextual bandits (Lecturer: DB)

    Topics: Motivation from recommender systems, Problem setup (stochastic and adversarial), Upper and lower bounds (why we prove them and what they mean), UCB-type and TS-type algorithms

  • Practical 1 (12/11) (Lecturers: AA and DB)
  • Assignment 1
  • Lecture 3 (14/11): Markov Decision Processes (Lecturer: DB)

    Topics: Problem setup, Dynamic programming, Bellman optimality

  • Lecture 4 (19/11): Exact RL (Lecturer: DB)

    Topics: Value iteration, Policy iteration, Q-learning, Temporal-difference learning

  • Practical 2 (21/11) (Lecturer: AA)
  • Assignment 2

  • Lecture 5 (26/11): Approximate RL (Lecturer: DB)

    Topics: Function approximation in MDPs, Projected Bellman operator, LSVI, LSPI, Fitted Q-iteration

  • Practical 3 (28/11) (Lecturer: AA)
  • Lecture 6: Deep RL (Lecturer: RA)

    Topics: RL with deep function approximation, Deep Q-networks

  • Practical 4 (Lecturer: RA)

  • Assignment 3

  • Lecture 7: Advanced Deep RL Techniques (Lecturer: RA)

    Topics: Policy gradients, Actor-critic methods, Entropy regularisation

  • Practical 5 (Lecturer: RA)

  • References