Lecture Plan

  • Lecture 1 (28/11): Multi-armed Bandits (Lecturer: DB)

    Topics: Problem setup, Definition of regret, Lower and upper bounds (why we prove them and their significance), UCB-type and Thompson Sampling-type algorithms

  • Lecture 2 (30/11): Contextual Bandits (Lecturer: DB)

    Topics: Motivation from recommender systems, Problem setup (stochastic and adversarial), Upper and lower bounds (why we prove them and what they mean), UCB-type and TS-type algorithms

  • Practical 1 (05/12) (Lecturers: HK and DB)

    Notebook: (Link)

  • Assignment 1

  • Lecture 3 (07/12): Markov Decision Processes (Lecturer: DB)

    Topics: Problem setup, Dynamic programming, Bellman optimality

  • Lecture 4 (19/12): Exact RL (Lecturer: DB)

    Topics: Value iteration, Policy iteration, Q-learning, Temporal-difference learning

  • Practical 2 (21/12) (Lecturer: HK)

    Notebook: (Link)

  • Assignment 2

  • Lecture 5 (23/01): Approximate RL (Lecturer: DB)

    Topics: Function approximation in MDPs, Projected Bellman operator, LSVI, LSPI, Fitted Q-iteration

  • Practical 3 (25/01) (Lecturers: HK and DB)

    Notebook: (Link)

  • Lecture 6 (30/01): Deep RL (Lecturer: RA)

    Topics: RL with deep function approximation, Deep Q-networks

  • Practical 4 (01/02) (Lecturer: RA)

  • Assignment 3

  • Lecture 7 (06/02): Advanced Deep RL Techniques (Lecturer: RA)

    Topics: Policy gradients, Actor-critic methods, Entropy regularisation

  • Practical 5 (08/02) (Lecturer: RA)

  • References