NDAK21003U Online and Reinforcement Learning (OReL)

Volume 2021/2022

MSc Programme in Computer Science


In classical machine learning, data are collected and analysed offline, and it is assumed that new data come from the same distribution as the data the algorithm was trained on. If this assumption fails, the theoretical guarantees become void and the empirical performance may deteriorate dramatically. But what if we want to design an algorithm for playing chess? The opponent is not going to sample moves from a fixed distribution.


Online and reinforcement learning break out of the static realm and move into a perpetual cycle of getting new information, analysing it, and executing actions based on the updated estimate of reality. We consider agents (computer programs, robots, living beings) that learn through interactions with (real or simulated) environments. Examples include repeated investment in the stock market, spam filtering, online advertising, online routing, medical treatments, games, and robotics. This framework makes it possible to model a much richer range of problems, including problems with limited feedback, problems with delayed feedback, and even adversarial problems, where the environment deliberately acts against the algorithm (as, for example, in chess or spam filtering). At the same time, it stimulates the development of fascinating mathematical tools for designing and analysing algorithms for these problems.

In the course we will cover:

  • The notion of regret: the evaluation measure, which replaces generalization error in offline learning and makes it possible to define and analyse learning in adversarial environments
  • Various forms of feedback, including full-information and limited (bandit) feedback
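
As a hedged illustration of the first item, one common way to formalise regret in a game with K actions played over T rounds is sketched below. The notation is an assumption made here and may differ from the notation used in the course: \ell_{t,a} denotes the loss of action a in round t, and A_t denotes the action the algorithm plays in round t.

```latex
R_T \;=\; \sum_{t=1}^{T} \ell_{t,A_t} \;-\; \min_{a \in \{1,\dots,K\}} \sum_{t=1}^{T} \ell_{t,a}
```

The first sum is the cumulative loss of the algorithm and the second is the cumulative loss of the best fixed action in hindsight. No distributional assumption on the losses is needed, which is why the definition remains meaningful in adversarial environments.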

We will introduce the following basic online learning settings, algorithms, and their analysis:

  • Follow the Leader algorithm
  • Prediction with expert advice: the Hedge / Exponential Weights algorithm
  • Stochastic and adversarial multi-armed bandits: the UCB1 and EXP3 algorithms
  • Contextual bandits: EXP4 algorithm
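
To give a flavour of the algorithms listed above, here is a minimal sketch of Hedge (Exponential Weights) in the full-information setting. It is an illustration under stated assumptions, not the course's reference implementation: the function name `hedge`, the toy loss table, and the learning rate eta = sqrt(8 ln K / T) are choices made here, and losses are assumed to lie in [0, 1].

```python
import math


def hedge(losses, eta):
    """Run Hedge on a T x K table of losses in [0, 1].

    Returns (expected cumulative loss of Hedge, cumulative loss of the best expert).
    """
    k = len(losses[0])
    cumulative = [0.0] * k  # cumulative loss of each expert so far
    expected_loss = 0.0
    for round_losses in losses:
        # Weights are proportional to exp(-eta * cumulative loss).
        # Subtracting the minimum keeps the exponentials numerically stable.
        base = min(cumulative)
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of sampling an expert from `probs` in this round.
        expected_loss += sum(p * l for p, l in zip(probs, round_losses))
        # Full information: the losses of all experts are observed.
        cumulative = [c + l for c, l in zip(cumulative, round_losses)]
    return expected_loss, min(cumulative)


# Hypothetical toy data: three experts with constant losses.
T, K = 1000, 3
losses = [[0.1, 0.5, 0.9] for _ in range(T)]
eta = math.sqrt(8 * math.log(K) / T)  # assumed tuning for a known horizon T
alg_loss, best_loss = hedge(losses, eta)
regret = alg_loss - best_loss
print(f"expected regret: {regret:.2f}")
```

With this tuning, the standard analysis bounds the expected regret by sqrt(T ln K / 2), so the printed value should stay well below the horizon T.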

And the following basic reinforcement learning settings, algorithms, and their analysis:

  • Markov Decision Processes (MDPs)
  • Monte Carlo Methods for reinforcement learning
  • Dynamic programming for reinforcement learning
  • Temporal Difference Learning (e.g., Q-Learning)
  • Reinforcement learning using function approximators (e.g., Deep Q-Learning)
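
As a rough sketch of what temporal difference learning looks like in code, the example below runs tabular Q-learning on a tiny chain environment. Everything specific in it is an assumption made for illustration: the five-state chain, the `step` and `q_learning` function names, and the values of the step size, discount factor, and exploration rate.

```python
import random


def step(state, action):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    Reaching state 4 yields reward 1 and ends the episode.
    """
    nxt = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == 4 else 0.0
    return nxt, reward, nxt == 4


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(5)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
# Greedy action in each non-terminal state (1 means "move right").
greedy = [max((0, 1), key=lambda a: q[s][a]) for s in range(4)]
print(greedy)
```

In this toy problem the learned greedy policy moves right in every non-terminal state, which is the shortest path to the reward.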

We will also cover a few advanced topics. The selection of advanced topics will depend on the lecturers and will be announced on Absalon.


The students will implement most of the algorithms studied in the course.

The course will bring the students up to a level sufficient for writing a master's thesis in the domain of online and reinforcement learning.

WARNING: If you have not taken DIKU's Machine Learning master's course, please carefully check the "Recommended Academic Qualifications" box below and the self-preparation assignment at https://sites.google.com/diku.edu/machine-learning-courses/orel. Machine Learning courses given at other places do not necessarily prepare you well for this course. It is not advised to take the course if you do not meet the academic qualifications.

Learning Outcome


Knowledge of

  • Evaluation measures used in online and reinforcement learning

  • Basic online learning settings

  • Basic reinforcement learning settings

  • Basic algorithms for online and reinforcement learning problems

  • Basic tools for theoretical analysis of these algorithms


Skills in

  • Reading and understanding recent scientific literature in the field of online and reinforcement learning

  • Formalizing and solving online and reinforcement learning problems

  • Applying the knowledge obtained by reading scientific papers

  • Analyzing online and reinforcement learning algorithms and implementing them


Competences in

  • Understanding advanced methods, and applying the knowledge to practical problems

  • Planning and carrying out self-learning

See Absalon when the course is set up.



Recommended Academic Qualifications

The course requires a strong mathematical background. It is suitable for computer science master's students, as well as students from mathematics (statistics, actuarial mathematics, mathematics-economics, etc.) and physics study programmes. Students from other study programmes can verify whether they have sufficient mathematical and programming skills by solving the self-preparation assignment (below) and, if in doubt, contacting the course organiser.

It is assumed that the students have successfully passed the “Machine Learning” course offered by the Department of Computer Science (DIKU). If you have not taken the “Machine Learning” course at DIKU, please go through the self-preparation material and solve the self-preparation assignment provided at https://sites.google.com/diku.edu/machine-learning-courses/orel before the course starts. (For students with a strong mathematical background and some background in machine learning, it should be possible to complete the self-preparation within a couple of weeks.) It is strongly advised not to take the course if you do not meet the prerequisites.
Lectures, exercise classes, and weekly home assignments.
  • Lectures: 28 hours
  • Preparation: 18 hours
  • Theory exercises: 70 hours
  • Practical exercises: 70 hours
  • Exam: 20 hours
  • Total: 206 hours
Continuous feedback during the course of the semester
7.5 ECTS
Type of assessment
Continuous assessment
6-8 weekly take-home assignments. The assignments must be solved individually.

The course is based on weekly home assignments, which are graded continuously over the course of the semester. The final grade is given as a weighted average of all the assignments, except the one with the lowest score.
All aids allowed
Marking scale
7-point grading scale
Censorship form
No external censorship
Several internal examiners.

The re-exam consists of two parts:

1. The first part is handing in at least 6 of the course assignments no later than 2 weeks prior to the oral part of the re-exam.
2. The second part is a 30-minute oral examination, without preparation, on the course curriculum.

The final grade will be given as an overall assessment of the two re-exam parts.

Criteria for exam assessment

See Learning Outcome.