SBRI19012U Big Data Analytics and Machine Learning II – Biostatistics and Epidemiology in Translational Medicine

Volume 2019/2020

BRIDGE - Translational Excellence Programme

Big data are relevant not only in genetic studies but also in register-based studies. Examples include drug repurposing studies, which investigate whether drugs already in clinical use may be useful for disorders other than those for which they were developed. In this course, statistical and epidemiological methods for register-based studies are discussed, with special emphasis on event history analysis and causal inference.

We will first give a brief introduction to Danish disease registers on which such studies can be based, most importantly the Medicinal Product Register, the National Patient Register, and the Cause of Death Register.

Next, we will present examples of studies of disease trajectories from a life-course perspective, showing how patterns of disease occurrence are identified and how these patterns are affected by the probability of accessing and observing each individual.

In event history analysis, the concepts of states and transitions between states are crucial, and statistical models for transition intensities and transition probabilities will be introduced. For such models, it is important to be able to draw inferences from incompletely observed data. Incomplete observation includes truncation, where subjects enter the study at different stages of their life course, and censoring, where observation may end before the event of interest (e.g., developing a disease) has occurred.
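
As an illustration of how truncation and censoring enter the estimation, the sketch below treats the simplest case of a single transition (e.g., healthy to diseased) on hypothetical toy data; the `Subject` record and the function `product_limit` are illustrative names, not taken from the course material. Each subject is counted as at risk only on the interval (entry, exit], so a late entrant is not counted before it is observed (left truncation), and a censored subject leaves the risk set without contributing an event. The Nelson-Aalen increments estimate the cumulative transition intensity, and the product-limit (Kaplan-Meier) factors estimate the probability of not yet having made the transition.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    entry: float  # age at which observation starts (delayed entry / left truncation)
    exit: float   # age at which observation ends
    event: bool   # True if the transition occurred at `exit`, False if censored


def product_limit(subjects):
    """Return (time, n_at_risk, n_events, survival, cumulative_hazard) at each event time."""
    event_times = sorted({s.exit for s in subjects if s.event})
    survival, cum_hazard = 1.0, 0.0
    rows = []
    for t in event_times:
        # Risk set: subjects whose observation interval (entry, exit] contains t.
        at_risk = sum(1 for s in subjects if s.entry < t <= s.exit)
        events = sum(1 for s in subjects if s.event and s.exit == t)
        cum_hazard += events / at_risk      # Nelson-Aalen increment
        survival *= 1.0 - events / at_risk  # product-limit (Kaplan-Meier) factor
        rows.append((t, at_risk, events, survival, cum_hazard))
    return rows


# Hypothetical toy data (ages in years).
subjects = [
    Subject(entry=0, exit=5, event=True),
    Subject(entry=0, exit=8, event=False),
    Subject(entry=3, exit=6, event=True),
    Subject(entry=4, exit=10, event=True),
    Subject(entry=7, exit=9, event=False),
]

for t, n, d, s, h in product_limit(subjects):
    print(f"t={t}: at risk={n}, events={d}, S(t)={s:.3f}, H(t)={h:.3f}")
```

Ignoring the entry ages would wrongly place the subject who enters at age 7 in the risk set at ages 5 and 6, inflating the denominators and biasing the estimated survival upward.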

Register-based studies are, by nature, observational and may as such be affected by confounding, i.e., other factors may influence both drug use and the disease outcome. Valid inference in such studies therefore requires adjustment for confounding. The area of causal inference deals with drawing valid (i.e., 'causal') conclusions from observational data. We will discuss the conditions under which such causal conclusions are warranted and the methods (including inverse probability of treatment weighting and the so-called 'g-formula') by which the analysis may be performed. We will put particular emphasis on time-dependent confounding, where treatment may change over time and may itself affect future values of the confounders. Other methods for causal inference, including instrumental variables (e.g., as used in Mendelian randomization studies), will also be discussed.
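
For the simplest point-treatment setting, inverse probability of treatment weighting (IPTW) and the g-formula can be compared in a few lines. The sketch below assumes hypothetical toy counts with one binary confounder L, a binary treatment A (e.g., drug use) and a binary outcome Y, no unmeasured confounding given L, and positivity (both treatment levels occur in every stratum of L); all function names are illustrative. The propensity score is estimated nonparametrically within strata of L. IPTW weights each subject by the inverse probability of the treatment actually received (normalized within arm), while the g-formula standardizes the stratum-specific risks to the marginal distribution of L. It does not cover time-dependent confounding, where the weights would become products over time points.

```python
from collections import defaultdict

# Hypothetical toy data: (L, A, Y, count).
DATA = [
    (0, 0, 1, 8), (0, 0, 0, 32),
    (0, 1, 1, 1), (0, 1, 0, 9),
    (1, 0, 1, 6), (1, 0, 0, 4),
    (1, 1, 1, 20), (1, 1, 0, 20),
]


def expand(data):
    """Turn (L, A, Y, count) rows into one (L, A, Y) tuple per subject."""
    return [(lval, a, y) for lval, a, y, n in data for _ in range(n)]


def crude_risk(rows, arm):
    """Unadjusted risk of Y = 1 among subjects with A = arm."""
    ys = [y for _, a, y in rows if a == arm]
    return sum(ys) / len(ys)


def propensity(rows):
    """Nonparametric propensity score P(A = 1 | L = lval), estimated within strata of L."""
    n, treated = defaultdict(int), defaultdict(int)
    for lval, a, _ in rows:
        n[lval] += 1
        treated[lval] += a
    return {lval: treated[lval] / n[lval] for lval in n}


def iptw_risk(rows, arm):
    """Normalized IPTW risk in one arm, with weights 1 / P(A = arm | L)."""
    ps = propensity(rows)
    num = den = 0.0
    for lval, a, y in rows:
        if a != arm:
            continue
        w = 1.0 / (ps[lval] if arm == 1 else 1.0 - ps[lval])
        num += w * y
        den += w
    return num / den


def g_formula_risk(rows, arm):
    """Standardization: average P(Y = 1 | A = arm, L) over the marginal distribution of L."""
    by_l = defaultdict(list)
    for lval, a, y in rows:
        by_l[lval].append((a, y))
    risk = 0.0
    for obs in by_l.values():
        ys = [y for a, y in obs if a == arm]
        risk += (len(obs) / len(rows)) * (sum(ys) / len(ys))
    return risk


rows = expand(DATA)
for name, f in [("crude", crude_risk), ("IPTW", iptw_risk), ("g-formula", g_formula_risk)]:
    print(f"{name:10s} risk difference: {f(rows, 1) - f(rows, 0):+.3f}")
```

With these counts the crude risk difference is positive (0.42 - 0.28 = +0.14), whereas both adjusted methods give 0.3 - 0.4 = -0.1, because treated subjects are concentrated in the high-risk stratum L = 1. With a saturated propensity score and a nonparametric outcome model the two estimates coincide; with parametric models they generally differ.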

Learning Outcome

On completion of the course, the participants should be able to:

  • List the major Danish health registers, distinguish between their contents, and discuss their advantages and disadvantages.
  • Explain the problems involved in drawing causal conclusions from observational studies, list the potential pitfalls, and discuss possible methods to circumvent them.
  • Identify and describe the problems caused by censored and truncated data, and explain how these can be alleviated using appropriate statistical models for event history analysis.

  • Perform analysis of event history data using statistical models for transition intensities between states.
  • Employ models to construct and summarize disease trajectories, and to identify and visualize patterns in them.
  • Apply inverse probability of treatment weighting and the 'g-formula' in statistical analysis to draw causal inferences from observational studies.
  • Use instrumental variable methods, including Mendelian randomization, to estimate causal relationships when controlled experiments are not feasible.

  • Discuss the general problem of confounding and its critical impact on inference.
  • Discuss and address time-dependent variables and confounders and their influence on analysis and inference.
  • Understand the central aspects of big data analytics and be able to discuss and communicate these to other scientists, clinicians, and the public.

Course literature: presently undecided; it will be a combination of research papers and a textbook.

Participants must meet the admission criteria of the BRIDGE - Translational Excellence Programme.
5 full days with forum lectures and computer exercises.
Continuous feedback during the course of the semester
Type of assessment
Continuous assessment
Course participation
Attendance and active participation
Exam registration requirements

Participants are automatically registered for the Examination upon course registration.

All aids allowed
Marking scale
passed/not passed
Censorship form
No external censorship
Category            Hours
Lectures            15
Theory exercises    15
Total               30