NPLK19000U Big Data in Biotechnology

Volume 2019/2020

MSc Programme in Biology-Biotechnology


Many experimental procedures, such as the various “-omics” techniques routinely employed in biotechnology, produce vast amounts of data, so the amount of available data in many biotechnological disciplines is steadily increasing. To make use of this wealth of information, knowledge of and skills in large-scale computing systems and analysis methods are required. The purpose of this course is to introduce students to the theory and practice of large-scale data analysis, which will allow them to perform and assess different types of large-scale data procedures. Tentative list of data types to be covered in the course: transcriptomic data (RNA-seq), metabolomic data (LC-MS), and medical image data.

This course covers the fundamental challenges of analyzing large amounts of data, i.e. big data. These include how to handle large data files, overcome computational and storage limitations, and assess and secure data unification. The course also provides the knowledge and skills needed to perform data wrangling and normalization. The students will obtain working knowledge of basic data handling, data analysis, and data visualization. Through an in-depth focus on handling and analyzing a relevant set of different data types with programming-based analysis techniques, the course will address the statistical and computational challenges of big-data analysis.

The experimental methods used to generate the data will also be covered briefly, because understanding them is often needed to assess bias and confounding factors in the data.

Learning Outcome

At course completion, the student will have:

Knowledge of

  • The general principles of large-data analysis
  • Common pitfalls in big-data analysis
  • How to interpret and scientifically reflect on big-data analyses
  • The basic concepts underlying clustering and visualization techniques


Skills in

  • Storing, moving, and analyzing data efficiently and at low cost
  • Structuring and performing large-scale data analyses in a coding-based software environment such as R or Python
  • Seeking and obtaining big-datasets from data repositories
  • Handling and modifying large datasets
  • Using the most commonly used scientific data analysis pipelines
  • Visualization and dissemination of data


Competences in

  • Critically evaluating big-data analyses
  • Critically evaluating the choice of analysis methods based on a scientific understanding of the fundamental principles of big-data analysis
  • Critically evaluating results obtained via big-data analyses

Original literature, software manuals and tutorials, and teacher-provided compendia.

Academic qualifications equivalent to a BSc degree are recommended.
Lectures and computer exercises
Category              Hours
Exam                  4
Lectures              35
Practical exercises   50
Preparation           107
Theory exercises      10
Total                 206
Continuous feedback during the course of the semester
Peer feedback (Students give each other feedback)

Continuous feedback from teachers at computer exercises and discussion workshops. Feedback from teachers and peers (class) on oral presentations and answers to seminar questions.

7.5 ECTS
Type of assessment
Written examination, 4 hours under invigilation
Without aids
Marking scale
7-point grading scale
Censorship form
No external censorship

20-minute oral exam on the course curriculum, without preparation. No aids allowed.

Criteria for exam assessment

See learning outcome.