NSCPHD1150 Bioinformatics methods for analysis of high-throughput sequencing data

Volume 2016/2017

PLEASE NOTE

    The PhD course database is under construction. If you want to sign up for this course, please use the following link to be redirected: https://phdcourses.ku.dk/nat.aspx


Scientific content:

1. Overview of high-throughput sequencing technology (Dr Xun Xu, Executive Director, BGI Research)

The chemistry of state-of-the-art sequencing technologies, including Illumina, Complete Genomics, Ion Torrent, PacBio and Oxford Nanopore, will be introduced. Approaches to analyzing and understanding the error profile of each sequencing technology will then be presented. This familiarizes students with sequencing technologies that are widely applied across a broad variety of academic and industrial research areas.
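
As an illustration of how error profiles are commonly quantified, base callers report a per-base quality score on the Phred scale, Q = -10·log10(p), where p is the estimated probability that the base call is wrong. The minimal sketch below assumes FASTQ qualities encoded with an ASCII offset of 33 (the Sanger/Illumina 1.8+ convention; older Illumina pipelines used other offsets) and sums per-base error probabilities to obtain the expected number of errors in a read. The function names are illustrative only.

```python
import math

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse transform: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def decode_quality_string(qual: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Decode an ASCII-encoded FASTQ quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def expected_errors(qual: str, offset: int = PHRED_OFFSET) -> float:
    """Expected number of base-calling errors in a read (sum of per-base error probabilities)."""
    return sum(phred_to_error_prob(q) for q in decode_quality_string(qual, offset))


if __name__ == "__main__":
    qual = "II5+"  # under Phred+33: 'I' = Q40, '5' = Q20, '+' = Q10
    print(decode_quality_string(qual))     # [40, 40, 20, 10]
    print(round(expected_errors(qual), 4))  # 0.1102
```

Reported qualities are themselves estimates made by the base caller; empirical error profiles are usually assessed by comparing aligned reads against a reference.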


2. Genome Assembly (Professor Li Shuaicheng, City University of Hong Kong)

This section will introduce three major strategies for de novo genome assembly: Overlap-Layout-Consensus, the de Bruijn graph and the string graph approach, which have been adapted as sequencing technology has evolved. Students will practice assembling and visualizing a yeast genome and a human chromosome.
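
To make the de Bruijn graph strategy concrete, the minimal sketch below (hypothetical helper names) breaks reads into k-mers, links each k-mer's (k-1)-mer prefix to its (k-1)-mer suffix, and spells the sequence back out along an Eulerian path found with Hierholzer's algorithm. It assumes error-free reads from a single strand, complete k-mer coverage and no repeated (k-1)-mers; real assemblers must also handle sequencing errors, repeats, uneven coverage and reverse complements, and usually report contigs rather than one path.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    out_deg: dict[str, int] = defaultdict(int)
    in_deg: dict[str, int] = defaultdict(int)
    for u, vs in graph.items():
        out_deg[u] += len(vs)
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start at a node with one more outgoing than incoming edge, if there is one.
    start = next((n for n in sorted(nodes) if out_deg[n] - in_deg[n] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: add node to the path
    return path[::-1]


def spell(path: list[str]) -> str:
    """Concatenate a path of overlapping (k-1)-mers into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # toy reads from ATGGCGTGCA
    graph = build_de_bruijn(reads, k=4)
    print(spell(eulerian_path(graph)))  # ATGGCGTGCA
```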

3. Genome Annotation (Professor Anders Krogh, University of Copenhagen)

Annotation is a fundamental step in understanding a species' genome and the impact of genetic variation. Starting from the genome assembled de novo in the previous section, students will learn the essential principles of how to identify genes and distinguish different categories of repeats in the sequence.
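
A naive first step toward gene identification is scanning for open reading frames (ORFs). The sketch below, with a hypothetical function name and minimum-length parameter, only scans the three forward-strand frames for stretches running from an ATG to the next in-frame stop codon. It is not how modern gene finders work: these typically combine probabilistic models such as hidden Markov models with experimental evidence, scan both strands and must model introns in eukaryotic genomes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG-to-stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Only the first ATG before each stop is used, giving the longest ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close it and look for the next ATG
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 2, 14)]
```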

4. Methods and algorithms for mapping of high-throughput DNA sequencing reads (Professor Anders Krogh, University of Copenhagen)

Because de novo assembly of large genomes remains computationally challenging, mapping reads to a reference genome of the species is an essential step in sequencing data analysis. In this section, students will learn the principles of sequence alignment, taking into account the read length and the error characteristics of different sequencing technologies.
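
Read mappers generally locate candidate positions with an index of the reference and then score the read against each candidate with dynamic programming. As a minimal illustration of the second step only, the sketch below computes a Smith-Waterman local alignment score with a linear gap penalty. The function name and scoring values are hypothetical; production mappers typically use affine gap penalties and banded or vectorized implementations.

```python
def smith_waterman(read: str, ref: str, match: int = 2,
                   mismatch: int = -3, gap: int = -5) -> tuple[int, int]:
    """Return (best local alignment score, end position in ref).

    The end position is 0-based and exclusive. Only two DP rows are kept,
    so memory is O(len(ref)); the traceback is omitted for brevity.
    """
    cols = len(ref) + 1
    prev = [0] * cols
    best_score, best_end = 0, 0
    for i in range(1, len(read) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            up = prev[j] + gap        # read base aligned to a gap in ref
            left = curr[j - 1] + gap  # ref base aligned to a gap in the read
            curr[j] = max(0, diag, up, left)  # 0 allows a fresh local start
            if curr[j] > best_score:
                best_score, best_end = curr[j], j
        prev = curr
    return best_score, best_end


if __name__ == "__main__":
    print(smith_waterman("GATTACA", "CCCGATTACATTT"))  # (14, 10)
```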

5. Variation calling framework from high-throughput sequencing data (Heng Li, Research Scientist, Broad Institute)

Improving the sensitivity and specificity of variant calling from sequencing data, while accounting for technical uncertainty, is a lasting theme in the field. A statistical framework for SNP calling from sequencing data was first introduced by Heng Li in 2011 and has been developed further since. Algorithms have also been created and developed for calling INDELs and structural variants from sequencing data. Specific approaches are required for good performance in particular applications, such as cancer sequencing analysis, prenatal genetic sequencing data and family sequencing data. In this section, the essential algorithms will be illustrated, accompanied by research talks and exercises.
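
As a simplified illustration of genotype likelihoods at a diploid, biallelic site, the sketch below treats reads as independent, assumes each read samples either chromosome with probability 1/2, and converts each base quality into an error probability that turns one allele into the other. It returns log10 P(data | g) for g = 0, 1, 2 copies of the alternative allele and calls the maximum-likelihood genotype. This is a hedged sketch in the spirit of the framework mentioned above, not a reproduction of it; the function names and the 0.5 cap on error probabilities are assumptions, and practical callers also use priors, mapping qualities and indel error models.

```python
import math


def genotype_log_likelihoods(bases, quals, ref, alt):
    """Return log10 P(bases | g) for g = 0, 1, 2 copies of the alt allele."""
    logls = []
    for g in (0, 1, 2):
        total = 0.0
        for base, q in zip(bases, quals):
            # Phred quality -> error probability, capped at 0.5 (assumption)
            # so every term stays positive even for Q = 0.
            e = min(10 ** (-q / 10), 0.5)
            p_given_ref = 1 - e if base == ref else e
            p_given_alt = 1 - e if base == alt else e
            # The read comes from either chromosome with probability 1/2.
            total += math.log10(((2 - g) * p_given_ref + g * p_given_alt) / 2)
        logls.append(total)
    return logls


def call_genotype(bases, quals, ref, alt):
    """Maximum-likelihood genotype (number of alt alleles); no prior applied."""
    logls = genotype_log_likelihoods(bases, quals, ref, alt)
    return max(range(3), key=lambda g: logls[g])


if __name__ == "__main__":
    bases = ["A"] * 5 + ["G"] * 5  # 5 reference and 5 alternative reads
    print(call_genotype(bases, [30] * 10, ref="A", alt="G"))  # 1 (heterozygous)
```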


6. Haplotype imputation of genotypes (Professor Anders Albrechtsen, University of Copenhagen)

For diploid species such as Homo sapiens, the genotypes called in the previous section are not a sufficient representation of the genetic information. In population genetics and disease mapping studies, haplotype imputation is a necessary analysis technique. In this section, students will be introduced to the algorithms and approaches used to impute haplotype sequences from called variants.
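
The idea of using a panel of known haplotypes can be illustrated with a deliberately naive toy: find the pair of phased panel haplotypes whose sum best matches an individual's observed genotypes, and fill missing sites from that pair. The function name and input encoding (0/1/2 alternative-allele counts, None for missing) are hypothetical. The exhaustive search over all pairs scales quadratically with panel size, and practical methods instead use hidden Markov models, for example of the Li and Stephens type, that allow mosaics of panel haplotypes.

```python
from itertools import combinations_with_replacement


def impute_with_panel(genotype, panel):
    """Toy phasing/imputation against a reference panel.

    genotype: alt-allele counts (0, 1, 2) per site, None where missing.
    panel: phased reference haplotypes, each a list of 0/1 alleles.
    Returns (hap1, hap2, imputed genotype).
    """
    best = None
    for a, b in combinations_with_replacement(range(len(panel)), 2):
        # Mismatch between observed genotypes and the pair's allele sums.
        mismatches = sum(abs(g - (panel[a][i] + panel[b][i]))
                         for i, g in enumerate(genotype) if g is not None)
        if best is None or mismatches < best[0]:
            best = (mismatches, a, b)
    _, a, b = best
    hap1, hap2 = panel[a], panel[b]
    imputed = [g if g is not None else hap1[i] + hap2[i]
               for i, g in enumerate(genotype)]
    return hap1, hap2, imputed


if __name__ == "__main__":
    panel = [[0, 0, 0, 0], [1, 1, 0, 1], [0, 1, 1, 0]]
    print(impute_with_panel([1, 2, None, 1], panel))
```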


7. Population-based methodologies: correlating genotypes with phenotypes while accounting for technical uncertainty (Professor Anders Albrechtsen, University of Copenhagen)

Finally, after the haplotype sequences have been constructed, we move on to an advanced topic: computational approaches to discover and understand the genetic causes of disease. Students will learn about the statistical framework for disease association mapping from sequencing data.
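
A minimal association test can be sketched as a Pearson chi-square test on a 2x2 table of allele counts in cases and controls. The sketch accepts expected alternative-allele dosages (values between 0 and 2), so probabilistic genotypes from sequencing data can be plugged in as well as hard calls. This is a common but naive approximation that does not fully propagate genotype uncertainty, it assumes no covariates and no population structure, and the function name is hypothetical.

```python
import math


def allelic_chi2(case_dosages, control_dosages):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Dosages are expected alt-allele counts per diploid individual (0..2).
    Assumes every margin of the table is non-zero.
    Returns (chi2, p_value).
    """
    a = sum(case_dosages)                  # alt alleles in cases
    b = 2 * len(case_dosages) - a          # ref alleles in cases
    c = sum(control_dosages)               # alt alleles in controls
    d = 2 * len(control_dosages) - c       # ref alleles in controls
    n = a + b + c + d
    # (observed, expected) pairs under independence of allele and status.
    cells = [(a, (a + b) * (a + c) / n), (b, (a + b) * (b + d) / n),
             (c, (c + d) * (a + c) / n), (d, (c + d) * (b + d) / n)]
    chi2 = sum((obs - exp) ** 2 / exp for obs, exp in cells)
    # Upper tail of chi-square with 1 df: P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    cases = [2, 2, 1, 1, 0]
    controls = [0, 0, 1, 0, 0]
    print(allelic_chi2(cases, controls))
```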



Learning Outcome

Through this course, students will acquire knowledge of:

  • The basic chemistry and error profiles of various widely used sequencing technologies

  • De novo assembly algorithms and their practical application

  • The computational methodologies behind genome annotation

  • Principles of alignment and mapping of DNA reads of different lengths

  • The statistical framework for calling genomic variations from the sequencing data

  • Approaches to imputing haplotypes from genotypes

  • The statistical framework for disease gene mapping from sequencing data

A list of preparatory reading (textbook materials and scientific articles) will be compiled and sent to course participants one month before the course.

Supplementary reading will be announced during the course.

Course participants must be graduate students or hold a master's degree in computer science, bioinformatics, genomics, biostatistics or a similar field.
In addition to classroom lectures, there will be relevant research talks and practical individual and group exercises during the course to deepen students' understanding and application of the bioinformatics approaches.
Workload by category (hours):
  • Class instruction: 50
  • Preparation: 76
  • Project work: 80
  • Total: 206
7.5 ECTS
Type of assessment
Written assignment, 80 hours
Within two weeks after the course, participants will hand in an assignment on a topic that has been settled one month following
All aids allowed
Marking scale
completed/not completed