NDAK14008U  Programming Massively Parallel Hardware

Volume 2014/2015

The aim of the course is to teach students how to exploit parallel hardware, which is now mainstream, both efficiently and effectively (i.e., how to quickly write programs that run fast).

The course gives an overview of various kinds of parallel hardware, such as multi-core CPUs and GPUs, introduces several programming interfaces, such as OpenMP and OpenCL, and highlights how hardware differences influence the way in which a program is optimized.

The lectures will give practical instruction in implementing, testing, and optimizing/tuning parallel programs written in each of the programming interfaces.

Several composable code transformations that have been found effective in optimizing parallelism are introduced in the second part of the course. These can be seen as recipes for optimizing the application's degree of parallelism and locality of reference.
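As a rough illustration of what such a transformation can look like, the sketch below shows loop interchange applied to a simple matrix-scaling loop nest. It is not taken from the course material: the function names scale_column_order and scale_row_order, the row-major layout, and the choice of OpenMP are all assumptions made for the example. Interchange is legal here because every iteration writes a distinct element and reads nothing written by another iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: b = 2 * a for a row-major n x m matrix. */

/* Before: the inner loop walks down a column, so consecutive
   iterations touch elements m doubles apart (poor spatial locality). */
void scale_column_order(const double *a, double *b, size_t n, size_t m) {
    assert(a != NULL && b != NULL);
    for (size_t j = 0; j < m; j++)
        for (size_t i = 0; i < n; i++)
            b[i * m + j] = 2.0 * a[i * m + j];
}

/* After loop interchange: the inner loop walks along a row, so
   consecutive iterations touch adjacent memory. The outer loop has
   no cross-iteration dependences, so it can also be run in parallel;
   the pragma is ignored if OpenMP is not enabled at compile time. */
void scale_row_order(const double *a, double *b, size_t n, size_t m) {
    assert(a != NULL && b != NULL);
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            b[i * m + j] = 2.0 * a[i * m + j];
}
```

Both versions compute the same result; only the traversal order, and hence the memory access pattern and the loop that is parallelized, differs. Whether the interchanged version is actually faster depends on the matrix size and the target hardware.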

Learning Outcome



  • explain the main differences between various kinds of parallel hardware, and how these influence the way the code is optimized/tuned.
  • explain the (in)correctness of (specific instances of) loop parallelization and related optimizations.
  • explain, on specific instances, how to guide the application of such optimizations, and the (data-sensitive) tradeoffs that are exploited.




  • implement parallel programs using different programming interfaces, such as OpenMP and OpenCL.
  • test, profile, and tune the programs to efficiently take advantage of the parallel hardware (multicore, GPU).




  • for a given application and parallel hardware, identify an effective parallelization solution.


The course does not use a textbook, but instead provides tutorials, scientific papers, and selected material from several books (available from the course pages).

The course syllabus assumes basic knowledge of hardware architecture, compilers, data structures and algorithms, and linear algebra, and, most importantly, programming competence in C/C++. At DIKU, for example, these can be acquired through the corresponding BSc courses (or through self-study).
Lectures, in-class exercises, group work on programming and analysis assignments, and a project.
It is preferable that students have access to GPGPU-programmable hardware.
7.5 ECTS
Type of assessment
Continuous assessment
Four individual assignments (32%), group project (report) with individual presentation (68%).
Marking scale
7-point grading scale
Censorship form
No external censorship
several internal examiners
Resubmission of (i) the (missing) assignments (25%) and (ii) the (missing) project extended with additional tasks (50%), followed by (iii) a 30-minute oral examination (25%).
Criteria for exam assessment

See the learning outcome.

  • Category: Hours
  • Lectures: 32
  • Preparation: 48
  • Exercises: 61
  • Project work: 64
  • Exam: 1
  • Total: 206