Welcome to the University of Copenhagen's course catalogue

NDAK14008U  Programming Massively Parallel Hardware Volume 2014/2015

Course information

Credit: 7.5 ECTS
Level: Full Degree Master
Duration: 1 block
Block 1
A (Tues 8-12 + Thurs 8-17)
Course capacity: no limit
Continuing and further education
Study board: Study Board of Mathematics and Computer Science
Contracting department
  • Department of Computer Science
Course responsible
  • Cosmin Eugen Oancea (cosmin.oancea@di.ku.dk)
Alternative course responsible: Sune Darkner
Other potential teachers: Fabian Gieseke, Sarah Niebe
One teacher from the APL group and one from the image processing group, e.g., Cosmin Oancea, Marek Misztal or Stefan Sommer.
Saved on 03-12-2014

The aim of the course is to teach students how to exploit parallel hardware, which is now mainstream, efficiently and effectively (i.e., how to quickly write programs that run fast).

The course gives an overview of various kinds of parallel hardware, such as multi-core CPUs and GPUs, introduces several programming interfaces, such as OpenMP and OpenCL, and highlights how hardware differences influence the way a program is optimized.

The lectures give practical instruction in implementing, testing, and optimizing/tuning parallel programs written in each of these programming interfaces.

Several composable code transformations that have proven effective in optimizing parallelism are introduced in the second part of the course. These can be seen as recipes for optimizing an application's degree of parallelism and locality of reference.

Learning Outcome

  • explain the main differences between various kinds of parallel hardware, and how these influence the way code is optimized/tuned.
  • explain the (in)correctness of (specific instances of) loop parallelization and related optimizations.
  • explain, on specific instances, how to guide the application of such optimizations, and the (data-sensitive) tradeoffs that are exploited.

  • implement parallel programs using different programming interfaces, such as OpenMP and OpenCL.
  • test, profile, and tune the programs to efficiently take advantage of the parallel hardware (multicore, GPU).

  • for a given application and parallel hardware, identify an effective parallelization solution.

The course does not use a textbook, but instead provides tutorials, scientific papers, and selected material from several books (available from the course pages).

Teaching and learning methods
Lectures, in-class exercises, and group work on programming and analysis assignments and a project.
Academic qualifications
The course syllabus assumes basic knowledge of hardware architecture, compilers, data structures and algorithms, and linear algebra, and, most importantly, programming skills in C/C++. At DIKU, for example, these can be acquired through the corresponding BSc courses (or through self-study).
It is preferable that students have access to GPGPU-programmable hardware.
Sign up
Self Service at KUnet
Credit: 7.5 ECTS
Type of assessment
Continuous assessment
Four individual assignments (32%), group project (report) with individual presentation (68%).
Marking scale: 7-point grading scale
Censorship form: No external censorship, several internal examiners
Re-exam: Resubmission of (i) the missing assignments (25%) and (ii) the missing project, extended with additional tasks (50%), plus (iii) a 30-minute oral examination (25%).
Criteria for exam assessment

See the learning outcomes.

Project work: 64 hours