Københavns Universitet - Kurser

NDAK15018U Large-Scale Data Analysis (LSDA)

Volume 2017/2018

MSc Programme in Computer Science

Almost every scientific and industrial field now faces massive amounts of data, and large-scale data science has become one of the key drivers of data-intensive research and innovation. Taking advantage of this data-rich situation often requires specialized approaches, tools, and skills. Society needs qualified experts capable of designing, developing, and applying data analysis techniques in big data scenarios. That is, “Data Analysts” are needed, and the goal of this course is to educate such experts.

Compared with other courses on machine learning or data analysis, this course focuses on the peculiarities of processing large amounts of data - that is, on Big Data.

The course is relevant for students in Computer Science, Cognition and IT, Bioinformatics, Physics, Statistics, and other quantitative fields.

The course covers a selection of topics from the following tentative list:

Fundamentals of data mining
Online and large-scale machine learning
Programming paradigms for large-scale data analysis
Mining of streaming data
Data analysis on (massively-)parallel platforms

Learning Outcome

At course completion, the successful student will have:

Knowledge of

the general principles of data mining;
the theoretical concepts underlying large-scale data analysis;
common pitfalls in large-scale data analysis.

Skills in

applying efficient algorithms for analyzing large-scale data sets;
using programming paradigms for large-scale data analysis;
using software tools for large-scale data analysis;
identifying and handling common pitfalls in data analysis.

Competences in

recognizing and describing possible applications of large-scale data analysis ("Big Data");
comparing, appraising and selecting methods for specific data analysis tasks;
solving large real-world data mining problems.

Literature

See Absalon when the course is set up.

Among others, the freely available textbook “Mining of Massive Datasets” by Jure Leskovec, Anand Rajaraman, and Jeff Ullman, published by Cambridge University Press, will be used.

Recommended Academic Qualifications

Participants should have passed the course "Machine Learning" or similar. Knowledge of basic calculus and statistics is required. Participants should also have knowledge of basic programming and programming languages (in particular Python) or should be willing to spend some extra study time to get familiar with the required programming skills.

Teaching and learning methods

Lecture and exercise classes

Workload

Exam: 20 hours
Lectures: 28 hours
Practical exercises: 70 hours
Preparation: 18 hours
Theory exercises: 70 hours
Total: 206 hours

Exam

Credit

7.5 ECTS

Type of assessment

Continuous assessment

4-6 weekly take-home exercises.
The final grade will be the average over all assignments except the one with the lowest score.
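
The averaging rule above can be illustrated with a minimal sketch. It assumes that each assignment receives a numeric score and that exactly one lowest score is dropped even when several scores tie for lowest. The function name `final_score` and the example scores are hypothetical, and how the resulting average is mapped onto the marking scale is not stated here.

```python
def final_score(scores):
    """Average of all assignment scores after dropping the single lowest one.

    Assumption: scores are plain numbers; rounding to a grade is not handled.
    """
    if len(scores) < 2:
        raise ValueError("need at least two assignments to drop the lowest")
    # Subtracting min() removes exactly one occurrence of the lowest score.
    return (sum(scores) - min(scores)) / (len(scores) - 1)


# Hypothetical scores for five weekly assignments.
example_scores = [7, 10, 4, 12, 10]
print(final_score(example_scores))  # (7 + 10 + 12 + 10) / 4 = 9.75
```

With five assignments, the lowest score (4 in this example) is ignored and the remaining four are averaged.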

Aid

All aids allowed

Marking scale

7-point grading scale

Form of co-examination

No external co-examiner

Several internal examiners.

Re-exam

20-minute oral exam, without preparation time, covering the full course syllabus.

If a student is not qualified, qualification can be achieved by handing in equivalent assignments and having them approved. The assignments must be submitted no later than two weeks before the re-exam date.

Criteria for exam assessment

See Learning Outcome.

Course information

Language: English
Course code: NDAK15018U
Credit: 7.5 ECTS
Level: Full Degree Master
Duration: 1 block
Placement: Block 4
Schedule: A
Course capacity: No limit
Continuing and further education
Study board: Study Board of Mathematics and Computer Science

Contracting department

Department of Computer Science

Course Coordinators

Fabian Cristian Gieseke

Saved on the 08-03-2017
