NDAK15018U Large-Scale Data Analysis (LSDA)
MSc Programme in Computer Science
The quantity of information is growing so rapidly that no single person or team can process it in a timely manner. What is needed are machines, tools, and software that can extract insight from these data streams, and qualified experts capable of designing, developing, and applying such tools. That is, society needs “Data Analysts”, and the goal of this course is to educate such experts.
In comparison to other courses dealing with machine learning and data analysis, this course focuses on the peculiarities of processing large amounts of data, that is, on Big Data.
The course is relevant for students of Computer Science, Cognition and IT, Bioinformatics, Physics, Mathematics, and other quantitative fields of study.
The course covers a selection of topics from the following tentative list:
- Fundamentals of data mining
- Programming paradigms for large-scale data analysis
- Similar item search
- Mining of streaming data
- Identification of frequent itemsets
- Randomized algorithms for large-scale data processing
- Mining social-network graphs
- Online and large-scale machine learning
- Visualization of large and high-dimensional data sets
At course completion, the successful student will have:
Knowledge of
- the general principles of data mining;
- the theoretical concepts underlying large-scale data analysis;
- common pitfalls in large-scale data analysis.
Skills in
- applying efficient algorithms for analysing large-scale data;
- mining streaming data;
- using programming paradigms for large-scale data analysis;
- visualizing large amounts of data;
- using software tools for large-scale data analysis;
- identifying and handling common pitfalls in data analysis.
Competences in
- recognizing and describing possible applications of large-scale data analysis ("Big Data");
- comparing, appraising and selecting methods for specific data analysis tasks;
- solving large real-world data mining problems.
See Absalon when the course is set up.
The course uses, among others, the freely available textbook “Mining of Massive Datasets” by Jure Leskovec, Anand Rajaraman, and Jeff Ullman, published by Cambridge University Press.
- Category: Hours
- Exam: 20
- Lectures: 28
- Practical exercises: 70
- Preparation: 18
- Theory exercises: 70
- Total: 206
- Credit
- 7,5 ECTS
- Type of assessment
- Written assignment, 7 days
Individual, one-week take-home assignment.
- Exam registration requirements
There are four to six mandatory written take-home assignments. They must be passed during the course in order to be eligible for the final exam.
- Aid
- All aids allowed
- Marking scale
- 7-point grading scale
- Censorship form
- No external censorship
Several internal examiners.
- Re-exam
20-minute oral exam without preparation, covering the full course syllabus.
If a student has not qualified for the exam, qualification can be achieved by handing in equivalent assignments and having them approved. The assignments must be submitted no later than two weeks before the re-exam date.
Criteria for exam assessment
See the learning outcomes.
Course information
- Language
- English
- Course code
- NDAK15018U
- Credit
- 7,5 ECTS
- Level
- Full Degree Master
- Duration
- 1 block
- Placement
- Block 4
- Schedule
- A
- Course capacity
- No limit
- Continuing and further education
- Study board
- Study Board of Mathematics and Computer Science
Contracting department
- Department of Computer Science
Course responsibles
- Fabian Cristian Gieseke (fabian.gieseke@di.ku.dk)