University of Copenhagen - Courses

NDAK15007U CHANGED: Machine Learning (ML)

Volume 2018/2019

MSc Programme in Computer Science
MSc Programme in Bioinformatics
MSc Programme in Statistics

The amount and complexity of available data are steadily increasing. To make use of this wealth of information, we need computing systems that turn data into knowledge. Machine learning is about developing algorithms that analyse data in order to make predictions, categorizations, and recommendations. Machine learning algorithms are already an integral part of today's computing systems, for example in search engines, recommender systems, and biometric applications. Machine learning provides a set of tools that are widely applicable to data analysis in a diverse set of problem domains, such as data mining, search engines, digital image and signal analysis, natural language modeling, bioinformatics, physics, economics, and biology.

The purpose of the course is to introduce students to the basic theory and the most common techniques of statistical machine learning. Students will obtain a working knowledge of statistical machine learning.

This course is relevant for computer science students as well as for students from other programmes who have sufficient mathematical background and programming skills (e.g., Statistics, Math-Economics, Actuarial Math, Physics, Bioinformatics).

The course covers the following topics (tentative list):

Foundations of statistical learning.
Occam’s razor bound for generalization performance.
Vapnik-Chervonenkis (VC) analysis of generalization performance.
Classification methods, such as linear models, k-nearest neighbors, kernel-based methods (e.g., support vector machines), and neural networks.
Linear and non-linear regression methods.
Basic clustering, dimensionality reduction, and visualisation techniques, such as principal component analysis (PCA).

Learning Outcome

At course completion, the successful student will have:

Knowledge of

the general principles of machine learning;
basic probability theory for modeling and analysing data;
the theoretical concepts underlying classification, regression, and clustering;
the mathematical foundations of selected machine learning algorithms;
common pitfalls in machine learning.

Skills in

proving generalization bounds for expected prediction quality;
applying linear and non-linear techniques for classification and regression;
performing elementary dimensionality reduction;
elementary data clustering;
implementing selected machine learning algorithms;
visualising and evaluating results obtained with machine learning techniques;
using software libraries for solving machine learning problems;
identifying and handling common pitfalls in machine learning.

Competences in

recognising and describing possible applications of machine learning;
formalising and rigorously analysing machine learning problems;
comparing, appraising and selecting machine learning methods for specific tasks;
solving real-world data mining and pattern recognition problems by using machine learning techniques.

Literature

See Absalon when the course is set up.

Recommended Academic Qualifications

Knowledge of and experience in programming is required. Participants must be able to implement algorithms described in pseudocode.

Knowledge of linear algebra corresponding to an introductory undergraduate course on the topic is expected (in particular: vector spaces; matrix inversion; eigenvalue decomposition; linear projections). This knowledge can be acquired/refreshed using any introductory book on linear algebra (e.g., Gilbert Strang, "Introduction to Linear Algebra").

Knowledge of basic calculus at an advanced high-school level is also expected (in particular: rules of differentiation; simple integration). This knowledge can be acquired/refreshed using any introductory book on calculus (e.g., Stephen Abbott, "Understanding Analysis"; Michael Spivak, "The Hitchhiker's Guide to Calculus"). Gilbert Strang's textbook and course "Calculus" are freely available online at MIT OpenCourseWare (http://ocw.mit.edu). The most relevant chapters/sections of the book are 1-3.4, 4.1, 5-6.4, 10, 11, and 13.

Knowledge of basic statistics and probability theory is a plus (in particular: discrete and continuous random variables; independence of random variables and conditional distributions; expectation and variance of random variables; central limit theorem and the law of large numbers). This knowledge can be acquired/refreshed using any introductory book on these topics. We recommend the first four chapters of "Probability and Computing" by Mitzenmacher and Upfal.

Students with weaknesses in one or more of the above areas should check the "Remarks" below or be prepared to spend some extra study time on their own, either before or during the course.

Teaching and learning methods

Lectures and exercise classes.

Remarks

The course is mandatory for Computer Science students. Students from other study programmes who lack the necessary mathematics and programming prerequisites are advised to consider the "Introduction to Data Science" course, which teaches basic machine learning but assumes less background in mathematics and programming.

Workload

Category               Hours
Exam                      25
Exam Preparation          25
Lectures                  28
Practical exercises       57
Preparation               14
Theory exercises          57
Total                    206

Feedback form

Collective

Exam

Credit

7,5 ECTS

Type of assessment

Written assignment, 5 days

CHANGED IN 2018/2019
One written take-home assignment.

Exam registration requirements

There will be five to seven mandatory written take-home assignments.

Exam eligibility is determined two weeks prior to the exam. The exact date is set by the exam office. A student must score above 50% on average in these assignments in order to be eligible for the exam.

Aid

All aids allowed

Marking scale

7-point grading scale

Censorship form

External censorship

Re-exam

The re-exam is a 5-day written take-home assignment.

To be eligible for the re-exam, students must hand in the course assignments no later than two weeks before the re-exam and score at least 50% on average in these assignments.

Criteria for exam assessment

See Learning Outcome.

Course information

Language: English
Course code: NDAK15007U
Credit: 7,5 ECTS
Level: Full Degree Master
Duration: 1 block
Placement: Block 2
Schedule: C
Course capacity: No limit
Continuing and further education
Study board: Study Board of Mathematics and Computer Science

Contracting department

Department of Computer Science

Contracting faculty

Faculty of Science

Course Coordinators

Yevgeny Seldin

Lecturers

Christian Igel

Saved on 01-04-2019
