NDAB21005U Machine Learning A (MLA)

Volume 2021/2022
Education

BSc programme in Machine Learning and Data Science

Content

The course introduces the basic theory and algorithms of machine learning. The tentative list of topics is:

  • Supervised learning setting
    • Classification
    • Regression
  • Unsupervised learning setting
  • Concentration of measure inequalities
    • Markov's
    • Chebyshev's
    • Hoeffding's
  • Analysis of generalisation in classification
    • Validation and cross-validation
    • Generalisation bound for a single hypothesis
    • Generalisation bound for a finite hypothesis class
    • Occam's razor - generalisation bound for a countably infinite hypothesis class
  • Algorithms
    • K-Nearest Neighbors
    • Perceptron
    • Logistic Regression
    • Linear Regression
    • Feature transformations and classification/regression in transformed feature spaces
    • Various forms of regularisation
      • Regularisation terms
      • Dimensionality reduction
    • Random Forests and Decision Trees
    • Neural Networks and introduction to Deep Learning
    • Principal Component Analysis (PCA)
    • Clustering
  • Assumptions behind the algorithms taught in the course, their implications, and common pitfalls
    • Overfitting
      • Internal overfitting within algorithms due to overly complex hypothesis spaces
      • External overfitting outside algorithms due to application of an excessive number of algorithms to a dataset
    • The i.i.d. assumption
      • The i.i.d. assumption is behind everything taught in the course
      • Consequences of violation of the i.i.d. assumption
        • Special case: sampling bias
        • Failure of generalisation guarantees
      • Implications of the i.i.d. assumption
        • Biases in the training data propagate into predictions
    • Correlation ≠ Causality
      • The course studies only statistical correlations and dependencies in the data; causal inference is not covered.
Learning Outcome

At course completion, the successful student will have:

Knowledge of

  • the basic principles of machine learning;
  • basic probability theory for modelling and analysing data;
  • the theoretical concepts underlying classification, regression, and clustering;
  • the mathematical foundations of selected machine learning algorithms;
  • basic assumptions behind the algorithms studied in the course, their implications and common pitfalls.

 

Skills in

  • proving generalisation bounds based on validation errors;
  • proving generalisation bounds for countable hypothesis classes;
  • applying linear and non-linear techniques for classification and regression;
  • performing elementary dimensionality reduction;
  • elementary data clustering;
  • implementing selected machine learning algorithms;
  • visualising and evaluating results obtained with machine learning techniques;
  • using software libraries for solving machine learning problems;
  • identifying and handling common pitfalls in machine learning.

 

Competences in

  • recognising and describing possible applications of machine learning;
  • formalising and rigorously analysing machine learning problems;
  • comparing, appraising and selecting machine learning methods for specific tasks;
  • solving real-world data mining and pattern recognition problems by using machine learning techniques.

Literature

Will be published on Absalon.

Recommended Academic Qualifications

1. Knowledge of Linear Algebra corresponding to the course Lineær algebra i datalogi (LinAlgDat).

2. Knowledge of Calculus corresponding to Introduktion til matematik i naturvidenskab (MatintroNat) or Matematisk analyse og sandsynlighedsteori i datalogi (MASD).

3. Knowledge of Probability Theory corresponding to Sandsynlighedsregning og statistik (SS), Grundlæggende statistik og sandsynlighedsregning (GSS), or Matematisk analyse og sandsynlighedsteori i datalogi (MASD) and Modelling and Analysis of Data (MAD).

4. Knowledge of Discrete Mathematics corresponding to Diskret matematik og formelle sprog (DMFS) or Diskret Matematik og algoritmer (DMA).

5. Knowledge of programming in Python corresponding to Programmering og problemløsning (PoP).

Teaching and learning methods

Weekly lectures, weekly home assignments, and exercise sessions.

Remarks

The course content is identical to approximately 50% of NDAB20000U Introduktion til Machine Learning (IntroML).
Students may not pass both this course and Introduktion til Machine Learning (IntroML).

Workload

  • Lectures: 34 hours
  • Preparation: 8 hours
  • Theory exercises: 57 hours
  • Practical exercises: 57 hours
  • Exam Preparation: 25 hours
  • Exam: 25 hours
  • Total: 206 hours

Feedback form

  • Written
  • Oral
  • Individual
  • Collective
  • Continuous feedback during the course of the semester

Credit
7.5 ECTS
Type of assessment
Written examination, 5 days
The exam is a 5-day written take-home assignment (must be solved individually).
Exam registration requirements

5-7 mandatory written take-home assignments (must be solved individually).

A student must score above 50% on average in the assignments in order to qualify for the exam.

Aid
All aids allowed
Marking scale
7-point grading scale
Censorship form
External censorship
Re-exam

The re-exam is a 5-day written take-home assignment (must be solved individually).

To participate in the re-exam, students must hand in the course assignments no later than 3 weeks before the re-exam week and score at least 50% on average in these assignments.

Criteria for exam assessment

See Learning Outcome.