# NMAK23013U Privacy in Statistics and Machine Learning

Volume 2023/2024
Education

MSc Programme in Statistics

Content

In this course we will learn how common anonymisation techniques are vulnerable to re-identification attacks and how to describe, quantify, and protect against leakage of private data using the framework of differential privacy, a mathematical formalisation of privacy preservation. Differential privacy can help share statistical analyses and models trained on sensitive data without compromising privacy and without great loss of information.

Motivation

Statisticians and data scientists often analyse data that contain sensitive information. Sharing analysis results, aggregate information, or models derived from such sensitive data may compromise individuals' privacy, and the risk accumulates with every additional release. Nowadays, users can interact with machine learning tools and statistics dashboards that are constantly updated and automatically process vast amounts of data, such as ubiquitous sensor data, detailed health records, private email and chat correspondence, and photo and video data shared on social media. This elevates the risk and underlines the ethical responsibility to safeguard privacy through privacy-preserving statistical data analysis.

Learning Outcome

Knowledge of

• definition and interpretation of differential privacy
• privacy attacks on “de-identified” data and statistical data releases, such as re-identification, reconstruction, or membership attacks
• basic differentially private algorithms (for example, the Laplace mechanism)

Skills in

• identifying and demonstrating risks to privacy in data science settings
• determining privacy guarantees after composition or post-processing
• presenting technical content in writing

Competences to

• implement and present instructive examples of attacks on statistical data privacy
• implement privacy-preserving algorithms and experimentally validate their performance and utility
• effectively communicate and discuss technicalities of differential privacy and practical implications for data science applications
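
As an illustration of the kind of basic differentially private algorithm named in the learning outcomes, the following is a minimal sketch of the Laplace mechanism applied to a counting query, together with basic sequential composition of privacy budgets. It is not course material: the function names, the toy data, and the choice of epsilon are assumptions made for this example, and the plain floating-point sampling used here is not hardened for real deployments.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release `true_value` with noise of scale sensitivity / epsilon.

    `sensitivity` is the largest change in the query result when one
    individual's record is added or removed (1 for a counting query).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def compose_epsilons(epsilons):
    """Basic sequential composition: privacy losses of releases add up."""
    return sum(epsilons)


# Hypothetical toy data set of ages (illustration only).
ages = [23, 35, 41, 29, 62, 57, 33]

# Counting query: how many individuals are at least 40 years old?
true_count = sum(1 for age in ages if age >= 40)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)

# Answering the query twice with epsilon = 0.5 each costs epsilon = 1.0 in total.
total_epsilon = compose_epsilons([0.5, 0.5])

print(f"true count: {true_count}, noisy count: {noisy_count:.2f}")
print(f"total privacy budget after two releases: {total_epsilon}")
```

A smaller epsilon gives a larger noise scale and hence stronger privacy at the cost of accuracy, which is the privacy-utility trade-off that the competences above refer to.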

The course literature will be announced on the Absalon course page.

Related textbooks include, for example, The Algorithmic Foundations of Differential Privacy by C. Dwork and A. Roth, and The Complexity of Differential Privacy by S. Vadhan (PDFs freely available on the authors' websites).

Students should have a solid grounding in probability and statistics, linear algebra, vector calculus, and algorithms. Students should be comfortable reading and writing mathematical proofs involving algorithms and probability. Examples of relevant courses: StatMet/MStat/MatStat, LinAlgMat/LinAlgDat, Sand, ModComp, or similar courses.

Academic qualifications equivalent to a BSc degree in a quantitative field are recommended (for example, a BSc in Mathematics/Actuarial Mathematics/Mathematics-Economics, a BSc in Machine Learning and Data Science, a BSc in Computer Science with a suitable specialisation, or similar).

Students should be able to independently program basic data simulations and analyses in Python, Julia, or R.
Lectures, in-class exercises, exercise classes and TA sessions for work on assignments and projects with written hand-ins, code notebook and report hand-ins, and student presentations.

| Category | Hours |
| --- | --- |
| Lectures | 28 |
| Preparation | 96 |
| Exercises | 28 |
| Project work | 50 |
| Exam | 4 |
| Total | 206 |

Continuous feedback during the course of the semester
Credit
7.5 ECTS
Type of assessment
Continuous assessment
Type of assessment details
The exam is composed of the following elements to be completed during the course:

(1) a written summary of one assigned lecture,

(2) between 1 and 3 exercise assignments,

(3) a course project submitted as reproducible code notebook that combines the project report and its code implementations, and

(4) a presentation (of a solution to an assignment question or about the project).

All components need to be approved to pass the course. If a component is not passed, the student must take the re-exam.
Some exam elements are to be completed individually, others in groups of up to three students.
Aid
All aids allowed
Marking scale
passed/not passed
Censorship form
No external censorship
One internal examiner
Re-exam

Passed exam elements can be reused for the re-exam the same year.

All elements that were not approved during the ordinary exam are to be completed individually for the re-exam; they can be based on the original submissions. The student must submit either a summary of a research article or a lecture summary for exam element (1).

If the presentation exam element is not approved during the ordinary exam, the re-exam will include a 15-minute oral exam to re-examine the presentation element.

##### Criteria for exam assessment

Each of the exam elements must be approved separately to pass the course.

For an element to be approved, the student must demonstrate in a satisfactory way that they have mastered the learning outcome of the course corresponding to that element.