NMAK23013U Privacy in Statistics and Machine Learning
MSc Programme in Statistics
In this course we learn how common anonymisation techniques are vulnerable to re-identification attacks and how to describe, quantify, and protect against leakage of private data using differential privacy, a mathematical framework for privacy preservation. Differential privacy can help to share statistical analyses and models trained on sensitive data without compromising privacy and without great loss of information.
Motivation
Statisticians and data scientists often analyse data that contain sensitive information. Sharing analysis results, aggregate information, or models derived from such sensitive data may compromise individuals' privacy and result in cumulative risk. Nowadays, users can interact with machine learning tools and statistics dashboards that are constantly updated and automatically process vast amounts of data, such as ubiquitous sensor data, detailed health records, private email and chat correspondence, and photo and video data shared on social media. This elevates the risk and underlines the ethical responsibility to safeguard privacy through privacy-preserving statistical data analysis.
Knowledge of
- definition and interpretation of differential privacy
- privacy attacks on “de-identified” data and statistical data releases, such as re-identification, reconstruction, or membership attacks
- basic differentially private algorithms (for example, the Laplace mechanism)
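As an illustration of the Laplace mechanism named above, here is a minimal sketch in Python using only the standard library. The toy data, the function names, and the choice of epsilon are assumptions made for this example and are not taken from the course material. The sketch answers a counting query, whose sensitivity is 1 because adding or removing one person changes the count by at most 1, and adds Laplace noise with scale sensitivity / epsilon. It is meant for teaching and is not a hardened implementation.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value plus Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical toy data: ages of ten individuals.
ages = [23, 35, 41, 29, 52, 47, 38, 61, 33, 45]

# Counting query: how many individuals are older than 40?
true_count = sum(1 for age in ages if age > 40)

rng = random.Random(0)  # fixed seed only to make the illustration repeatable
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
print("true count:", true_count, "noisy release:", noisy_count)
```

A smaller epsilon gives a larger noise scale and hence stronger privacy at the cost of accuracy.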
Skills in
- identifying and demonstrating risks to privacy in data science settings
- determining privacy guarantees after composition or post-processing
- presenting technical content in writing
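To make the composition and post-processing skill concrete, the following sketch tracks a privacy budget under basic sequential composition: releases with parameters epsilon_1, ..., epsilon_k computed on the same data are jointly differentially private with parameter epsilon_1 + ... + epsilon_k. Post-processing a released value without looking at the data again costs no additional budget. The class and function names are hypothetical and chosen for this example only.

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total_epsilon - self.spent

    def spend(self, epsilon):
        """Register one release; epsilons of successive releases add up."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject exact fits.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        return self.remaining


def postprocess_count(noisy_count):
    """Round and clamp a noisy count; uses only the released value, so no budget is spent."""
    return max(0, round(noisy_count))


budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.5)  # first release
budget.spend(0.3)  # second release on the same data
print("spent:", budget.spent, "remaining:", budget.remaining)
print("post-processed release:", postprocess_count(-1.3))
```

Basic composition is a simple worst-case bound; tighter composition results exist but are not sketched here.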
Competences to
- implement and present instructive examples of attacks on statistical data privacy
- implement privacy-preserving algorithms and experimentally validate their performance and utility
- effectively communicate and discuss technicalities of differential privacy and practical implications for data science applications
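An instructive attack of the kind mentioned above can be very short. The sketch below shows a differencing attack on exact counting queries: an attacker who knows enough about a target to single them out with a query filter asks for the count with and without the target and subtracts. The records, field names, and the attacker's background knowledge are invented for this illustration.

```python
# Hypothetical "de-identified" records: names removed, quasi-identifiers kept.
records = [
    {"age": 34, "zip": "2100", "has_condition": 0},
    {"age": 52, "zip": "2200", "has_condition": 1},
    {"age": 41, "zip": "2100", "has_condition": 1},
    {"age": 29, "zip": "2300", "has_condition": 0},
    {"age": 47, "zip": "2200", "has_condition": 0},
]


def exact_count(rows, predicate):
    """Exact count of people with the condition among rows matching predicate."""
    return sum(row["has_condition"] for row in rows if predicate(row))


def is_target(row):
    # Assumed background knowledge: the target is the only 52-year-old in zip 2200.
    return row["age"] == 52 and row["zip"] == "2200"


# Two aggregate queries that each look harmless on their own.
count_all = exact_count(records, lambda row: True)
count_without_target = exact_count(records, lambda row: not is_target(row))

# Their difference is exactly the target's sensitive attribute.
inferred = count_all - count_without_target
print("inferred has_condition for target:", inferred)
```

If each answer were released with independent Laplace noise instead, the difference would be a noisy value rather than the target's exact attribute.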
The course literature will be announced on the Absalon course page.
Related textbooks are, for example,
The Algorithmic Foundations of Differential Privacy by C. Dwork & A. Roth
and
The Complexity of Differential Privacy by S. Vadhan
(PDFs freely available on the authors' websites).
Academic qualifications equivalent to a BSc degree in a quantitative field are recommended (for example, a BSc in Mathematics/Actuarial Mathematics/Mathematics-Economics, a BSc in Machine Learning and Data Science, a BSc in Computer Science with suitable specialisation, or similar).
Students should be able to independently program basic data simulations and analyses in Python, Julia, or R.
- Category: Hours
- Lectures: 28
- Preparation: 96
- Exercises: 28
- Project work: 50
- Exam: 4
- Total: 206
- Credit
- 7,5 ECTS
- Type of assessment
- Continuous assessment
- Type of assessment details
- The exam is composed of the following elements to be completed during the course:
(1) a written summary of one assigned lecture,
(2) between 1 and 3 exercise assignments,
(3) a course project submitted as a reproducible code notebook that combines the project report and its code implementation, and
(4) a presentation (of a solution to an assignment question or about the project).
All components need to be approved to pass the course. If a component is not passed, the student must take the re-exam.
Some exam elements are to be completed individually, others in groups of up to three students.
- Aid
- All aids allowed
- Marking scale
- passed/not passed
- Censorship form
- No external censorship
One internal examiner
- Re-exam
Passed exam elements can be reused for the re-exam the same year.
All elements that were not approved during the ordinary exam are to be completed individually for the re-exam; they can be based on the original submissions. The student must submit either a summary of a research article or a lecture summary for exam element (1).
If the presentation exam element is not approved during the ordinary exam, the re-exam will include a 15-minute oral exam to re-examine the presentation element.
Criteria for exam assessment
Each of the exam elements must be approved separately to pass the course.
For an element to be approved, the student must demonstrate in a satisfactory way that they have mastered the learning outcome of the course corresponding to that element.
Course information
- Language
- English
- Course code
- NMAK23013U
- Credit
- 7,5 ECTS
- Level
- Full Degree Master
- Duration
- 1 block
- Placement
- Block 2
- Schedule
- B
- Course capacity
- 50
The number of seats may be reduced in the late registration period.
Study board
- Study Board of Mathematics and Computer Science
Contracting department
- Department of Mathematical Sciences
Contracting faculty
- Faculty of Science
Course Coordinators
- Sebastian Weichwald