ASTK18463U Machine Learning for Social Scientists

Volume 2024/2025
Education

Full-degree students enrolled at the Department of Political Science, UCPH

  • MSc in Political Science
  • MSc in Social Science
  • MSc in Security Risk Management
  • Bachelor in Political Science

Full-degree students enrolled at the Faculty of Social Science, UCPH

  • To be announced

The course is open to:

  • Exchange and Guest students from abroad
  • Credit students from Danish Universities
  • Open University students

The course is scheduled for 2 hours weekly. The classroom is also booked for 2 hours after each lecture so that students can practise without the lecturer present.

Content

This course teaches supervised machine learning methods, i.e., statistical methods for prediction and description tasks. We will focus on methods that are relevant to the social sciences, especially those that use text as data. The programming language used is R.

We will cover (1) text-as-data skills such as preprocessing, TF-IDF, dictionary methods and keyword selection; (2) basics of supervised learning such as performance metrics, train-test splits, cross-validation and regularization; (3) prediction methods such as logistic regression, softmax regression, Naive Bayes models and support vector machines, plus scaling methods like item-response theory models; (4) neural networks and their architecture, including deep neural nets, word embedding models and transformers; and finally (5) applications of neural nets to text-as-data (large language models, text classification).

Learning Outcome

Knowledge:

Understand the models available for supervised learning tasks relevant to the social sciences, how they work internally, and their strengths and weaknesses depending on the task.

Skills:

Be able to prepare data for use by these models, run the models, interpret the results, and understand the software and hardware they require.

Competences:

Be a critical user of supervised machine learning methods for prediction and description problems.

Literature

Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. Text as data: A new framework for machine learning and the social sciences. Princeton University Press, 2022.

Students must be proficient in R and understand statistical concepts such as OLS, model fitting, probability and distributions.
We will use a combination of lectures and exercise sets to be completed independently.
Workload
  • Class Instruction: 28 hours
  • Total: 28 hours
Feedback form
Collective
Feedback by final exam (in addition to the grade)
Credit
7.5 ECTS
Type of assessment
Oral examination
Type of assessment details
Oral examination with preparation
Marking scale
7-point grading scale
Censorship form
No external censorship
Re-exam

- In the semester in which the course takes place: Oral examination with preparation

- In subsequent semesters: Free written assignment

Criteria for exam assessment

Grade 12 is given for an outstanding performance: the student lives up to the course's goal description in an independent and convincing manner with no or only a few minor shortcomings

Grade 7 is given for a good performance: the student is confidently able to live up to the goal description, albeit with several shortcomings

Grade 02 is given for an adequate performance: the minimum acceptable performance in which the student is only able to live up to the goal description in an insecure and incomplete manner