AØKK08371U Advanced Social Data Science
The objective of this course is to teach students how to leverage the data science toolbox for use in social science. We emphasize the use of new data sources associated with communication, behavior, transactions, etc., which are increasingly available through the web and by collection from the various devices we use. These new sources of structured and unstructured data allow for testing and validation of existing theories in social science as well as development of new ones. Performing these analyses, however, requires an ability to understand and apply methods from the computational sciences. We build on the foundational course in social data science to teach these fundamental skills.
We introduce students to the essentials of data structure and structuring and teach state-of-the-art methods for applying data science and machine learning techniques. We do this through practical examples that give students hands-on experience.
The first canonical data structure we introduce is network and relational data. This data type is ubiquitous when analyzing data from social media, cell phone communication or physical meetings. The second data type is spatial data, which includes data on the shape and structure of shops, buildings, administrative boundaries, etc., but also personal data from GPS on smartphones, cars and much more. The final data type is text data, which is present everywhere in the form of documents, online discussions, etc. For each of the three data types we will teach various tools for working with them in practice.
We teach students applied machine learning at an advanced level. We will provide an in-depth review of the advantages and disadvantages of standard machine learning techniques, i.e. supervised machine learning (regression, classification) and unsupervised learning. In addition, we will teach tools from the frontier of applied social data science that leverage machine learning for causal inference.
The teaching is built around empirical examples: the course aims at developing good practices in data analysis, including thorough exploratory analysis, reliable collection and cleaning of data, visualization skills and statistical sensitivity analysis.
The course will emphasize a complete approach to working with data: from data collection, through data structuring (i.e. parsing, cleaning, transformation, and merging), to exploratory analysis and, finally, reporting of the results.
After completing the course, the student should be able to:
Account for the structure of complex networks and understand modeling of social relations based on network statistics like node degree and centrality measures.
Understand fundamental concepts in machine learning: model generalization, overfitting, loss functions, the bias-variance trade-off and cross-validation.
Account for various learning strategies, algorithms as well as approaches: clustering and unsupervised learning, supervised learning, semi-supervised learning, transfer learning, multi-task learning.
Define spatial data using shapes including points, lines and polygons and account for the choice of coordinate system.
Understand the potential of different representations of text: structured and unstructured, graph-based, and latent representations.
Gather, structure, and prepare data for analysis.
Select an appropriate modeling approach for analyzing a given dataset: apply model selection, hyperparameter search and robust model validation. Analyze the statistical power of model parameters and limits of your current training sample and choice of representation.
Extract reliable information from text data using supervised learning and techniques from natural language processing.
Structure geodata for analysis by manipulating shapes, computing local network structures and spatially combining various sources.
Communicate results using comprehensive statistics and modern visualization methods.
Integrate theoretical and applied knowledge within the field of Data Science and formulate powerful research questions given an interesting dataset.
Combine learned methods to address research questions involving large scale social data and machine learning.
Choose the appropriate tools to increase performance of computation.
Critically evaluate the implications of results, taking into account model limitations and biases, and systematic noise introduced by data collection and sampling methods.
Barabási, Albert-László. Network Science. Web book available free at http://barabasi.com/networksciencebook/. Cambridge University Press, 2016.
Gimond, Manuel. Intro to GIS and Spatial Analysis. Web book available free at https://mgimond.github.io/Spatial/index.html. Preprint, 2017.
Jurafsky, Dan, and James H. Martin. Speech and language processing. Vol. 3. London: Pearson, 2014.
Bender, Emily M. "Linguistic fundamentals for natural language processing: 100 essentials from morphology and syntax." Synthesis Lectures on Human Language Technologies 6.3 (2013)
Farzindar, Atefeh, and Diana Inkpen. "Natural language processing for social media." Synthesis Lectures on Human Language Technologies 8.2 (2015)
Søgaard, Anders. "Semi-supervised learning and domain adaptation in natural language processing." Synthesis Lectures on Human Language Technologies 6.2 (2013)
Friedman, J., Hastie, T., and R. Tibshirani. The Elements of Statistical Learning. Second edition, 12th printing. Web book available free at https://web.stanford.edu/~hastie/ElemStatLearn/. Springer, 2017.
Raschka, Sebastian, and Vahid Mirjalili. Python Machine Learning, 2nd Ed. Packt Publishing, 2017.
Athey, Susan, and Guido Imbens. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 2017.
4 hours of lectures combined with exercises, once a week from week 6 to week 21 (except holidays).
The overall schema for the Master can be seen at
Timetable and venue:
To see the time and location of lectures and exercise classes, please press the link/links under "Se skema" (See schedule) at the right side of this page (E means Autumn, F means Spring). The lectures are shown in each link.
You can find similar information, partly in English, at
-Select Department: “2200-Økonomisk Institut” (and wait for a response)
-Select Module: “2200-F18; [Name of course]”
-Select Report Type: “List – Weekdays”
-Select Period: “Forår/Spring – Week 5-30”
-Press: “View Timetable”
Foreign students who are not enrolled can find registration details and further information at Study Economics.
Read about the programme and the curriculum at the Master's programme in Economics (KA uddannelsen i økonomi).
- 7.5 ECTS
- Type of assessment
- Written assignment, 3-week project exam. Students are allowed to work in groups of 3 to 4 participants. The plagiarism rules must be complied with; please also be aware of the rules for co-writing assignments.
The project paper must be written in English.
- Exam registration requirements
To be eligible for the exam the project description and two mandatory assignments must be approved.
- All aids allowed
- Marking scale
- 7-point grading scale
- Censorship form
- No external censorship
The course can be selected for external assessment.
- Exam period
Deadline for uploading the project paper to DE: May 28, 2018 at 10 a.m.
Deadline for the project description: April 30, 2018 at 10 a.m.
Deadline for uploading the project paper to DE: August 24, 2018 at 10 a.m.
If only a few students have registered for the written re-exam, it may be changed to an oral exam. In that case, the Examination Office will announce the date, time and venue of the exam.
Criteria for exam assessment
Students are assessed on the extent to which they master the learning outcome for the course.
To receive the top grade, the student must, with no or only a few minor weaknesses, demonstrate an excellent performance, displaying a high level of command of all aspects of the relevant material and the ability to make use of the knowledge, skills and competencies listed in the learning outcomes.
In particular, the student should be able to independently analyze new data sets using the tools and theories covered in the course. This includes constructing a VAR model for the data and discussing and testing the underlying assumptions; determining the cointegration properties; formulating and testing relevant hypotheses on the cointegrating relations and the short-term adjustment; and analyzing models for data integrated of order two.
- Class Instruction