AØKK08371U Topics in Social Data Science CANCELED due to two new courses
MSc programme in Economics – elective course.
The Danish BSc programme in Economics – elective in the 3rd year
NOTE: The course has been replaced by two new courses:
- Social Data Science: Econometrics and Machine Learning (AØKK08400U)
- Social Data Science: Text Data and Deep Learning (AØKK08401U)
The objective of this course is to teach students how to leverage the data science toolbox for use in social science. We emphasize the use of new data sources associated with communication, behavior, transactions, etc., which are increasingly available through the web and by collection from the various devices we use. These new sources of structured and unstructured data allow for testing and validation of existing theories in social science as well as development of new ones. Performing these analyses, however, requires an ability to understand and apply computational methods.
In this course, we build on the introductory course in social data science (AØKK08216U). We introduce students to three new essential data structures and teach state-of-the-art methods for applying data science and machine learning techniques. We do this through practical examples and provide students with hands-on experience. We also build on the machine learning techniques from the introductory course, introducing advanced techniques including ensemble learning and deep learning. We discuss how the social sciences leverage these tools and the increasing role they might play.
The first canonical data structure we introduce is networks and relational data. Networks are essential for representing systems of interaction such as information transmission and social behavior, as well as risk in interbank markets. We then introduce spatial data for representing locations, shapes, and boundaries within the built environment, as well as mobility traces. These data increasingly play a role in sociology, economics and political science. Finally, we cover text as data, which is unarguably the most abundant and readily available data source, in the form of news articles, speeches, forum threads, social media posts, encyclopedias, etc.
After completing the course, the student should be able to:
Account for the structure of complex networks and understand modeling of social relations based on network statistics like node degree and centrality measures.
Understand fundamental concepts in machine learning: model generalization, overfitting, loss functions, the bias-variance trade-off and cross-validation.
Account for various learning strategies, algorithms and approaches: clustering and unsupervised learning, supervised learning, semi-supervised learning, transfer learning and multi-task learning.
Know spatial data structures and shapes including points, lines and polygons and account for the choice of coordinate system.
Understand the potential of different representations of text: structured and unstructured, graph-based, and latent representations.
Comprehend how network, spatial and text data as well as machine learning can be applied in the social sciences.
Apply fundamental machine learning tools, including model selection, hyperparameter search and robust model validation. We specifically require an ability to work with ensemble learning and artificial neural networks.
Extract reliable information from text data using supervised learning and techniques from natural language processing.
Structure spatial data for analysis by manipulating shapes, computing local network statistics and spatially combining various sources.
Compute network measures including centrality, clustering and sorting, as well as contagion effects.
Integrate theoretical and applied knowledge within the field of Data Science and formulate powerful research questions given an interesting dataset.
Communicate results using comprehensive statistics and modern visualization methods, in particular for plotting new data types.
Critically evaluate the implications of results, taking into account model limitations and biases, and systematic noise introduced by data collection and sampling methods.
- Bishop, Christopher. Pattern Recognition and Machine Learning. Web book available free at https://www.microsoft.com/en-us/research/people/cmbishop/#!prml-book. Springer, 2006.
- Raschka, Sebastian, and Vahid Mirjalili. Python Machine Learning, 2nd Ed. Packt Publishing, 2017.
- Barabási, Albert-László. Network Science. Web book available free at http://barabasi.com/networksciencebook/. Cambridge University Press, 2016.
- Gimond, Manuel. Intro to GIS and Spatial Analysis. Web book available free at https://mgimond.github.io/Spatial/index.html. Preprint, 2017.
- Jurafsky, Dan, and James H. Martin. Speech and language processing. Vol. 3. London: Pearson, 2014.
(The list was changed 21-1-2019)
All students are expected to have strong skills in Python for data science as we begin the course. These skills should include data structuring (pandas, numpy), visualization (matplotlib, Seaborn), collecting data by scraping (requests, regular expressions, BeautifulSoup/Scrapy) and, finally, machine learning fundamentals (sklearn). Note: if you come from an R background, you should have no problem making the transition, but preparation is required!
- Class Instruction
The course has been canceled.
- 7.5 ECTS
- Type of assessment
- Written assignment, 3-week project exam. Students are allowed to work in groups of 3 to 4 participants. The plagiarism rules must be complied with, and please be aware of the rules for co-writing assignments.
The project paper must be written in English.
- Exam registration requirements
To be eligible for the exam, the project description and four of the five mandatory assignments must be approved.
- All aids allowed
- Marking scale
- 7-point grading scale
- Censorship form
- No external censorship for the written exam. The exam may be selected for external censorship by random check.
- Exam period
Please contact the study administration if you have not taken the exam.
Note: In special cases, the dates may be changed, in which case students will be informed.
Further information about the exam will be available in Digital Exam from the middle of the semester.
Note: In special cases, the written re-exam may be moved to another day within the re-exam period, or changed to an oral exam (with a new date, time and place) if only a few students are registered. The Exam Office will inform students of any such change.
Criteria for exam assessment
Students are assessed on the extent to which they master the learning outcomes for the course.
To receive the top grade, the student must, with no or only a few minor weaknesses, demonstrate an excellent performance, displaying a high level of command of all aspects of the relevant material, and be able to make use of the knowledge, skills and competencies listed in the learning outcomes.
In particular, the student should in this course be able to independently analyze new data sets using the tools and theories covered in the course.