AØKK08401U Social Data Science: Text Data and Deep Learning
MSc programme in Economics – elective course.
The PhD Programme in Economics at the Department of Economics - elective course with research module (PhD students must contact the study administration and the lecturer in order to write the research assignment)
NOTE: Due to an overlapping syllabus, this course cannot be taken if the course "Topics in Social Data Science" (AØKK08371U) has been taken.
"Social Data Science: Text Data and Deep Learning" is one of two new courses in Social Data Science that build on the introductory summer school course in social data science. The courses introduce students to three new essential data structures and teach state-of-the-art methods for applying data science and machine learning techniques through practical examples and hands-on experience. In each course, we discuss how novel social data science applications apply these tools.
"Social Data Science: Text Data and Deep Learning" focuses on methods for analyzing unstructured data. Unstructured data such as images, video and text used to be confined to small-N qualitative studies within the social sciences. However, recent developments in both natural language processing (NLP) and computer vision (CV), broadly speaking the field of AI, hold great promise for social data scientists wishing to supplement deep qualitative readings and analysis of unstructured data with quantitative insights and generalization from large corpora of unstructured text and images.
The course begins with an introduction to neural networks and transfer learning. In many cases involving unstructured data, the high dimensionality of both language and vision means that the old supervised learning paradigm of training models from scratch using a limited number of labeled samples (training data) is either impossible or very inefficient. In these cases, transfer learning can be used to adapt large pre-trained models, trained on very large labeled or unlabeled datasets, to a specific task. Next, we cover text as data, an abundant and readily available data source in the form of news articles, speeches, forum threads, social media posts, encyclopedias, et cetera. Lastly, the course introduces methods for using digital images as data.
After completing the course, the student is expected to be able to:
- Discuss fundamental concepts in machine learning: model generalization, overfitting, loss functions, the bias-variance trade-off and cross-validation.
- Account for various learning strategies, algorithms as well as approaches: clustering and unsupervised learning, supervised learning, semi-supervised learning, transfer learning, multi-task learning.
- Identify and define the potential of different representations of text, structured and unstructured.
- Apply fundamental machine learning tools, including model selection, hyperparameter search and robust model validation.
- Use neural networks to make predictions from unstructured data.
- Extract reliable information from text data using supervised learning and techniques from natural language processing.
- Master computer vision methods to extract features from image data.
- Integrate theoretical and applied knowledge within the field of Social Data Science and formulate powerful research questions given an interesting dataset.
- Construct validated and documented data sets for social science from unstructured text and media data.
- Communicate results using comprehensive statistics and modern visualization methods, in particular for plotting new data types.
- Critically evaluate the implications of results, taking into account model limitations and biases, and systematic noise introduced by data collection and sampling methods.
The following is a partial, tentative list of course readings.
- Bishop, Christopher: Pattern Recognition and Machine Learning. Springer, 2006.
- Cantu, Francisco & Michelle Torres: "Learning to See: Visual Analysis for Social Science Data".
- Gentzkow, M., Kelly, B. T., & Taddy, M. Text as Data. Journal of Economic Literature.
- Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267-297.
- Hastie, T., Tibshirani, R., & Friedman, J. (2008). The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
- Jurafsky, Dan, and James H. Martin. Speech and language processing. Vol. 3. London: Pearson, 2014.
- Ability to write code in core Python as well as with the NumPy and pandas packages, including transforming, merging and aggregating data.
- Experience with training linear machine learning models, model validation and model selection.
- Experience with transforming raw data into a format suitable for machine learning.
3 hours of lectures once a week from week 6 to 20 (except holidays)
2 hours of exercise classes once a week from week 6/7 to 20/21 (except holidays)
The overall schedule for the BA 3rd year and Master courses can be seen at KUnet:
MSc in Economics => "courses and teaching" => "Planning and overview" => "Your timetable"
BA i Økonomi/KA i Økonomi => "Kurser og undervisning" => "Planlægning og overblik" => "Dit skema"
Timetable and venue:
To see the time and location of lectures and exercise classes please press the link/links under "Se skema" (See schedule) at the right side of this page (F means Spring).
You can find similar information in English at
- Select Department: “2200-Økonomisk Institut” (and wait for a response)
- Select Module: “2200-F20; [Name of course]”
- Select Report Type: “List – Weekdays”
- Select Period: “Forår/Spring – Week 5-30”
Press: “View Timetable”
- Class Instruction
The students will receive:
Written feedback on mandatory assignments.
Immediate feedback from quizzes on the content of the lectures.
For foreign students not enrolled: Admission requirements, registration etc: Study Economics.
For guest and single-subject students: Registration via Uddannelse i Økonomi.
- 7.5 ECTS
- Type of assessment
- Written assignment, 24 hours. Individual take-home assignment. The students are allowed to communicate about the given problem set but must work on, write and upload the assignment answer individually. Be aware that the plagiarism rules must be complied with. The exam assignment is given in English and must be answered in English.
- Exam registration requirements
During the semester, mandatory assignments must be handed in to the teaching assistants no later than the given deadlines.
Two mandatory assignments must be approved to be able to sit the exam.
- All aids allowed
- Marking scale
- 7-point grading scale
- Censorship form
- No external censorship for the written exam. The exam may be chosen for external censorship by random check.
- Exam period
The exam takes place
From 23 May at 10.00 AM to 24 May 2020 at 10.00 AM
In special cases, the exam date can be changed to another day and time within the exam period.
Further information about the exam will be available in Digital Exam from the middle of the semester.
The written re-exam takes place from 29 August at 10.00 AM to 30 August 2020 at 10.00 AM
NOTE: If only a few students register for the written re-exam, the re-exam might change to a 20-minute oral examination without preparation. Aids are allowed at the examination. If changed to an oral re-exam, the exam date, time and place might change as well. The Examination Office will then inform the students by KU e-mail.
Information is available in Digital Exam in early August.
Criteria for exam assessment
Students are assessed on the extent to which they master the learning outcomes for the course.
To receive the top grade, the student must, with no or only a few minor weaknesses, be able to demonstrate an excellent performance displaying a high level of command of all aspects of the relevant material and be able to make use of the knowledge, skills and competencies listed in the learning outcomes.