AØKK08371U  Topics in Social Data Science

Volume 2018/2019
Education

MSc programme in Economics – elective course.

BSc programme in Economics (Danish programme) – elective in the 3rd year

 

The PhD Programme in Economics at the Department of Economics – elective course with research module (PhD students must contact the study administration and the lecturer in order to write the research assignment)

 

 

Content

The objective of this course is to teach students how to leverage the data science toolbox for use in social science. We emphasize the use of new data sources associated with communication, behavior, transactions, etc., which are increasingly available through the web and by collection from the various devices we use. These new sources of structured and unstructured data allow for testing and validation of existing theories in social science as well as development of new ones. Performing these analyses, however, requires an ability to understand and apply computational methods.

 

In this course, we build on the introductory course in social data science (AØKK08216U). We introduce students to three new essential data structures and teach state-of-the-art methods for applying data science and machine learning techniques. We do this through practical examples that give students hands-on experience. We also build on the machine learning techniques from the introductory course by introducing advanced techniques, including ensemble learning and deep learning. We discuss how social science leverages these tools and the increasing role they might play.

 

The first canonical data structure we introduce is networks and relational data. Networks are essential for representing systems of interaction such as information transmission and social behavior, as well as risk in interbank markets. We then introduce spatial data for representing locations, shapes, and boundaries within the built environment, as well as mobility traces. These data increasingly play a role in sociology, economics and political science. Finally, we cover text as data, which is unarguably the most abundant and readily available data source, in the form of news articles, speeches, forum threads, social media posts, encyclopedias, etc.

Learning Outcome

After completing the course, the student should be able to:

Knowledge:

  • Account for the structure of complex networks and understand modeling of social relations based on network statistics like node degree and centrality measures.

  • Understand fundamental concepts in machine learning: model generalization, overfitting, loss functions, the bias-variance trade-off and cross-validation.

  • Account for various learning strategies, algorithms as well as approaches: clustering and unsupervised learning, supervised learning, semi-supervised learning, transfer learning, multi-task learning.

  • Know spatial data structures and shapes including points, lines and polygons and account for the choice of coordinate system.

  • Understand the potential of different representations of text: structured and unstructured, graph-based, and latent representations.

  • Comprehend how network, spatial and text data as well as machine learning can be applied in the social sciences.
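As an illustration of cross-validation as named in the knowledge goals above, the following is a minimal sketch in plain Python. It is not course material: the function names, the synthetic data and the choice of a simple least-squares line are assumptions made for the example.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (closed form)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def cross_validated_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training part, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# Synthetic data (made up for illustration): y = 2 + 3x plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2 + 3 * x + rng.gauss(0, 0.5) for x in xs]
cv_mse = cross_validated_mse(xs, ys, k=5)
```

Because every observation is held out exactly once, `cv_mse` estimates how the fitted line generalizes to unseen data rather than how well it fits the data it was trained on.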

Skills:

  • Apply fundamental machine learning tools, including model selection, hyperparameter search and robust model validation. We specifically require an ability to work with ensemble learning and artificial neural networks.

  • Extract reliable information from text data using supervised learning and techniques from natural language processing.

  • Structure spatial data for analysis by manipulating shapes, computing local network statistics and spatially combining various sources.

  • Compute network measures including centrality, clustering and sorting, as well as contagion effects.
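To make the network measures in the list above concrete, here is a minimal sketch in plain Python of degree centrality and the local clustering coefficient. The edge list and function names are hypothetical and chosen only for illustration; in practice a dedicated network library would typically be used.

```python
from itertools import combinations

# Hypothetical friendship network as an undirected edge list.
edges = [("ana", "bo"), ("ana", "cy"), ("bo", "cy"), ("cy", "dee")]

def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def degree_centrality(adjacency):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}

def local_clustering(adjacency, node):
    """Share of pairs of a node's neighbours that are themselves connected."""
    nbrs = adjacency[node]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for a, b in pairs if b in adjacency[a])
    return linked / len(pairs)

adjacency = build_adjacency(edges)
centrality = degree_centrality(adjacency)
clustering = {node: local_clustering(adjacency, node) for node in adjacency}
```

In this toy network, `cy` is connected to every other node and so has the highest degree centrality, while its clustering coefficient is low because most of its neighbours are not connected to each other.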

Competencies:

  • Integrate theoretical and applied knowledge within the field of Data Science and formulate powerful research questions given an interesting dataset.

  • Communicate results using comprehensive statistics and modern visualization methods, in particular for plotting new data types.

  • Critically evaluate the implications of results, taking into account model limitations and biases, and systematic noise introduced by data collection and sampling methods. 

Literature

  • Barabási, Albert-László. Network science. Web book available free at http://barabasi.com/networksciencebook/. Cambridge University Press, 2016.
  • Gimond, Manuel. Intro to GIS and Spatial Analysis. Web book available free at https://mgimond.github.io/Spatial/index.html. Preprint, 2017.
  • Jurafsky, Dan, and James H. Martin. Speech and language processing. Vol. 3. London: Pearson, 2014.
  • Bender, Emily M. "Linguistic fundamentals for natural language processing: 100 essentials from morphology and syntax." Synthesis Lectures on Human Language Technologies 6.3 (2013).
  • Farzindar, Atefeh, and Diana Inkpen. "Natural language processing for social media." Synthesis Lectures on Human Language Technologies 8.2 (2015).
  • Søgaard, Anders. "Semi-supervised learning and domain adaptation in natural language processing." Synthesis Lectures on Human Language Technologies 6.2 (2013).
  • Friedman, J., T. Hastie, and R. Tibshirani. Elements of statistical learning. Second edition, 12th printing. Web book available free at https://web.stanford.edu/~hastie/ElemStatLearn/. Springer, 2017.
  • Raschka, Sebastian, and Vahid Mirjalili. Python Machine Learning, 2nd Ed. Packt Publishing, 2017.
  • Athey, Susan, and Guido Imbens. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 2017.
Recommended Academic Qualifications

It is strongly recommended to have followed the course Social Data Science or a similar data science course, e.g. R for Data Analysis or Political Data Science. Linear algebra is also strongly recommended.

All students are expected to have strong skills in Python for data science when the course begins. These skills should include data structuring (pandas, numpy), visualization (matplotlib, seaborn), collecting data by scraping (requests, regular expressions, BeautifulSoup/Scrapy) and machine learning fundamentals (sklearn). Note that if you come from an R background you should have no problem making the transition, but preparation is required!
Teaching and learning methods

Lectures. The main work will be exercises, done individually and in groups, which focus on applying methods.
Schedule:
2 hours of lectures once a week from week 6 to week 20 (except holidays)
2 hours of exercise classes once a week from week 6/7 to week 20/21 (except holidays)

Timetable and venue:
To see the time and location of lectures, please press the link(s) under "Se skema" (See schedule) on the right side of this page (F means Spring).

You can find similar information in English at
https://skema.ku.dk/ku1819/uk/module.htm
-Select Department: “2200-Økonomisk Institut” (and wait for a response)
-Select Module: “2200-F19; [Name of course]”
-Select Report Type: “List – Weekdays”
-Select Period: “Forår/Spring – Week 5-30”
Press: “View Timetable”

The overall schedule for the BA 3rd-year and Master courses can be seen on KUnet:
MSc in Economics => "courses and teaching" => "Planning and overview" => "Your timetable"
BA i Økonomi/KA i Økonomi => "Kurser og undervisning" => "Planlægning og overblik" => "Dit skema"
Credit
7.5 ECTS
Type of assessment
Written assignment, 3 weeks
Project exam. Working in groups of 3 to 4 participants is allowed. The plagiarism rules must be complied with; please also be aware of the rules for co-written assignments.
The project paper must be written in English.
____
Exam registration requirements

To be eligible for the exam the project description and two mandatory assignments must be approved.

____

Aid
All aids allowed
Marking scale
7-point grading scale
Censorship form
No external censorship
The exam can be selected for external assessment.
____
Exam period

Exam information:

The project description must be handed in no later than:

26 April 2019 at 10 AM in Absalon

 

The project must be handed in no later than:

24 May 2019 at 10 AM in Digital Exam

 

Note: In special cases, the dates may be changed, in which case students will be informed.

 

Further information about the exam will be available in Digital Exam from the middle of the semester.

Information about examination, rules, exam schedule etc.: Master (UK), Master (DK) and Bachelor (DK).

_

Re-exam

Re-exam information:

The project description must be handed in no later than:

5 August 2019 at 10 AM to samf-fak@samf.ku.dk

 

The project must be handed in no later than:

23 August 2019 at 10 AM in Digital Exam

 

Note: In special cases, the written re-exam may be moved to another day within the re-exam period, or changed to an oral exam (including date, time and place) if only a few students are registered. The Exam Office will inform students of any such change.

Information: Master (UK), Master (DK) and Bachelor (DK).

Criteria for exam assessment

Students are assessed on the extent to which they master the learning outcome for the course.

To receive the top grade, the student must, with no or only a few minor weaknesses, be able to demonstrate excellent performance, displaying a high level of command of all aspects of the relevant material, and be able to make use of the knowledge, skills and competencies listed in the learning outcomes.

 

In particular, the student should in this course be able to independently analyze new data sets using the tools and theories covered in the course. This includes constructing a VAR model for the data and discussing and testing the underlying assumptions; determining the cointegration properties; formulating and testing relevant hypotheses on the cointegrating relations and the short-term adjustment; and analyzing models for data integrated of order two.

Workload

Category            Hours
Lectures            42
Preparation         112
Class Instruction   28
Exam                24
Total               206