AØKK08400U Social Data Science: Econometrics and Machine Learning

Volume 2019/2020
Education

MSc programme in Economics – elective course.

 

The PhD Programme in Economics at the Department of Economics – elective course with research module (PhD students must contact the study administration and the lecturer in order to write the research assignment).

 

NOTE: Due to an overlapping syllabus this course cannot be taken if the course "Topics in Social Data Science" (AØKK08371U) has been taken.

 

Content

"Social Data Science: Econometrics and Machine Learning" is one of two new courses in Social Data Science that build on the introductory summer school course in social data science. The courses introduce students to three new essential data structures and teach state-of-the-art methods for applying data science and machine learning techniques through practical examples and hands-on experience. In each course, we discuss how novel social data science applications apply these tools.

 

"Social Data Science: Econometrics and Machine Learning" focuses on tree-based models and causal models, as well as on methods for analyzing social networks.

 

The course has a dual focus on methods and data structures. In terms of methods, we investigate the intersection of microeconometrics and machine learning. From a data perspective, we cover relational data, which may include complex and social networks, and Geographic Information System (GIS) data. The methods include some very recent developments in econometrics: hybrid statistical models that are built for econometrics but leverage machine learning, as well as fundamental and novel models for estimation in networks.

 

The course begins by introducing spatial/GIS data and then introduces tree-based and kernel-based models. The course then proceeds to cover machine learning models for causal inference. We next introduce networks and relational data as a canonical data type. Networks are essential for representing systems of interaction such as information transmission, social behavior and risk in interbank markets. Finally, we cover methods for estimating choice models and how they relate to estimating models for social spillovers and network formation.

Learning Outcome

After completing the course, the student is expected to be able to:

 

Knowledge:

  • Define the structure of complex networks and account for modeling of social relations based on network statistics like node degree and centrality measures.
  • Discuss how bagging and boosting work in machine learning.
  • Account for how machine learning can improve statistical models.
  • Reflect on how networks and spatial data can be applied for new research in the social sciences.
  • Identify spatial data structures and shapes including points, lines and polygons and account for the choice of coordinate system.
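
As an illustration of the bagging idea named in the list above, the following is a minimal sketch in plain Python, not course material. It assumes a one-split regression "stump" as the base learner; the function names `fit_stump` and `bagged_stumps` and the toy data are hypothetical. Each stump is fitted on a bootstrap sample and the predictions are averaged. Boosting is not shown.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimising squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave observations on both sides
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        # All x values identical in this sample: fall back to the overall mean.
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bagged_stumps(xs, ys, n_models=50, seed=0):
    """Bagging: fit one stump per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


# Hypothetical toy data: a step function with a jump at x = 2.
xs = [i / 10 for i in range(40)]
ys = [0.0 if x < 2.0 else 1.0 for x in xs]
predict = bagged_stumps(xs, ys)
print(predict(0.5), predict(3.5))
```

Averaging over bootstrap samples is what distinguishes bagging from fitting a single stump; with noisier data the averaged prediction would typically vary less across samples than any individual stump.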

 

Skills:

  • Apply advanced machine learning techniques for estimating errors of machine learning models.
  • Compute network measures including centrality, clustering, sorting as well as contagion effects.
  • Structure spatial data for analysis by manipulating shapes, computing local network statistics and spatially combining various sources.
  • Estimate models for peer effects and network formation.
  • Master the method of training tree-based and kernel-based machine learning models.
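
The network measures mentioned in the list above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: the graph is undirected, the edge list is a made-up toy example, and the helper names `build_adjacency`, `degree_centrality` and `local_clustering` are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a dict mapping node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def local_clustering(adj, node):
    """Share of pairs of neighbours of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention used here: undefined cases are set to zero
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = build_adjacency(edges)
centrality = degree_centrality(adj)
print(centrality)
print({node: local_clustering(adj, node) for node in adj})
```

In this toy network node c touches every other node, so it has the highest degree centrality, while only one of the three pairs among its neighbours is connected, which gives it a lower clustering coefficient than a or b.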

 

Competencies:

  • Integrate theoretical and applied knowledge within the field of Social Data Science and formulate powerful research questions given an interesting dataset.
  • Communicate results using comprehensive statistics and modern visualization methods, in particular for plotting new data types.
  • Critically evaluate the implications of results, taking into account model limitations and biases, and systematic noise introduced by data collection and sampling methods.
  • Evaluate whether a research design using machine learning, networks and/or spatial data will lead to well-identified effects.

The following is a partial, tentative list of course readings.

 

Machine learning and statistics

  • Raschka, Sebastian, and Vahid Mirjalili. Python Machine Learning, 2nd Ed. Packt Publishing, 2017.
  • Athey, S., & Imbens, G. W. (2019). Machine learning methods that economists should know about. Annual Review of Economics, 11.
  • Mullainathan, S., & Spiess, J. (2017). Machine learning: an applied econometric approach. Journal of Economic Perspectives, 31(2), 87-106.
  • Athey, S., & Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), 7353-7360.
  • Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228-1242.
  • Wager, S., Hastie, T., & Efron, B. (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. The Journal of Machine Learning Research, 15(1), 1625-1651.
  • Belloni, A., Chernozhukov, V., Fernández-Val, I., & Hansen, C. (2017). Program evaluation and causal inference with high-dimensional data. Econometrica, 85(1), 233-298.
  • Hartford, J., Lewis, G., Leyton-Brown, K., & Taddy, M. (2017, August). Deep IV: A flexible approach for counterfactual prediction. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 1414-1423). JMLR.org.
  • Belloni, A., Chernozhukov, V., & Hansen, C. (2014). High-dimensional methods and inference on structural and treatment effects. Journal of Economic Perspectives, 28(2), 29-50.
  • Chernozhukov, V., Hansen, C., & Spindler, M. (2015). Post-selection and post-regularization inference in linear models with many controls and instruments. American Economic Review, 105(5), 486-90.

 

Networks

  • Barabási, Albert-László. Network Science. Cambridge University Press, 2016. The web book is freely available online.
  • McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1), 415-444.
  • Rivera, M. T., Soderstrom, S. B., & Uzzi, B. (2010). Dynamics of dyads in social networks: Assortative, relational, and proximity mechanisms. Annual Review of Sociology, 36, 91-115.
  • Chandrasekhar, A. (2016). Econometrics of network formation. The Oxford Handbook of the Economics of Networks, 303-357.
  • Manski, C. F. (1993). Identification of endogenous social effects: The reflection problem. The Review of Economic Studies, 60(3), 531-542.
  • Sacerdote, B. (2001). Peer effects with random assignment: Results for Dartmouth roommates. The Quarterly Journal of Economics, 116(2), 681-704.
  • Sacerdote, B. (2011). Peer effects in education: How might they work, how big are they and how much do we know thus far? In Handbook of the Economics of Education (Vol. 3, pp. 249-277). Elsevier.
  • Carrell, S. E., Sacerdote, B. I., & West, J. E. (2013). From natural variation to optimal policy? The importance of endogenous peer group formation. Econometrica, 81(3), 855-882.

 

Spatial data

 

All students are expected to have a strong familiarity with the Python programming language.

It is necessary to have followed the summer school course "Introduction to Social Data Science" at the Study of Economics, University of Copenhagen (or a similar introduction to Python and machine learning). The specific skills needed are:
- ability to write code in core Python as well as with the NumPy and pandas packages, including transforming, merging and aggregating data;
- experience with training linear machine learning models, model validation and model selection.

It is necessary to have followed the course Econometrics I at the Bachelor of Economics, University of Copenhagen (or a similar introduction to econometrics). Students who have not taken Econometrics I at the Study of Economics, University of Copenhagen, must have taken a similar course introducing econometrics and must have skills in estimating and understanding linear regression for inference.

It is recommended to have followed the course "Applied Econometric Policy Evaluation" at the Bachelor of Economics, University of Copenhagen, or similar courses with an introduction to causal inference in economics. The specific skills needed are knowledge of potential outcomes, matching methods and difference-in-differences.
Lectures and lab sessions with exercises.
Schedule:
3 hours of lectures once a week from week 6 to 20 (except holidays)
2 hours of exercise classes once a week from week 6/7 to 20/21 (except holidays)

The overall schedule for the BA 3rd year and Master courses can be seen at KUnet:
MSc in Economics => "courses and teaching" => "Planning and overview" => "Your timetable"
BA i Økonomi/KA i Økonomi => "Kurser og undervisning" => "Planlægning og overblik" => "Dit skema"

Timetable and venue:
To see the time and location of lectures and exercise classes please press the link/links under "Se skema" (See schedule) at the right side of this page (F means Spring).

You can find similar information in English at
https://skema.ku.dk/ku1920/uk/module.htm
-Select Department: “2200-Økonomisk Institut” (and wait for the page to respond)
-Select Module: “2200-F20; [Name of course]”
-Select Report Type: “List – Weekdays”
-Select Period: “Forår/Spring – Week 5-30”
-Press: “View Timetable”
  • Class Instruction: 28 hours
  • Exam: 24 hours
  • Lectures: 42 hours
  • Preparation: 112 hours
  • Total: 206 hours
Written
Oral
Individual
Collective

 

The students will receive:

Written feedback on mandatory assignments.
Immediate feedback from quizzes on the content of the lectures.

Credit
7.5 ECTS
Type of assessment
Written assignment, 24 hours
Individual take-home assignment. The students are allowed to communicate about the given problem set but must work on, write and upload the assignment answer individually. Be aware that the plagiarism rules must be complied with. The exam assignment is given in English and must be answered in English.
____
Exam registration requirements

During the semester, mandatory assignments must be handed in to the teaching assistants no later than the given deadlines.

Two mandatory assignments must be approved to be able to sit the exam.

____

Aid
All aids allowed
Marking scale
7-point grading scale
Censorship form
No external censorship for the written exam. The exam may be chosen for external censorship by random check.
____
Exam period

The exam takes place

From 19 June at 10.00 AM to 20 June 2020 at 10.00 AM

 

Exam information:

In special cases, the exam date can be changed to another day and time within the exam period.

Further information about the exam will be available in Digital Exam from the middle of the semester.

 

Read about examination, rules etc. at: Master(UK) and Master(DK).

_

Re-exam

Reexam information:

From 28 August at 10.00 AM to 29 August 2020 at 10.00 AM

 

NOTE: If only a few students register for the written re-exam, the re-exam might change to a 20-minute oral examination without preparation. Aids are allowed at the examination. If changed to an oral re-exam, the exam date, time and place might change as well. The Examination Office will then inform the students by KU e-mail.

 

Reexam info:

Info is available in Digital Exam in early August.

More info at Master(UK), Master(DK) and Bachelor(DK).

Criteria for exam assessment

Students are assessed on the extent to which they master the learning outcome for the course.

 

To receive the top grade, the student must be able to demonstrate, with no or only a few minor weaknesses, an excellent performance that displays a high level of command of all aspects of the relevant material, and must be able to make use of the knowledge, skills and competencies listed in the learning outcomes.