NIFK16009U Managing Big and Spatial Data in Social Science

Volume 2015/2016
Content

The amount of data publicly available from online sources is increasing dramatically. It is crucial that aspiring researchers and natural resource managers know how to handle these large quantities of data and how to extract information from them. Recognizing these challenges and the inadequacy of spreadsheet approaches to data analysis, this course aims to equip students for the challenges ahead.

 

The course aims to give students insight into appropriate data management procedures and the critical analysis of empirical quantitative socioeconomic and spatial data, such as is required for an MSc thesis or other research. Students will be introduced to concepts, terminology and methods for handling time series, cross-sectional and panel data, as well as geo-referenced spatial information, originating from quantitative socioeconomic data at the individual, household and institutional level. This includes using log files to write and debug code for data handling procedures: merging datasets, overlaying GIS map layers, assessing data quality, cleaning data and coding different types of variables (outlining the coding principles), identifying and handling outliers, selecting appropriate transformation methods, and generating new variables for testing specific hypotheses.

Students will then be introduced to concepts, terminology and approaches in basic statistical data analysis. This includes developing a data analysis strategy, developing testable hypotheses based on research questions, and selecting basic statistical methods to test specific research hypotheses about intergroup and spatial patterns in the data. Examples include tabulating basic statistics, specifying linear regression models with interactions, and interpreting and visualizing model results.

Throughout the course, the focus will be on making the data handling process transparent and on reflecting on how data management choices and the choice of statistical approach affect the validity and reliability of the results and good scientific practice.
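
As a rough illustration of the data handling steps named in this section (merging datasets, flagging outliers, and transforming a variable), the following minimal sketch may help. It is written in Python using only the standard library, whereas the course itself works in R and Q-GIS. All names and values (`households`, `villages`, `hh_id`, `dist_to_forest_km`, the income figures) are hypothetical and chosen only for illustration, and the outlier rule (Tukey's fences on the interquartile range) is one common convention among several, not a method prescribed by the course.

```python
import math
import statistics

# Hypothetical household survey records (illustrative values only).
households = [
    {"hh_id": 1, "village": "A", "income": 1200.0},
    {"hh_id": 2, "village": "A", "income": 1500.0},
    {"hh_id": 3, "village": "B", "income": 1100.0},
    {"hh_id": 4, "village": "B", "income": 1400.0},
    {"hh_id": 5, "village": "B", "income": 25000.0},
]

# Hypothetical village-level attributes, keyed by village name.
villages = {
    "A": {"dist_to_forest_km": 2.5},
    "B": {"dist_to_forest_km": 7.0},
}


def merge_on_village(rows, lookup):
    """Left-join village attributes onto each household row."""
    return [{**row, **lookup.get(row["village"], {})} for row in rows]


def flag_outliers(rows, var, k=1.5):
    """Add a '<var>_outlier' flag using Tukey's fences (k * IQR beyond the quartiles)."""
    values = [row[var] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower = q1 - k * (q3 - q1)
    upper = q3 + k * (q3 - q1)
    return [{**row, f"{var}_outlier": not (lower <= row[var] <= upper)} for row in rows]


def add_log(rows, var):
    """Add a natural-log transform of a strictly positive variable."""
    return [{**row, f"log_{var}": math.log(row[var])} for row in rows]


# Each step returns new rows, so the original records stay untouched
# and the sequence of operations remains easy to document.
data = merge_on_village(households, villages)
data = flag_outliers(data, "income")
data = add_log(data, "income")

for row in data:
    print(row)
```

Flagging outliers in a separate variable, rather than deleting the rows, keeps the choice visible and reversible, which fits the emphasis on a transparent data handling process.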

 

The course will develop students' skills in conducting their own data management and analysis through a series of lectures, hands-on group exercises and student presentations of assignments based on provided empirical research datasets. The last part will be independent (supervised) project work. The course is based on the free statistical software package R and the geographical information software Q-GIS.

Learning Outcome

The aim of this course is to provide tools for, and experience in, managing and analysing data, with a focus on socioeconomic and spatial data as would be required for research and for conducting an MSc thesis project with quantitative data in the natural and social sciences and beyond.

 

Knowledge:

Describe different types of datasets and variables (incl. the nature of maps and geodata) and the implications in relation to choice of data management procedure and analysis strategy

Explain principles of good conduct in relation to data storage, documentation and anonymization of sensitive personal data

Give an overview of principles and procedures for importing, merging, coding, transforming and otherwise preparing data for statistical analysis in R and Q-GIS

Describe the arguments for using log-files and developing an analysis strategy in relation to good scientific practice. 

Present an overview of basic approaches to quantitative data analysis

Explain the concepts of predictions and residuals

Describe the underlying assumptions and conditions for valid application of relevant statistical tests

 

Skills:

Apply procedures for managing different types of data in R and Q-GIS in preparation for statistical analysis 

Combine different data sets and produce composite maps from multiple sets of digital spatial data

Develop research questions and hypotheses and select appropriate approaches to test the hypotheses using basic statistical methods

Implement statistical analysis in R and Q-GIS to derive basic cross-sectional and spatial metrics and estimate linear regression models

Solve coding problems in data management and basic statistical analysis in R and Q-GIS.

Interpret, visualize and present statistical results in a clear and concise manner

 

Competencies:

Formulate relevant research questions and hypotheses to address analytical research problems in relation to empirical datasets in the context of natural resource management

Argue convincingly for an appropriate choice of data management procedure and statistical methods suitable for testing basic research questions and hypotheses in relation to specific research problems and based on available data

Discuss the results of empirical data analysis in terms of relevance, reliability, validity and interpretation

Reflect critically on the implications of data quality, data handling procedures, statistical methods, and the assumptions and limitations of tests and models for the conclusions drawn from the analysis

Examples of relevant literature:

Bivand, R. S., Pebesma, E., & Gómez-Rubio, V. (2013). Applied Spatial Data Analysis with R. Use R! Series, Springer.

Burns, P. (2011). The R Inferno. http://www.burns-stat.com/documents/books/the-r-inferno/#!prettyPhoto

Dell, M. (2009). GIS Analysis for Applied Economists. Department of Economics, Massachusetts Institute of Technology. http://scholar.harvard.edu/files/dell/files/090110combined_gis_notes.pdf

Fischer, M. M., & Getis, A. (Eds.) (2010). Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications.

Paradis, E. (2005). R for Beginners. http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf

Ricci, V. (2005). R Functions for Regression Analysis. http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf

Thiede, R., Sutton, T., Düster, H., & Sutton, M. (2014). Quantum GIS Training Manual, Release 1.0. http://manual.linfiniti.com/LinfinitiQGISTrainingManual-en.pdf

Abedin, J., & Das, K. K. (2015). Data Manipulation with R. Packt Publishing Ltd. http://www.allitebooks.com/data-manipulation-with-r-second-edition/

Van den Broeck, J., Argeseanu Cunningham, S., Eeckels, R., & Herbst, K. (2005). Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities. PLoS Medicine, 2(10), e267. http://doi.org/10.1371/journal.pmed.0020267

Wickham, H. (2014). Tidy data. Under review. http://vita.had.co.nz/papers/tidy-data.pdf

Osborne, J. W. (2012). Best practices in data cleaning: A complete guide to everything you need to do before and after collecting your data. Sage.

 

The exact required reading list will be available on the course homepage in Absalon.

No prerequisites are required, but a basic statistics course is recommended.
Practical and theoretical considerations relating to procedures and methods are presented in lectures and supported by relevant examples. Learning outcomes are achieved through illustrative exercises, done individually and in groups, which are presented and discussed in plenary. Lecture examples and exercises will be based on small datasets from case studies as well as larger surveys focusing on natural resource management problems examined from natural and social science perspectives. Students will gain experience in evaluating empirical evidence, putting results into perspective, and discussing their implications in relation to published interpretations and conclusions from these surveys. During the exercises, students will build up a library of commands for the relevant tasks that can be reused in similar data management and analysis projects.
  • Lectures: 30 hours
  • Practical exercises: 40 hours
  • Preparation: 40 hours
  • Project work: 96 hours
  • Total: 206 hours
Credit
7.5 ECTS
Type of assessment
Oral examination, 15 minutes
Students are assessed individually on a short oral presentation in plenary covering the test of their own research hypothesis, the log file documenting their data management procedures, and analysis output such as tables, figures and models, all based on the report from the written assignment.
Exam registration requirements

Admission to the exam is conditional on handing in a written group assignment, based on a given dataset, on Wednesday of the third week of the course. Students will receive feedback on the report on Friday, the last day of the course.

Aid
All aids allowed
Marking scale
passed/not passed
Censorship form
No external censorship
two or more internal examiners
Exam period

The exam is scheduled for Friday in week 35, the last day of the course.

Re-exam

Same as the ordinary exam.

If the student has not handed in a written assignment, it must be handed in two weeks before the registration deadline for the re-exam. It must be approved before the exam.

Criteria for exam assessment

To pass the course, the student must convincingly fulfil the Learning Outcome described above.