NIFK17003U Managing and Analysing Cross Sectional and Spatial Data in Social Science
The amount of data publicly available from online sources is increasing dramatically. It is crucial that aspiring researchers and natural resource managers know how to handle these large quantities of data and how to extract information from them. Recognizing these challenges, and the inadequacy of spreadsheet approaches to data analysis, this course aims to equip students to meet the challenges ahead.
The course aims to give students insight into procedures for appropriate data management and critical analysis of empirically derived quantitative socioeconomic and spatial data, as required to conduct an MSc thesis or other research.
Students will be introduced to concepts, terminology and methods for handling cross-sectional data (and, if time permits, time series and panel data) and geo-referenced spatial information originating from quantitative socioeconomic surveys at the individual, household and institutional level. This includes writing and debugging scripts for data management procedures such as merging datasets, overlaying GIS map layers, assessing data quality, cleaning data, coding different types of variables (outlining the coding principles), identifying and handling outliers, selecting appropriate transformation methods, and generating new variables for testing specific hypotheses.
Students will also be introduced to basic procedures for statistical data analysis. This includes developing testable hypotheses based on research questions, developing a data analysis strategy, and selecting appropriate statistical methods to test specific hypotheses, for example about intergroup and spatial patterns in the data. Examples include tabulating basic statistical measures, specifying linear regression models with interactions, and interpreting and visualizing model results. Throughout the course, the focus will be on making the data handling process transparent and on reflecting on the implications of data management and statistical choices for the validity and reliability of the results, and for good scientific practice.
The course aims to develop students’ skills in conducting their own data management and analysis through a series of lectures, hands-on group exercises and student presentations of assignments based on provided empirical research datasets. The last week of the course is devoted to independent (supervised) group project work.
The course uses the free statistical software package R and the geographic information system (GIS) software Q-GIS.
The aim of this course is to provide participants with the tools and experience in managing and analysing data, with a focus on socioeconomic and spatial data, that are required to conduct an MSc thesis project or research based on quantitative data in the natural and social sciences and beyond.
Describe different types of datasets and variables (incl. the nature of maps and geodata) and the implications for the choice of appropriate data management procedures and analysis strategy
Explain principles of good conduct in relation to data storage, documentation and anonymization of sensitive personal data
Give an overview of principles and procedures for importing, merging, coding, transforming and otherwise preparing data for statistical analysis in R and Q-GIS
Describe the arguments for using log-files and developing an analysis strategy in relation to good scientific practice
Present an overview of basic approaches to quantitative data analysis
Explain the concepts of predictions and residuals
Describe the underlying assumptions and conditions for valid application of relevant statistical tests
Apply procedures for managing different types of data in R and Q-GIS in preparation for statistical analysis
Combine different data sets and produce composite maps from multiple sets of digital spatial data
Develop research questions and hypotheses and select appropriate approaches to test the hypotheses using basic statistical methods
Implement statistical analysis in R and Q-GIS to derive basic cross-sectional and spatial metrics and estimate linear regression models
Solve coding problems in data management and basic statistical analysis in R and Q-GIS.
Interpret, visualize and present statistical results in a clear and concise manner
Formulate relevant research questions and hypotheses to address analytical research problems in relation to empirical datasets in the context of natural and social science
Argue convincingly for the appropriate choice of data management procedures and statistical methods suitable for answering basic research questions and testing hypotheses based on available data and specific empirical problems
Discuss the results of empirical data analysis in terms of relevance, reliability, validity and interpretation
Reflect critically on the implications of data quality, data handling procedures, statistical methods and tests, and model assumptions and limitations for the conclusions drawn from the analysis
Examples of relevant literature:
Paradis, E. (2005). R for Beginners. http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
Ricci, V. (2005). R Functions for Regression Analysis. http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
Thiede, R., Sutton, T., Düster, H., & Sutton, M. (2014). Quantum GIS Training Manual, Release 1.0. http://manual.linfiniti.com/LinfinitiQGISTrainingManual-en.pdf
Abedin, J., & Das, K. K. (2015). Data Manipulation with R. Packt Publishing Ltd. http://www.allitebooks.com/data-manipulation-with-r-second-edition/
Osborne, J. W. (2012). Best practices in data cleaning: A complete guide to everything you need to do before and after collecting your data. Sage.
The precise literature list will be available on the course homepage in Absalon.
- 7.5 ECTS
- Type of assessment
- Oral examination, 15 minutes
Students are assessed individually on the basis of a short oral presentation, in plenum, of the test of their own research hypothesis, the log-file documenting their data management procedures, and the analysis output (such as tables, figures and models) from the report of the written assignment.
- Exam registration requirements
Admission to the exam requires submission of a written group assignment, based on a given dataset, on Wednesday of the third week of the course. Students will receive feedback on the report on Friday, the last day of the course.
- All aids allowed
- Marking scale
- passed/not passed
- Censorship form
- No external censorship
two or more internal examiners
- Re-exam
Same form as the ordinary exam.
If the student has not handed in the written assignment, it must be handed in two weeks before the registration deadline for the re-exam, and it must be approved before the exam.
Criteria for exam assessment
To pass the course, the student must convincingly fulfil the learning outcomes described above.
- Practical exercises
- Project work