CSU2016122 Big Data Analysis - tools and methods
See the course description on the Copenhagen Summer University campaign page
This course brings you to the forefront of the newest tools and methods, based on cutting-edge research and experience. Big Data is omnipresent, from industry to government, and is frequently considered a completely new approach to problem solving. While the possibilities are often exaggerated, Big Data does indeed introduce new opportunities and challenges. The ability to analyze and combine large datasets from different sources has obvious applications. Nonetheless, the lack of quality in the data, combined with high entropy, means that conventional analysis often fails.
What you will learn
By completing the course you will be able to set up a basic Big Data
Analysis end-to-end: from retrieving and cleaning the data, through
establishing the information level, extracting patterns and finding
outliers, to curating the necessary data. Furthermore, you will
become acquainted with a number of advanced tools.
Course Content
We will use two datasets consistently throughout the course, one
using structured data and one using unstructured data. The data
will be used to demonstrate the different steps in Big Data
Analysis.
The course contains the following methods and tools:
Core elements:
- Data cleaning. Detecting and correcting (or removing) corrupt or inaccurate records.
- Statistical methods for very large datasets. Robust methods for very large datasets and data with very large variance.
- Finding patterns and outliers in Big Data. Which methods can be used to identify sparse patterns in very large datasets, and how to identify data that does not follow the overall pattern for a dataset.
- Deep Learning. Machine learning methods especially focused on patterns and classification in image based datasets.
- Systems for Big Data Analysis. Common systems for BDA (Hadoop, PyDisco, etc.) and hardware system design for efficient BDA.
Other tools/methods (emphasized depending on participants' interests):
- Selected machine learning algorithms for large-scale data.
- Random forests.
- Large-scale exact nearest neighbor search.
- Data curation. How to select data for long-term curation, and systems, techniques and standards for data curation.
- Search Engines and Recommender Systems. The state of the art in ranking models, used by search engines and recommender systems worldwide, is based on probabilistic Language Models. We will cover the basic principles of these models and provide a tutorial on how to use them on the award-winning Indri information retrieval platform.
We will be working with several programming tools; however, all techniques covered are easily implemented in any standard data-analysis language (R, Python, Matlab, etc.).
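As a rough illustration of the data cleaning and outlier-finding steps listed above, the following is a minimal Python sketch. It is not taken from the course material: the function names, the sample values and the choice of the median absolute deviation (MAD) as the robust measure are all assumptions made for this example.

```python
import math
import statistics


def clean_records(values):
    """Data cleaning: drop records that are missing or not finite numbers."""
    cleaned = []
    for v in values:
        try:
            x = float(v)
        except (TypeError, ValueError):
            continue  # missing or non-numeric record
        if math.isfinite(x):
            cleaned.append(x)
    return cleaned


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on median and MAD) is large.

    The threshold of 3.5 is a commonly used rule of thumb, chosen here
    only for illustration.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    if mad == 0:
        return []  # no spread around the median; nothing can be flagged
    return [x for x in values if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical raw records containing corrupt entries and one extreme value.
raw = [10.1, "9.8", None, "n/a", 10.3, float("nan"), 9.9, 10.0, 55.0, 10.2]
data = clean_records(raw)
outliers = mad_outliers(data)
print(data)
print(outliers)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value shifts them far less, which is the sense in which such methods are called robust.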
Participants
The course is aimed at people who are already acquainted with data
analysis. The course is strictly focused on Big Data Analysis, so
a background in statistics and/or conventional data analysis is
assumed. Participants must hold at least a relevant Bachelor's degree
and/or have several years of data analysis experience.
Course dates
5 days, 22 – 26 August 2016, 9:00 – 16:30 at the University of
Copenhagen, Frederiksberg Campus.
Course directors
Troels C. Petersen, Associate Professor, Niels
Bohr Institute, University of Copenhagen
Christian Igel, Professor, Dr. habil. Department
of Computer Science, University of Copenhagen
Other course teachers
Christina Lioma, Associate Professor, The Image
Section, Department of Computer Science, University of Copenhagen
Joachim Mathiesen, Associate Professor,
Biocomplexity, Niels Bohr Institute, University of Copenhagen
Brian Vinter, Professor, eScience, Niels Bohr
Institute, University of Copenhagen
Course fee
EUR 2,600/DKK 19,000 excl. Danish VAT. Fee includes teaching,
course materials and all meals during the course.
See "What you will learn"
- Class Instruction: 35 hours
- Preparation: 10 hours
- Total: 45 hours
- Credit: 0 ECTS
- Type of assessment: Course participation (None)
- Marking scale: Without assessment
- Censorship form: No external censorship
Criteria for exam assessment
None
Course information
- Language: English
- Course code: CSU2016122
- Credit: 0 ECTS
- Level: Part Time Master
- Duration: 5 days
- Placement: Summer, 22 – 26 August 2016
- Schedule: 9:00 – 16:30
- Course capacity: 24 participants
- Continuing and further education
- Price: EUR 2,600/DKK 19,000 (excl. Danish VAT 25%).
- Study board: Income-generating activities
Contracting departments
- The Niels Bohr Institute
- Department of Computer Science
Course coordinators
- Troels Christian Petersen
- Christian Igel