NDAB18000U Data Science
BSc Programme in Computer Science
This course covers the components that go into a full data science pipeline, from the collection, processing and cleaning of data, to storing it efficiently in a database, to the implementation of efficient and modular models, to the exploration of data through interactive visualizations. Emphasis will be placed on dealing with data from multiple sources, and on the design of a modular workflow. Finally, the course will touch upon some of the fundamental challenges in data science, such as the presence of bias, and its potential impact on decision-making.
Reading of structured text:
- Regular expressions and finite automata
- Grammars and parsing
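As a minimal illustration of the first topic, the sketch below uses Python's `re` module to recognise and decompose a line of structured text; the log-line format is invented for the example, and for nested structure a grammar and parser would be needed instead.

```python
import re

# Hypothetical log-line format: "2024-03-01 12:34:56 WARN disk usage at 91%".
# A regular expression (equivalently, a finite automaton) both recognises
# the format and decomposes it into named fields.
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)

def parse_log_line(line: str) -> dict:
    """Return the named capture groups of a matching line, or raise ValueError."""
    m = LOG_PATTERN.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unparseable line: {line!r}")
    return m.groupdict()

record = parse_log_line("2024-03-01 12:34:56 WARN disk usage at 91%")
print(record["level"], record["message"])  # prints: WARN disk usage at 91%
```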
Database systems:
- Central database concepts such as the relational model, data independence, and transactions
- Entity-relationship modelling (ER modelling) and relational data modelling, including transformation of ER models into relational data models
- Queries in database query languages, including relational algebra and SQL
- Theory on database normalization, including functional dependencies, keys, and relational decomposition
- ACID (atomicity, consistency, isolation, durability) properties and use of transactions
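The database topics above can be sketched with Python's built-in `sqlite3` module; the account schema and the transfer scenario are invented for the example.

```python
import sqlite3

# In-memory relational database with an invented example schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

# A transfer as one transaction: both updates commit together or not at all,
# illustrating the atomicity and consistency in ACID.
try:
    with conn:  # the context manager commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # on failure the database is left unchanged

balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(balances)  # prints: {1: 70, 2: 80}
```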
Fundamentals of data integration:
- Strategies for dealing with data heterogeneity
- Data cleaning, error handling, and missing data
- Unstructured to structured data
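A minimal sketch of the cleaning and missing-data topics, in plain Python; the raw rows, field names, and the mean-imputation strategy are illustrative choices, not a prescribed method.

```python
# Invented raw input: rows with inconsistent formatting and missing values,
# as often arrives when integrating heterogeneous sources.
raw_rows = [
    "Alice, 34, 52000",
    "Bob,  , 61000",        # missing age
    "Carol, 29, n/a",       # missing salary, non-standard marker
]

MISSING = {"", "n/a", "na", "null"}

def clean_row(row: str) -> dict:
    """Split a raw row into typed fields, mapping missing markers to None."""
    name, age, salary = (field.strip() for field in row.split(","))
    to_int = lambda s: None if s.lower() in MISSING else int(s)
    return {"name": name, "age": to_int(age), "salary": to_int(salary)}

records = [clean_row(r) for r in raw_rows]

# One simple strategy for missing numeric data: impute the column mean.
known = [r["age"] for r in records if r["age"] is not None]
mean_age = sum(known) / len(known)
for r in records:
    if r["age"] is None:
        r["age"] = mean_age

print(records)
```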
Model design and implementation:
- Basic modelling concepts
- Structured model design
- Model testing and deployment
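The design ideas above can be sketched as a pipeline of small stages, each with a clear input/output contract so that stages can be tested and swapped independently; the stage names and the toy "model" (a mean) are invented for the example.

```python
def load(raw: list[str]) -> list[float]:
    """Ingest stage: parse raw strings into numbers."""
    return [float(x) for x in raw]

def preprocess(values: list[float]) -> list[float]:
    """Cleaning stage: drop out-of-range measurements."""
    return [v for v in values if 0.0 <= v <= 100.0]

def model(values: list[float]) -> float:
    """Model stage: here just the mean, replaceable by any estimator."""
    return sum(values) / len(values)

def pipeline(raw: list[str]) -> float:
    return model(preprocess(load(raw)))

# Stage-level test: each stage can be checked in isolation before deployment.
assert preprocess([10.0, -5.0, 50.0]) == [10.0, 50.0]

result = pipeline(["10", "20", "120", "30"])
print(result)  # prints: 20.0 (mean of the in-range values 10, 20, 30)
```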
Data exploration and visualization:
- Key principles of visualization
- Dimensionality reduction techniques
- Fundamental visualization and interaction techniques for different data types
- Techniques for building and deploying visualizations on the web
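As a minimal sketch of dimensionality reduction, the example below computes the first principal component of a toy 2-D dataset by power iteration on the covariance matrix, in plain Python; the data points are invented, and a library such as scikit-learn would normally be used instead.

```python
import math

# Toy 2-D data with strong correlation; the first principal component
# should lie close to the direction (1, 1) / sqrt(2).
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]

# Centre the data.
mx = sum(x for x, _ in data) / len(data)
my = sum(y for _, y in data) / len(data)
centred = [(x - mx, y - my) for x, y in data]

# 2x2 covariance matrix entries.
n = len(data)
cxx = sum(x * x for x, _ in centred) / n
cyy = sum(y * y for _, y in centred) / n
cxy = sum(x * y for x, y in centred) / n

# Power iteration: repeatedly applying the covariance matrix to a vector
# converges to its dominant eigenvector, the first principal component.
vx, vy = 1.0, 0.0
for _ in range(100):
    wx = cxx * vx + cxy * vy
    wy = cxy * vx + cyy * vy
    norm = math.hypot(wx, wy)
    vx, vy = wx / norm, wy / norm

# Projecting each centred point onto the component gives a 1-D representation.
projected = [x * vx + y * vy for x, y in centred]
print(round(vx, 2), round(vy, 2))
```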
- The student is familiar with writing scripts for data collection and preprocessing.
- The student can use a parser generator to read structured text.
- The student can set up database systems supporting heterogeneous sources of data.
- The student has gained experience with designing a modular pipeline for the analysis of a concrete problem.
- The student can create interactive visualizations on the web.
The student understands the key challenges in designing an effective data science workflow supporting multiple data sources and multiple types of analysis. In particular, the student:
- has a detailed knowledge of database fundamentals, including the SQL query language
- is familiar with core concepts in data integration
- has a clear understanding of modular software design for data science applications
- has detailed knowledge of the fundamentals of data visualization
- can produce cross-platform, shareable, interactive visualizations for the web
- Project work
- Theory exercises
- 15 ECTS
- Type of assessment
- Written assignment and written examination, 2 hours under invigilation. The exam consists of two parts: a project developed in groups during the course, resulting in a report with individual contributions (60%), and a final written two-hour test (40%).
For the final written exam only written aids are allowed.
- Marking scale
- 7-point grading scale
- Censorship form
- No external censorship
Several internal examiners
Re-exam: as the ordinary exam.
(Re)submission of the (individual) project no later than 2 weeks prior to the re-exam, and a new written two-hour test under invigilation.
If fewer than 10 students participate in the re-exam, the written test is replaced by a 30-minute oral examination without preparation.
Criteria for exam assessment
See learning goals.