NDAB18000U Data Science (DS)

Volume 2020/2021
Education

BSc Programme in Computer Science
BSc Programme in Machine Learning and Data Science

Content

This course covers the components that go into a full data science pipeline, such as the collection, processing and cleaning of data, storing it efficiently in a database, the implementation of efficient and modular models, and the exploration of data through visualisations.

Emphasis will be placed on dealing with data from multiple sources, and on the design of a modular workflow.

Finally, the course will touch upon some of the fundamental challenges in data science, such as the presence of bias, and its potential impact on decision-making.

 

Learning Outcome

Knowledge of

  • Reading of structured text:

     - Regular expressions and finite automata

     - Grammars and parsing

  • Databases:

     - Central database concepts such as the relational model, data independence, and transactions

     - Entity-relationship modelling (ER modelling) and relational data modelling, including transformations from ER models to relational data models

     - Queries in database query-languages, including relational algebra and SQL

     - Theory on database normalisation, including functional dependencies, keys, and relational decomposition

     - ACID (atomicity, consistency, isolation, durability) properties and use of transactions

  • Fundamentals of data integration:

     - Strategies for dealing with data heterogeneity

     - Data cleaning, error handling, and missing data

     - Unstructured to structured data

  • Model design and implementation:

     - Basic modelling concepts

     - Structured model design

     - Model testing and deployment

  • Data exploration and visualisation:

     - Key principles of visualisation

     - Dimensionality reduction techniques

     - Fundamental visualisation and interaction techniques for different data types

     - Techniques for building and deploying visualisations on the web

Skills in

  • Writing scripts for data collection and preprocessing.
  • Using a parser generator to read structured text.
  • Setting up database systems supporting heterogeneous sources of data.
  • Designing a modular pipeline for the analysis of a concrete problem.
  • Creating visualisations for the web.

 

Competences in

The student understands the key challenges in designing an effective data science workflow supporting multiple data sources and multiple types of analysis. In particular, the student:

  • can use SQL queries to make meaningful queries in databases
  • can solve basic data integration tasks
  • is able to design and understand modular data science pipelines
  • can produce cross-platform, shareable visualisations for the web
  • can clearly and precisely document data analysis workflows, methodology and results

Recommended Academic Qualifications

The student should have basic knowledge of programming, algorithms, linear algebra, calculus and statistics, as obtainable through the courses:

  • PoP
  • MASD and MAD, or MatIntro and SS
  • DMA or DMFS (DMFS can be followed simultaneously in block 3)
  • LinAlgDat (LinAlgDat can be followed simultaneously in block 4)

Teaching and learning methods

Lectures, exercise classes and project.

Workload

  Category            Hours
  Lectures               72
  Preparation           166
  Theory exercises       72
  Project work          100
  Exam                    2
  Total                 412

Feedback form

  • Written
  • Collective
  • Continuous feedback during the course of the semester

Written feedback is provided as comments to assignment solutions.

Continuous feedback is provided during exercise classes, where students can engage in Q&A with teaching assistants.

Credit
15 ECTS
Type of assessment
Written assignment
Written assignment, 24 hour take home exam
The exam consists of two parts:

1. A group project developed during the course and documented with a report wherein the individual contributions are stated (60%)

2. A final individual written 24-hour take-home exam (40%)

The project weighs 60% of the grade and the written 24-hour take-home exam weighs 40%. Both parts of the exam must be passed with a minimum grade of 00, and the weighted average must be at least 02 to pass the course.
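
As an illustration only, the pass rule above can be sketched as a small check. The function name `passes_course` is hypothetical, and the grade values are those of the Danish 7-point scale (-3, 00, 02, 4, 7, 10, 12). How the weighted average is rounded to a final grade is not stated here, so the sketch only decides pass or fail.

```python
# Grades on the Danish 7-point scale (00 and 02 written as 0 and 2).
SEVEN_POINT_SCALE = (-3, 0, 2, 4, 7, 10, 12)


def passes_course(project_grade: int, exam_grade: int) -> bool:
    """Return True if the stated pass rule is met (illustrative sketch).

    Assumptions: the project counts 60% and the take-home exam 40%;
    each part needs at least 00; the weighted average needs at least 02.
    """
    for grade in (project_grade, exam_grade):
        if grade not in SEVEN_POINT_SCALE:
            raise ValueError(f"{grade} is not on the 7-point scale")

    # Each part must be at least 00, so a -3 in either part fails.
    if min(project_grade, exam_grade) < 0:
        return False

    # Integer percentage weights avoid floating-point rounding:
    # average >= 02 is equivalent to 60*p + 40*e >= 200.
    weighted_times_100 = 60 * project_grade + 40 * exam_grade
    return weighted_times_100 >= 200
```

For example, a project graded 4 with an exam graded 00 gives a weighted average of 2.4 and passes, whereas a project graded 00 with an exam graded 4 gives 1.6 and does not.
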
Exam registration requirements

1-3 mandatory assignments, marked as passed/failed, must be passed in order to qualify for the exam.

Aid
All aids allowed
Marking scale
7-point grading scale
Censorship form
No external censorship
Several internal examiners
Re-exam

1) (Re)submission of the (individual) project no later than 2 weeks prior to the re-exam

2) An oral examination (30 minutes, without preparation) in the curriculum
 

A student who has not qualified for the exam can qualify by submitting the equivalent assignments and having them approved no later than 2 weeks before the re-exam.

The project weighs 60% of the grade and the oral examination weighs 40%. Both parts of the exam must be passed with a minimum grade of 00, and the weighted average must be at least 02 to pass the course.

Criteria for exam assessment

See learning goals.