NDAK18000U Natural Language Processing (NLP)

Volume 2018/2019
Content

Have you ever wondered how to build a system that can process, understand or generate text automatically? For instance, to translate between languages, answer questions, or recognise the names of people in text? Then this course is for you.

This course will introduce the fundamentals of natural language processing (NLP), i.e., computational models of language and their applications to text. Language is at the heart of human intelligence, giving NLP a central role in Artificial Intelligence research and development.

We will combine machine learning (ML), including fundamental formalisms and algorithms, with substantial hands-on experience, i.e., the practical implementation of the methods for concrete NLP problems.

The course covers the following topics (the list is tentative):

• NLP tasks: language modelling, text classification, semantics, information extraction, parsing, pragmatics, machine translation, summarisation, question answering

• methods: text classification, structured prediction, representation and deep learning, conditional random fields, beam search

• implementations: relationship between NLP tasks, efficient implementations

Throughout the course we will also discuss the themes of discriminative and generative learning, and different ways of obtaining supervision for training statistical NLP models.

Learning Outcome

Knowledge of

  • core NLP tasks (e.g. machine translation, question answering, information extraction)

  • methods (e.g. classification, structured prediction, representation learning)

  • implementations (e.g. relationship between NLP tasks, efficient implementations)

Skills to

  • identify the different kinds of NLP tasks

  • choose the correct algorithm for a given problem situation

  • implement core algorithms in Python

  • assess the most appropriate algorithms to solve a given NLP problem

  • distinguish and evaluate the advantages of different approaches to the same task

Competences to

  • decompose natural language tasks into manageable components

  • evaluate systems quantitatively and qualitatively

  • apply the learned skills in a wider context to areas that face similar challenges, such as data science, political science research, or gene sequencing

Literature

Selected papers and book chapters. See Absalon when the course is set up.

Recommended Academic Qualifications

Familiarity with machine learning (probability theory, linear algebra, classification) and programming (Python) is required, either through formal education or self-study. No prior knowledge of natural language processing or linguistics is required.

Relevant machine learning competencies can be obtained through one of the following courses:
- NDAK15007U Machine Learning (ML)
- NDAK16003U Introduction to Data Science (IDS)
- Machine Learning, Coursera

The format of the class consists of lectures (including guest lectures), exercises, and project work.

This course will teach the fundamentals of natural language processing, in terms of methods, typical tasks and implementations. For those students with a specific interest in opinion and data mining, the course NDAK14004U Web Science (WS) is recommended. There will be no significant overlap between the two courses, and students are welcome to attend both of them.

Category              Hours
Lectures              28
Practical exercises   57
Preparation           14
Project work          50
Theory exercises      57
Total                 206

Written
Oral
Credit
7.5 ECTS
Type of assessment
Continuous assessment
Students will be evaluated as follows:

- 3-5 assignments throughout the course (30%), to be completed individually
- short weekly quizzes (20%), of which at least 5 must be submitted
- final project (50%), which may be completed in a group of up to 3 students
Aid
All aids allowed
Marking scale
7-point grading scale
Censorship form
No external censorship
Several internal examiners
Exam period

Throughout the course

Re-exam


The re-exam consists of two parts:
1) Resubmission of the (possibly revised) final project. The revised project must be handed in no later than 2 weeks before the re-exam week.
2) Individual oral examination (30 minutes without preparation) based on the submitted project and the full syllabus.

The final grade is based on an overall assessment.

Criteria for exam assessment

See Learning Outcome.