University of Copenhagen - Courses

NDAK18000U Natural Language Processing (NLP)

Volume 2024/2025

Have you ever wondered how systems like ChatGPT, which can generate human-like text, are built? Are you intrigued by the idea of creating a system that can process, understand, or generate text automatically? Are you interested in building applications that can translate between languages, answer questions, or recognise named entities in text? If so, this course is designed for you.

This course provides an introduction to the fundamentals of Natural Language Processing (NLP), which involves computational models of language and their applications to text. As language is the core of human intelligence, NLP holds a pivotal role in Artificial Intelligence research and development.

We will integrate machine learning (ML), including its fundamental formalisms and algorithms, with a robust hands-on experience. This means you will gain practical skills in implementing these methods for real-world NLP problems.

The course uses interactive lecture materials built as Jupyter notebooks. Course materials from last year are publicly available at https://github.com/coastalcph/nlp-course. The course will closely follow the structure of the previous year's iteration. If you're unsure about the course prerequisites or content, please review these materials.

The course covers the following topics:

• NLP tasks: tokenisation, text classification, language modelling, named entity recognition, part-of-speech tagging, parsing, information extraction, machine translation, question answering

• Methods: log-linear models, structured prediction, conditional random fields, beam search, and neural network models such as transformers (including representation learning, pre-training, transfer learning and interpretability methods)

• Implementations: relationship between NLP tasks, efficient implementations, and the use of modern NLP libraries such as Hugging Face's Transformers
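As an illustration of one method from the list above, the following is a minimal sketch of beam search in plain Python. It is not taken from the course materials: the toy bigram table `BIGRAMS`, its probabilities, and the function name `beam_search` are all assumptions made for this example.

```python
import math

# Hypothetical toy bigram model: P(next token | previous token).
# The tokens and probabilities are invented for illustration only.
BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(bigrams, beam_width=2, max_len=5):
    """Return finished hypotheses as (log_prob, tokens), best first."""
    beams = [(0.0, ["<s>"])]  # each hypothesis: (cumulative log-prob, tokens)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for score, seq in beams:
            for token, prob in bigrams[seq[-1]].items():
                candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the `beam_width` highest-scoring expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == "</s>":
                finished.append((score, seq))  # hypothesis is complete
            else:
                beams.append((score, seq))     # hypothesis stays live
        if not beams:
            break
    return sorted(finished, key=lambda c: c[0], reverse=True)


if __name__ == "__main__":
    for width in (1, 2):
        score, tokens = beam_search(BIGRAMS, beam_width=width)[0]
        print(width, " ".join(tokens), round(math.exp(score), 2))
```

With a beam width of 1 the search is greedy: it commits to "the" because it is the most likely first token, and ends with a sequence of probability 0.3. With a beam width of 2 it also keeps "a" alive and finds "a cat", which has the higher overall probability of 0.36 under this toy model.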

Throughout the course, we will also explore the themes of discriminative and generative learning and various ways of obtaining supervision for training statistical NLP models. An important aspect of our discussions will be the application of these techniques in multilingual settings, understanding how NLP can be adapted and applied to a variety of languages beyond English.

Learning Outcome

Knowledge of

core NLP tasks (e.g. machine translation, question answering, information extraction)
methods (e.g. classification, structured prediction, representation learning)
implementations (e.g. relationship between NLP tasks, efficient implementations)

Skills to

identify the different kinds of NLP tasks
choose the correct algorithm for a given problem situation
implement core algorithms in Python using PyTorch
assess the most appropriate algorithms to solve a given NLP problem
distinguish and evaluate the advantages of different approaches to the same task

Competences to

decompose natural language tasks into manageable components
evaluate systems quantitatively and qualitatively
apply the learned skills in a wider context to areas that face similar challenges, e.g., data science, social science, or bioinformatics
critically assess the limitations and use cases of language models, and apply this knowledge to the development and deployment of these models in real-world scenarios
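As a small illustration of what quantitative evaluation can look like, the sketch below computes precision, recall and F1 over sets of predicted items, for example named entities. It is an assumption-laden example rather than the course's evaluation protocol: the function name `precision_recall_f1` and the sample entities are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compute exact-match precision, recall and F1 over two collections."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical (entity text, label) pairs, invented for illustration.
    gold = [("Copenhagen", "LOC"), ("Anders", "PER"), ("UCPH", "ORG")]
    predicted = [("Copenhagen", "LOC"), ("Anders", "ORG")]
    print(precision_recall_f1(gold, predicted))
```

Here only one of the two predictions matches a gold entity exactly, so precision is 1/2, recall is 1/3 and F1 is 0.4. A qualitative evaluation would additionally look at the individual errors, such as the mislabelled "Anders".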

Literature

See Absalon for a list of course literature.

Recommended Academic Qualifications

Knowledge of machine learning (probability theory, linear algebra, classification, neural networks) and programming (Python) is required, either through formal education or self-study. No prior knowledge of natural language processing or linguistics is required.

Relevant machine learning competencies can be obtained through one of the following courses:
- NDAK22002U Advanced Deep Learning (ADL) or Deep Learning (DL)
- NDAK22000U Machine Learning A (MLA)
- NDAK22001U Machine Learning B (MLB)
- NDAK16003U Introduction to Data Science (IDS)

Academic qualifications equivalent to a BSc degree are recommended.

If you are in doubt about whether you meet the course prerequisites, you can check the course materials from last year here: https://github.com/coastalcph/nlp-course.

Teaching and learning methods

The format of the class consists of lectures (possibly including guest lectures), exercises, and project work.

Workload

Category: Hours
Lectures: 28
Preparation: 14
Theory exercises: 57
Practical exercises: 57
Project work: 50
Total: 206

Feedback form

Written

Oral

Individual

Collective

Continuous feedback during the course of the semester

Exam

Credit

7.5 ECTS

Type of assessment

On-site written exam, 1.5 hours under invigilation

Written assignment, during the course

Type of assessment details

The exam consists of two parts. Each part is assessed and weighted individually, and the final grade is determined based on this:

1. A group project weighted as 50% of the final grade, written during the course (group members either hand in individual reports or explicitly indicate their contributions in a joint group report).

2. A 1.5-hour written exam weighted as 50% of the final grade.

Aid

All aids allowed

The use of AI assistance powered by Large Language Models (LLM)/Large Multimodal Models (LMM) – such as ChatGPT and GPT-4 – is permitted for the written assignment (group project), under conditions that will be specified during the course, but not for the written exam.

Marking scale

7-point grading scale

Censorship form

No external censorship

Several internal examiners

Re-exam

The re-exam consists of two parts:

1) Resubmission of the (possibly revised) final project. The revised project has to be handed in no later than 2 weeks before the re-exam week.
2) A 30-minute individual oral examination without preparation, based on the submitted project and the full syllabus.

The parts are not weighted, and an overall assessment is provided as the final grade.

Criteria for exam assessment

See Learning Outcome.

Course information

Language: English
Course code: NDAK18000U
Credit: 7.5 ECTS
Level: Full Degree Master
Duration: 1 block
Placement: Block 1
Schedule: B
Course capacity: No limitation – unless you register in the late-registration period (BSc and MSc) or as a credit or single subject student.

Course is also available as continuing and professional education

Study board

Study Board of Mathematics and Computer Science

Contracting department

Department of Computer Science

Contracting faculty

Faculty of Science

Course Coordinators

Daniel Hershcovich
Anders Søgaard

Lecturers

Daniel Hershcovich
Anders Søgaard
Desmond Elliott

Saved on the 14-02-2024
