Instruction

Name Cr Method of study Time Location Organiser
Natural Language Processing 5 Cr Lecture Course 15.3.2021 - 7.5.2021
Natural Language Processing 5 Cr Lecture Course 13.1.2020 - 28.2.2020
Natural Language Processing 2.5 Cr Lecture Course 20.5.2019 - 31.5.2019

Target group

Master's programme in Data Science is responsible for the course.

The course is available to students from other degree programmes.

Prerequisites

Prerequisite courses: DATA11002 Introduction to Machine Learning; TKT20005 Models of Computation

The student should have at least a basic familiarity with the following topics before the course starts.

  • Supervised vs unsupervised learning

  • Overfitting and regularization

  • Dimensionality reduction

  • Mathematics of simple probabilistic models and estimation

  • Concepts of classification and regression

  • Formal languages: in particular finite state automata and transducers, and context-free grammars
    (Covered by TKT20005 Models of Computation)
  • Programming:

    • Basic abilities in Python

    • Familiarity with Numpy recommended

Suggested reading on these topics will be provided before the course, to help students to fill in any gaps in their knowledge or revise the concepts.

Programming assignments will be completed in Python, so at least some previous experience of Python programming is essential.

Experience in linguistics / language processing is not required. However, a basic familiarity with some linguistic concepts will make it easier to follow the course. Links to recommended reading material will be provided before the start of the course.

Learning outcomes

By the end of the course, the student will:

  • have an understanding of the basic linguistic concepts underlying typical approaches to NLP;

  • be familiar with traditional pipeline approaches to NLP systems;

  • be aware of the main subtasks and typical components in such pipelines;

  • be familiar with a number of different types of meaning representation for NLP applications and be able to apply appropriate representations to a variety of tasks;
  • have a good understanding of some commonly used probabilistic and other statistical models and how they are used for practical NLP tasks;

  • know how to tackle some NLP applications by combining existing approaches to their subtasks;

  • understand how recent machine learning methods (such as deep learning) can be applied to linguistic tasks;

  • know how NLP systems and components are typically evaluated and understand good practices in evaluation and data handling;
  • understand some forms of sentence-level syntactic analysis and why they are important for semantic NLP applications;
  • be aware of some key open research questions and unsolved problems in NLP.

Timing

Period 4, 2020-21.

To be taken after Introduction to Machine Learning.

Contents

This course will give an introduction to the field of Natural Language Processing (NLP), covering central concepts, example applications and the application of modern machine learning (ML) techniques to NLP problems. It will go into more detail on some particular applications, showing how they have been tackled, and what component sub-tasks they involve.

NLP is a broad field, including a large number of sub-tasks and applications. We begin with an overview of the field, covering the classic natural language understanding (NLU) pipeline and its components. Then we look in more detail at various specific areas, including finite-state methods, syntax and parsing, lexical and compositional semantics, vector-space models and document-level analysis. We will look at some modern statistical methods, including how neural networks and deep learning can be applied to linguistic analysis.

We then cover the other side of NLP, natural language generation (NLG), including a comparison of classic rule-based systems and recent applications of deep learning and other machine learning techniques. We will also see how components can be combined for an important current application: information extraction.

Finally, a look at the topics of semantics and pragmatics will highlight some key unsolved problems in the field and show why it remains an active and challenging area for research.

Activities and teaching methods in support of learning

  • Interactive teaching sessions, including lecturing, exercises and group work

  • Assessed assignments

  • Example code and other materials provided online

  • Submission of notes and answers during group work in teaching sessions

  • Submission of code, program output, solutions and written answers

  • Model answers for assignments

  • Moodle forum for discussion with teachers and other students

All links, slides and other materials will be made available online, via the course Moodle page.

Study materials

The course will primarily follow the following two textbooks:

We will also refer to the following textbook, in particular in relation to the practical assignments:

  • Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit. Bird, Klein & Loper. 2nd edition. Available online.

Specific references to these textbooks will be provided in lectures.

An additional reading list, including recommended reading prior to the course and suggested reading to refresh background knowledge (see Prerequisites above), will be provided before the course.

Assessment practices and criteria

Requirements to pass the course:

  • Attend all interactive teaching sessions.
    Exceptions to this can be made on an individual basis for up to two specific lectures, e.g. in case of illness, on the understanding that the student will read the lecture slides in their own time and complete any exercises stipulated by the lecturer.

  • Submit group assignments during teaching sessions.
    Assignments will in most cases be carried out in groups. Every group member should submit answers or notes on Moodle for each assignment. (These may be identical for all members of a group, but are submitted as evidence of participation.)

  • Pass all assignments, min 3/5 (60%).
    Students must achieve the minimum grade in every assignment to pass the course. The grade for each assignment is the average of the exercises in that week's assignment, each marked 0-5.

  • Pass final project (report), min 3/15 (20%).

The final grade is then calculated as follows:

If any of the above requirements is not met: fail (0/5).

Otherwise, the overall score is the sum of the grades for the 7 assignments (each 0-5), plus the grade for the final project (0-15).

This gives a score in the range 0-50 (actually 24-50, given the minimum requirement to pass the course). The final grade is then:

Score Grade
24-29 1
30-34 2
35-39 3
40-44 4
45-50 5
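
The calculation above can be sketched in Python as follows. This is only an illustrative sketch, not an official grading tool: the function name `final_grade` and the `requirements_met` flag (standing in for the attendance and group-submission requirements) are hypothetical. The table lists whole-number score bands, so how a fractional score such as 29.5 is treated is not stated; the sketch assumes each band starts at its lower bound and applies no rounding.

```python
from typing import Sequence

NUM_ASSIGNMENTS = 7
MIN_ASSIGNMENT_GRADE = 3.0  # minimum per assignment, out of 5
MIN_PROJECT_GRADE = 3.0     # minimum for the final project, out of 15

# (lowest score in band, final grade), taken from the table above,
# listed from the highest band down.
GRADE_BANDS = [(45, 5), (40, 4), (35, 3), (30, 2), (24, 1)]


def final_grade(
    assignment_grades: Sequence[float],
    project_grade: float,
    requirements_met: bool = True,
) -> int:
    """Return the final course grade (0 = fail, otherwise 1-5).

    requirements_met is a hypothetical flag covering the attendance and
    group-submission requirements, which cannot be computed from grades.
    """
    if len(assignment_grades) != NUM_ASSIGNMENTS:
        raise ValueError(f"expected {NUM_ASSIGNMENTS} assignment grades")

    # Any unmet requirement means a fail, whatever the total score is.
    if not requirements_met:
        return 0
    if min(assignment_grades) < MIN_ASSIGNMENT_GRADE:
        return 0
    if project_grade < MIN_PROJECT_GRADE:
        return 0

    # Overall score: sum of the 7 assignment grades plus the project grade.
    score = sum(assignment_grades) + project_grade

    # Assumption: a band applies from its lower bound upwards, no rounding.
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    return 0
```

For example, a student with a grade of 4 in every assignment and 10 for the project has a score of 7 × 4 + 10 = 38, which falls in the 35-39 band.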

Completion methods

Contact teaching only.

There are three contact sessions each week. These are interactive learning sessions, including lecturing and group work. Active participation in the sessions is the primary method of learning, so it is essential to completing the course and is therefore compulsory.

A weekly individual practical assignment must be completed. Support will be provided by teaching assistants through a variety of channels.

A final project is submitted shortly after the end of teaching, building further on the practical aspects of the course.

There is no exam for the course.

Full participation in contact sessions is expected. Exceptions may be made for up to two sessions if necessary. Any anticipated exceptions should be discussed with the lecturer in advance.