An intensive grounding in NLP

In two weeks, we will cover the basics of traditional NLP, as well as a number of advanced recent methods.

The course is organized by Mark Granroth-Wilding and also includes lectures by Leo Leppänen and Lidia Pivovarova.
Other teaching assistants are: Khalid Alnajjar, Eliel Soisalon-Soininen and Elaine Zosa.


Timetable

Mornings will consist of lectures. Afternoons will consist of practical assignments, to be carried out in a lab, with the support of the lecturer and TAs.

Note that there is no teaching on Thurs 30.5, since it is a holiday.

Teaching schedule

Date             Time
Mon 20.5.2019    09:15 - 12:00
Tue 21.5.2019    09:15 - 12:00
Wed 22.5.2019    09:15 - 12:00
Thu 23.5.2019    09:15 - 12:00
Fri 24.5.2019    09:15 - 12:00
Mon 27.5.2019    09:15 - 12:00
Tue 28.5.2019    09:15 - 12:00
Wed 29.5.2019    09:15 - 12:00
Fri 31.5.2019    09:15 - 12:00

Topics for each day

Date             Time     Title
Mon 20.5.2019    09:15    Introduction to NLP
Tue 21.5.2019    09:15    NLU pipeline and toolkits
Wed 22.5.2019    09:15    Finite state methods; statistical NLP
Thu 23.5.2019    09:15    Syntax and parsing
Fri 24.5.2019    09:15    Evaluation
Mon 27.5.2019    09:15    NLG and dialogue
Tue 28.5.2019    09:15    Vector space models and lexical semantics
Wed 29.5.2019    09:15    Information extraction; advanced statistical NLP
Fri 31.5.2019    09:15    Semantics and pragmatics; the future

Other teaching

20.05. - 27.05.2019 Mon 13.15-16.00
21.05. - 28.05.2019 Tue 13.15-16.00
22.05. - 29.05.2019 Wed 13.15-16.00
23.05.2019 Thu 13.15-16.00
24.05. - 31.05.2019 Fri 13.15-16.00
Mark Granroth-Wilding
Language of instruction: English

Material

Lecture slides will be made available here before each lecture.

Assignment instructions are available online (see link below).
Submit your assignments via the course's Moodle site (link above).

Lecture material

Instructions

Description

-Which degree programme is responsible for the course?

Data Science MSc

-Which module does the course belong to?

None

-Is the course available to students from other degree programmes?

Yes

Prerequisite courses: DATA11002 Introduction to Machine Learning; TKT20005 Models of Computation

The student should have at least a basic familiarity with the following topics before the course starts.

  • Supervised vs unsupervised learning

  • Overfitting and regularization

  • Dimensionality reduction

  • Mathematics of simple probabilistic models and estimation

  • Concepts of classification and regression

  • Formal languages: in particular finite state automata and transducers, and context-free grammars
    (Covered by TKT20005 Models of Computation)
  • Programming:

    • Basic abilities in Python

    • Familiarity with Numpy recommended

Suggested reading on these topics will be provided before the course, to help students to fill in any gaps in their knowledge or revise the concepts.

Programming assignments will be completed in Python, so at least some previous experience of Python programming is essential.

Experience in linguistics / language processing is not required. However, a basic familiarity with some linguistic concepts will make it easier to follow the course. Links to recommended reading material will be provided before the start of the course.

By the end of the course, the student will:

  • have an understanding of the basic linguistic concepts underlying typical approaches to NLP;
  • be familiar with traditional pipeline approaches to NLP systems;
  • be aware of the main subtasks and typical components in such pipelines;
  • have a good understanding of some commonly used probabilistic and other statistical models and how they are used for practical NLP tasks;
  • know how to tackle some NLP applications by combining existing approaches to their subtasks;
  • understand how recent machine learning methods (such as deep learning) can be applied to linguistic tasks;
  • know how NLP systems and components are typically evaluated and understand good practices in evaluation and data handling;
  • be aware of some key open research questions and unsolved problems in NLP.

Spring term, 2019 only, intensive period

After Introduction to Machine Learning

This course will give an introduction to the field of Natural Language Processing (NLP), covering central concepts, example applications and the application of modern machine learning (ML) techniques to NLP problems. It will go into more detail on some particular applications, showing how they have been tackled, and what component sub-tasks they involve.

NLP is a broad field, including a large number of sub-tasks and applications. We begin with an overview of the field, covering the classic natural language understanding (NLU) pipeline and its components. We then cover the other side of NLP, natural language generation (NLG), including a comparison of classic rule-based systems and recent applications of deep learning and other machine learning techniques.

We will look at some modern statistical approaches to NLU tasks, including how neural networks and deep learning can be applied to linguistic analysis. Then we will consider how the pipeline components can be combined for one important current application: information extraction. Finally, a look at the topics of semantics and pragmatics will highlight some key unsolved problems in the field and show why it remains an active and challenging area for research.

The course will primarily follow the following two textbooks:

Speech and Language Processing. Jurafsky & Martin. 2nd edition, 2009. Pearson Education.
Natural Language Processing. Jacob Eisenstein. Draft textbook, Nov 13 2018. Available on GitHub.

We will also refer to the following textbook, in particular in relation to the practical assignments:

Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit. Bird, Klein & Loper. 2nd edition. Available online.

Specific references to these textbooks will be provided in lectures.

An additional reading list, including recommended reading prior to the course and suggested reading to refresh background knowledge (see Prerequisites above), will be provided later, before the course.

  • Lectures
  • Practical lab sessions, with teacher/TA support
  • Daily assessed assignments
  • Example code and other materials provided online
  • Submission of code, programme output, solutions and written answers

All links, slides and other materials will be made available online, via the course webpage.

The following components will be assessed:

  • Completion of daily assignments (minimum 80% of days completed)
  • A subset of assignments graded on a 1-5 scale (average of 3 or more)
  • Attendance of lectures (all days, unless exception agreed with lecturer)
  • Participation in discussions (some active participation observed by lecturer/TAs)

Contact teaching only.

This is an intensive course: daily participation over the full two-week period is necessary so as not to miss important content.

Mornings will be filled with lectures and discussion, introducing and exploring the material.

Afternoons will be used for lab sessions, which will include completion of assessed assignments relating to the day's lecture material.

Full participation in both morning and afternoon sessions is expected. Any anticipated exceptions should be discussed with the lecturer before signing up for the course.

Mark Granroth-Wilding