An introduction to natural language processing

A thorough grounding in the field of Natural Language Processing (NLP).

This course will give an introduction to the field of Natural Language Processing (NLP), covering central concepts, example applications and the application of modern machine learning (ML) techniques to NLP problems. It will go into more detail on some particular applications, showing how they have been tackled, and what component sub-tasks they involve.

NLP is a broad field, including a large number of sub-tasks and applications. We will start with an overview of the field, including the classic natural language understanding (NLU) pipeline and its components. We will then look in more detail at various specific areas (finite-state methods, syntax and parsing, lexical and compositional semantics, and more). We will also look at some modern statistical methods, including how neural networks and deep learning can be applied to linguistic analysis.

We will also cover natural language generation (NLG), including a comparison of rule-based systems and applications of deep learning and other machine learning techniques.

A look at semantics and pragmatics will highlight some key unsolved problems in the field and show why it remains an active and challenging area for research.

**Scheduling of lab sessions**
Lab sessions are on Thursdays, 9-11. They are optional: an opportunity to get help with the week's assignment. If you also need to attend the Compilers session at 10:00, it's not a problem to attend just the first half of the NLP session. We will prioritize helping those who need to leave halfway.

Messages

Mark Granroth-Wilding

Published, 4.3.2020 at 13:41

Dear all NLP course participants,
Remember that the **final project report** is due by the end of this Friday, 6.3. Some of you have already contacted me to ask about extending the deadline. Please let me know ASAP if you'd like an extension and haven't already asked for one.

If you have agreed an extension of the deadline with me, please email me by the original deadline to give at least a vague indication of what topic your project will be on. This will allow us to assign markers to projects after Friday, even if some projects will be submitted later.

Remember to check the instructions about the project [1]. The marked submission will be the **report**, not the code. The webpage has a list of things that should be included in the report. The marking criteria we apply are: (1) how well you have addressed the listed questions (see "Submission" section); and (2) how well you, in doing so, show an understanding of the relevant theoretical content (see "The main criterion is that you display...").

I believe there are still some submissions missing for the W7 assignment (temporal IE). If you have not submitted, but still intend to (and haven't notified me), please email Lidia or me, as Lidia is currently marking.

If anything is still unclear or you're having problems, feel free to keep using the Moodle forum.

Finally, thanks to all of you for your active participation in the course. As you'll have noticed, we relied quite a bit on group discussions, interactive exercises, etc. for learning, which depend on your collective participation to work (and I think they generally did!).

Best regards,
Mark

[1] https://markgw.github.io/uh-nlp20/final_project/

Timetable

Here is the course’s teaching schedule. Check the description for possible other schedules.

DateTimeLocationInfo
Mon 13.1.2020
10:15 - 12:00
Granroth-Wilding Mark
Fri 17.1.2020
10:15 - 12:00
Granroth-Wilding Mark
Mon 20.1.2020
10:15 - 12:00
Granroth-Wilding Mark
Fri 24.1.2020
10:15 - 12:00
Granroth-Wilding Mark
Mon 27.1.2020
10:15 - 12:00
Granroth-Wilding Mark
Fri 31.1.2020
10:15 - 12:00
Granroth-Wilding Mark
Mon 3.2.2020
10:15 - 12:00
Granroth-Wilding Mark
Fri 7.2.2020
10:15 - 12:00
Granroth-Wilding Mark
Mon 10.2.2020
10:15 - 12:00
Granroth-Wilding Mark
Fri 14.2.2020
10:15 - 12:00
Granroth-Wilding Mark
Mon 17.2.2020
10:15 - 12:00
Granroth-Wilding Mark
Fri 21.2.2020
10:15 - 12:00
Granroth-Wilding Mark
Mon 24.2.2020
10:15 - 12:00
Granroth-Wilding Mark
Fri 28.2.2020
10:15 - 12:00
Granroth-Wilding Mark

Other teaching

16.01. - 27.02.2020 Thu 09.15-11.00
Khalid Alnajjar, Mark Granroth-Wilding, Leo Leppänen, Lidia Pivovarova, Eliel Soisalon-Soininen, Elaine Zosa
Teaching language: English

Material

Lecture slides will be made available here before each lecture.

Assignment instructions are available online (see link below).
Submit your assignments via the course's Moodle site (link above).

Lecture material

Instructions

Feedback

16 students submitted feedback via WebOodi. A summary of the questionnaire responses (mean ratings with std dev):

The objectives of the course were clear to me from the beginning: 4.0 (0.89)
The material used in the course supported the achievement of learning goals: 4.5 (0.63)
Activity at the course supported the achievement of learning goals: 4.5 (0.63)
Assessment of the course measured the achievement of the core learning objectives: 4.31 (0.79)
The course was laborious: 3.19 (0.75)
Exercises and discussions during lectures helped me understand the content: 4.56 (0.63)

Rate the overall speed of the lectures (1=too slow / too little content, 5=too fast / too much content): 2.94 (0.77)
Rate the length and difficulty of the assignments (1=too easy / short, 5=took too long): 2.75 (0.58)
Rate the balance of underlying theory and practical applications (1=too theoretical, 5=too much focus on applications): 3.13 (0.5)

I give the course as a whole the grade: 4.44 (0.63)

I have gone through the free-text feedback and collected the following key comments, to which I have given some responses below.

# Assignments
- Not difficult enough
- NLG assignment too long, IE too short
- Clearer instructions in certain places: e.g. exact software dependencies
- Too simplified – not enough real data, with all its challenges
> We will try to adjust the way we develop assignments in future to take all of this into account
- Learning objectives not clear
> Having clearly stated learning objectives for each assignment would improve them a lot
- Use Jupyter for submission
> I will consider this in the future. On the other hand, some students also commented that there was too much “hand holding” in the exercises, and giving pre-written code with gaps would only increase this.
- Timing of help sessions
> This clearly didn’t work well, as not many students came. I’ll definitely organise this differently next time.

# Lectures
- General preference for less content, but more depth on each topic (and more theoretical depth)
> I agree: I felt the same about the course this year :-)
- More demonstrations and applications
> This could certainly help give valuable context and engage more, so I’ll try to do this
- Group discussions and exercises worked well and helped learning
> This isn’t something that everyone likes, but the overriding opinion in the comments seemed to suggest it worked well
- Presemo for asking questions was helpful

# Course overall
- Too easy
> This can tie in nicely with adding more depth on fewer topics
- Lectures too long
> Yep: I’m working on that!
- Mandatory lectures: opinions for and against
> For those who don’t like this: the reason is that I’ve been trying to make the lectures more than just lectures, with group discussions, etc. If you just look through the slides, you don’t benefit from this, so I need to develop alternative learning methods for anyone who’s not present. Plus, these methods only work if there are enough students present!
> I’m considering changing this in future, but it would have negative consequences for engagement in the learning.
> Other students felt this worked well (for exactly these reasons)

There were also some other, more specific, comments and suggestions, which I’ve also taken note of.
Thanks, everyone, for the valuable feedback – it will help a lot in developing this course and others in the future!

Description

The Master's Programme in Data Science is responsible for the course.

The course is available to students from other degree programmes.

Prerequisite courses: DATA11002 Introduction to Machine Learning; TKT20005 Models of Computation

The student should have at least a basic familiarity with the following topics before the course starts.

  • Supervised vs unsupervised learning

  • Overfitting and regularization

  • Dimensionality reduction

  • Mathematics of simple probabilistic models and estimation

  • Concepts of classification and regression

  • Formal languages: in particular finite state automata and transducers, and context-free grammars
    (Covered by TKT20005 Models of Computation)
  • Programming:

    • Basic abilities in Python

    • Familiarity with Numpy recommended

Suggested reading on these topics will be provided before the course, to help students to fill in any gaps in their knowledge or revise the concepts.

Programming assignments will be completed in Python, so at least some previous experience of Python programming is essential.

Experience in linguistics / language processing is not required. However, a basic familiarity with some linguistic concepts will make it easier to follow the course. Links to recommended reading material will be provided before the start of the course.

By the end of the course, the student will:

  • have an understanding of the basic linguistic concepts underlying typical approaches to NLP;
  • be familiar with traditional pipeline approaches to NLP systems;
  • be aware of the main subtasks and typical components in such pipelines;
  • have a good understanding of some commonly used probabilistic and other statistical models and how they are used for practical NLP tasks;
  • know how to tackle some NLP applications by combining existing approaches to their subtasks;
  • understand how recent machine learning methods (such as deep learning) can be applied to linguistic tasks;
  • know how NLP systems and components are typically evaluated and understand good practices in evaluation and data handling;
  • be aware of some key open research questions and unsolved problems in NLP.

Spring term 2020, period 3.

This course will give an introduction to the field of Natural Language Processing (NLP), covering central concepts, example applications and the application of modern machine learning (ML) techniques to NLP problems. It will go into more detail on some particular applications, showing how they have been tackled, and what component sub-tasks they involve.

NLP is a broad field, including a large number of sub-tasks and applications. We begin with an overview of the field, covering the classic natural language understanding (NLU) pipeline and its components. Then we look in more detail at various specific areas, including finite-state methods, syntax and parsing, lexical and compositional semantics, vector-space models and document-level analysis. We will look at some modern statistical methods, including how neural networks and deep learning can be applied to linguistic analysis.

We then cover the other side of NLP, natural language generation (NLG), including a comparison of classic rule-based systems and recent applications of deep learning and other machine learning techniques. We will also see how components can be combined for an important current application: information extraction.

Finally, a look at the topics of semantics and pragmatics will highlight some key unsolved problems in the field and show why it remains an active and challenging area for research.

The course will primarily follow two textbooks:

Speech and Language Processing. Jurafsky & Martin. 2nd edition, 2009. Pearson Education
Natural Language Processing. Jacob Eisenstein. Draft textbook, Nov 13 2018. Available on Github.

We will also refer to the following textbook, in particular in relation to the practical assignments:

Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit. Bird, Klein & Loper. 2nd edition. Available online.
Specific references to these textbooks will be provided in lectures.

An additional reading list, including recommended reading prior to the course and suggested reading to refresh background knowledge (see Prerequisites above), will be provided later, before the course.

  • Lectures
  • Practical lab sessions, with teacher/TA support
  • Weekly assessed assignments
  • Example code and other materials provided online
  • Submission of code, program output, solutions and written answers

All links, slides and other materials will be made available online, via the course webpage.

The following components will be assessed:

  • Assignments graded on a 1-5 scale (average of 3 or more).
  • Final individual project, with a report submitted shortly after the course.
  • Attendance of all lectures (unless exception agreed with lecturer)
  • Participation in discussions during lectures (some active participation observed by lecturers)

Contact teaching only.

Two lectures per week (see timetable), mandatory.

One lab session per week (optional), to support completion of assessed assignments relating to the lecture material.

Full participation in lectures is expected. Any anticipated exceptions should be discussed with the lecturer before signing up for the course.

Mark Granroth-Wilding