Welcome to the Introduction to Machine Learning!

For prerequisites, contents, learning outcomes, and completion methods, see "Description" below.

For study materials, see "Material" below.

For the course discussion forum, see "Interaction" below.

If you want to complete the course by a separate exam, you must also complete a project; see the instructions under 'Tasks'. In the exams, you are allowed a "cheat sheet" (a two-sided hand-written A4 sheet with your own notes). A calculator is allowed but not necessary.

Reading material (textbook):
week 1: pp. 1-33 (supervised vs unsupervised learning, etc.)
week 2: pp. 33-36 (bias-variance) and 127-142 (logistic regression and LDA)
week 3: pp. 39-42 (k-NN), 149-154 (QDA; discussed last week), and 303-316 (decision trees)
week 4: pp. 82-92 (categorical features, feature transforms) and 337-364 (SVM)
week 5: Chapter 10: Unsupervised learning (clustering and dimension reduction)
week 6: pp. 316-321 (bootstrap, bagging, and random forests)




Teemu Roos

Published 22.12.2017 at 10:53

Hi all,

A *very* important request to all students: Please remember to give course feedback through the anonymous course feedback system.

Your feedback really matters. We read all feedback, including the non-numerical feedback. Any comments, suggestions, and ideas are more than welcome.

You'll find the feedback form on the CS department homepage:

(Choose Introduction to Machine Learning under Advanced Studies.)

Teemu, Ville & Janne

PS. Happy holidays!


Teemu Roos

Published 20.11.2017 at 15:59

[You are receiving this message because you are registered to the Intro to Machine Learning course.]

Hi all,

There will be *no lecture* on Thursday, November 23rd. Friday's lecture will be held as usual.

No changes in the exercise sessions.

best wishes,
Teemu & TAs


We will use a dedicated Piazza forum to support interaction. Please feel free to ask questions and to answer and discuss others' questions! You will receive a link to join if you are registered for the course.


Here is the course’s teaching schedule. See the course description for any other schedules.

Thu 2.11.2017, 14:15-16:00
Fri 3.11.2017, 10:15-12:00
Thu 9.11.2017, 14:15-16:00
Fri 10.11.2017, 10:15-12:00
Thu 16.11.2017, 14:15-16:00
Fri 17.11.2017, 10:15-12:00
Thu 23.11.2017, 14:15-16:00
Fri 24.11.2017, 10:15-12:00
Thu 30.11.2017, 14:15-16:00
Fri 1.12.2017, 10:15-12:00
Thu 7.12.2017, 14:15-16:00
Fri 8.12.2017, 10:15-12:00
Thu 14.12.2017, 14:15-16:00
Fri 15.12.2017, 10:15-12:00

Other teaching

02.11.2017 Thu 12.15-14.00 and 09.11.-14.12.2017 Thu 12.15-14.00: Ville Hyvönen (teaching language: English)
10.11.-15.12.2017 Fri 12.15-14.00: Janne Leppä-Aho (teaching language: English)
09.11.-14.12.2017 Thu 16.15-18.00: Janne Leppä-Aho (teaching language: English)
01.11.2017 Wed 16.15-18.00, 08.11.-29.11.2017 Wed 16.15-18.00, and 13.12.2017 Wed 16.15-18.00: Ville Hyvönen (teaching language: English)
09.11.-14.12.2017 Thu 12.15-14.00: Teemu Roos (teaching language: English)


The course is based on the textbook Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani: An Introduction to Statistical Learning with Applications in R, Springer, 2013 (see link below).

You may also find last year's course materials useful (see link below).


Example solutions

Project for separate exam takers

If you wish to take a separate exam without having taken part in the exercise sessions, you must complete a project. The deadline for returning the project is 10 days before the exam. For the dates of the separate exams, please see the department homepage (under 'Studies'). More instructions are given in the project description below.


Data Science Master's programme

Data Science Methods

The course is available to students from other degree programmes.

Basics of probability theory and statistics (e.g. the course Introduction to Statistics, including multivariate probability, Bayes' formula, and maximum likelihood estimators) and intermediate-level linear algebra (e.g. some elements of Linear Algebra and Matrices I-II, including multivariate calculus)

Good programming skills in some language and the ability to quickly acquire the basics of a new environment (R or python/numpy/scipy)

Introduction to Data Science and Introduction to Artificial Intelligence are recommended but not required.

Courses in the Machine Learning module

  • Defines and is able to explain basic concepts in machine learning (e.g. training data, feature, model selection, loss function, training error, test error, overfitting)
  • Recognises various machine learning problems and methods suitable for them: supervised vs unsupervised learning, discriminative vs generative learning paradigm, symbolic vs numeric data
  • Knows the basics of a programming environment (such as R or python/numpy/scipy) suitable for machine learning applications
  • Is able to implement at least one distance-based, one linear, and one generative classification method, and apply these to solving simple classification problems
  • Is able to implement and apply linear regression to solve simple regression problems
  • Explains the assumptions behind the machine learning methods presented in the course
  • Implements testing and cross-validation methods, and is able to apply them to evaluate the performance of machine learning methods and to perform model selection
  • Comprehends the most important clustering formalisms (distance measures, k-means clustering, hierarchical clustering)
  • Explains the idea of the k-means clustering algorithm and is able to implement it
  • Is able to implement a method for hierarchical clustering and can interpret its results
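To give a concrete idea of the implementation skills listed above, here is a minimal k-means sketch in Python/NumPy (one of the environments mentioned in the prerequisites). This is an illustrative example, not part of the course materials; for simplicity the centroids are initialised with k points evenly spaced through the data, whereas real implementations typically use random restarts or k-means++ initialisation.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal k-means clustering: X is an (n, d) data matrix, k the number of clusters."""
    X = np.asarray(X, dtype=float)
    # Simple deterministic initialisation: k points evenly spaced through the data.
    centroids = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave a centroid in place if its cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels
```

The two alternating steps (assign points to the nearest centroid, then recompute each centroid as the mean of its points) are exactly the k-means algorithm covered in the clustering part of the course.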

First semester (Autumn)

Typically 2nd period

    • statistical learning, models and data, evaluating performance, overfitting, bias-variance tradeoff
    • linear regression
    • classification: logistic regression, linear and quadratic discriminant analysis, naive Bayes, nearest neighbour classifier, decision trees, support vector machine
    • clustering (flat and hierarchical); k-means, agglomerative clustering
    • resampling methods (cross-validation, bootstrap), ensemble methods (bagging, random forests)
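As a sketch of the resampling methods listed above, the following illustrates k-fold cross-validation in Python/NumPy, here wrapped around a 1-nearest-neighbour classifier. All function and variable names are my own for illustration; this is not code from the course materials.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """1-nearest-neighbour prediction: copy the label of the closest training point."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[dists.argmin(axis=1)]

def cv_error(X, y, n_folds=5, seed=0):
    """Estimate classification error by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle before splitting into folds
    folds = np.array_split(idx, n_folds)
    errors = []
    for i in range(n_folds):
        # Hold out fold i for testing; train on all remaining folds.
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        preds = one_nn_predict(X[train_idx], y[train_idx], X[test_idx])
        errors.append(np.mean(preds != y[test_idx]))
    # Average the held-out error rates over all folds.
    return float(np.mean(errors))
```

Because every point is used for testing exactly once, the averaged error is an estimate of test error rather than training error, which is the key point of cross-validation for model selection.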

    Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani: An Introduction to Statistical Learning with Applications in R, Springer, 2013.

    Parts of the textbook that are required are specified on the course web page.

    The course will involve weekly exercises that include both programming and other kinds of problems ("pen and paper").

    Assessment and grading are based on completed exercises and a course exam. Possible other criteria will be specified on the course web page.

    • Contact teaching
    • Possible attendance requirements are specified each year on the course web page
    • Completion is based on exercises and one or more exams. Possible other methods of completion will be announced on the course web page.