Basic information

The main goals of the course are to 1) prepare you for further studies in machine learning, and 2) introduce you to methods and tools that are commonly used when solving machine learning problems in practice. You will gain theoretical knowledge and understanding of machine learning as well as practical skills.

LECTURES: Attending lectures (Thursdays and Fridays) is NOT compulsory, but everyone is of course encouraged to be present. We will discuss the course topics, and I will do my best to explain difficult details in a clear and understandable manner. To get the most out of the lectures, I suggest that you take a look at the course textbook in advance. (See the Timetable section below for a list of topics.) If you have some particularly difficult questions, feel free to send me an email before the lecture so that I have time to think of an intelligent answer.

PROBLEM SESSIONS: Taking part in one of the problem session groups each week is a (pretty much non-negotiable) requirement. You will get points for solving exercises, and these will contribute to your final grade for the course. The problem sessions take place on: Wed 16-18, Thu 12-14 (two groups in this slot!), Thu 16-18, and Fri 12-14. You can choose freely which one you attend, but we reserve the right to suggest changes to your selection if it looks like the initial assignment is too imbalanced.

The problem sessions work as follows: You prepare your answers in advance and, at the beginning of the session, mark which problems you are ready to present to the class. Then you discuss your solutions in small groups and together try to converge on a reasonable answer. Finally, each group presents its answer to the others. The problem sessions are hosted by Janne Leppä-Aho, Saska Dönges, and yours truly.

Further details about practicalities are discussed in the first lecture!

GRADING: There are 100 points in total for the course, 40 from exercises and 60 from the exam. You need at least 50% from both to pass. To get full exercise points, you must have completed at least 80% of the problems. The final grade is determined by the following intervals:
50-59: grade 1
60-69: grade 2
70-79: grade 3
80-89: grade 4
90-100: grade 5
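
The grading rule above can be sketched in a few lines of Python. This is only an illustration, not an official calculator. The function names are hypothetical, the linear scaling of exercise points below the 80% threshold is an assumption (the rule only states that 80% of the problems gives full points), and so is rounding fractional totals down to the lower grade interval.

```python
def exercise_points(solved_fraction, max_points=40.0, full_credit_at=0.8):
    """Map the fraction of solved problems (0..1) to exercise points.

    ASSUMPTION: points grow linearly until the 80% threshold is reached;
    the course page only says that 80% of the problems gives full points.
    """
    if not 0.0 <= solved_fraction <= 1.0:
        raise ValueError("solved_fraction must be between 0 and 1")
    return max_points * min(solved_fraction / full_credit_at, 1.0)


def final_grade(exercise_pts, exam_pts):
    """Return the grade 1-5, or 0 as a (hypothetical) marker for a fail."""
    # At least 50% is needed from both parts: 20 of 40 and 30 of 60 points.
    if exercise_pts < 20 or exam_pts < 30:
        return 0
    total = exercise_pts + exam_pts
    # 50-59 -> 1, 60-69 -> 2, 70-79 -> 3, 80-89 -> 4, 90-100 -> 5.
    # ASSUMPTION: a fractional total falls into the lower interval.
    return min(int(total // 10) - 4, 5)
```

For example, under these assumptions a student who solved 80% of the problems and scored 45 exam points would have 40 + 45 = 85 points, which falls in the interval for grade 4.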




Antti Ukkonen

Published 29.11.2018 at 18:09

Dear ML Students,

Next week, on December 6th, Finland celebrates its 101st year of independence. Hence, *no exercise sessions on Thursday*.

Please try to attend one of the sessions on Wednesday or Friday.

If you really cannot make it to either of these, please send your exercises to Janne Leppä-Aho by email. Kindly indicate in the body of the email *clearly* which exercises you have solved.



The tentative lecture schedule is as follows:
Lecture 1: Introduction to the course, practicalities / logistics, simple examples
Lecture 2: “Ingredients of Machine learning”, the idea of generalisation error
Lecture 3: Linear models + Evaluation
Lecture 4: Linear models + Evaluation (cont.)
Lecture 5: Classification, probabilistic methods in general
Lecture 6: Classification, Gaussian classifiers and Naive Bayes
Lecture 7: Classification, k-NN and decision trees
Lecture 8: Support Vector Machines
Lecture 9: Clustering
Lecture 10: Principal component analysis / dimensionality reduction
Lecture 11: Resampling & ensemble methods
Lecture 12: Special guest stars present practical applications of machine learning

Thu 1.11.2018, 14:15 - 16:00
Fri 2.11.2018, 10:15 - 12:00
Thu 8.11.2018, 14:15 - 16:00
Fri 9.11.2018, 10:15 - 12:00
Thu 15.11.2018, 14:15 - 16:00
Fri 16.11.2018, 10:15 - 12:00
Thu 22.11.2018, 14:15 - 16:00
Fri 23.11.2018, 10:15 - 12:00
Thu 29.11.2018, 14:15 - 16:00
Fri 30.11.2018, 10:15 - 12:00
Fri 7.12.2018, 10:15 - 12:00
Thu 13.12.2018, 14:15 - 16:00
Fri 14.12.2018, 10:15 - 12:00

Other teaching

08.11. - 29.11.2018 Thu 12.15-14.00
13.12.2018 Thu 12.15-14.00
Janne Leppä-Aho
Language of instruction: English
09.11. - 14.12.2018 Fri 12.15-14.00
Antti Ukkonen
Language of instruction: English
08.11. - 29.11.2018 Thu 16.15-18.00
13.12.2018 Thu 16.15-18.00
Saska Dönges
Language of instruction: English
07.11. - 12.12.2018 Wed 16.15-18.00
Janne Leppä-Aho
Language of instruction: English
08.11. - 29.11.2018 Thu 12.15-14.00
13.12.2018 Thu 12.15-14.00
Saska Dönges
Language of instruction: English



Project for separate exam takers

If you wish to take a separate exam without having taken part in the exercise sessions, you must complete a programming project. (Exercise points are still valid in the exam on Jan 31, however.) Please contact the lecturer before starting to work on the project, unless you have already received an email from Antti about this.

More instructions are given in the project description below.


Data Science Master's programme

Data Science Methods module

The course is available to students from other degree programmes

Prerequisites in terms of knowledge

Basics of probability calculus and statistics (including multivariate probability, Bayes' formula, and maximum likelihood estimators) and intermediate-level linear algebra (including multivariate calculus). Good programming skills in some language and the ability to quickly acquire the basics of a new environment (R or python/numpy/scipy). Some knowledge of data science and artificial intelligence is useful but not required.

Prerequisites for students in the Data Science programme, in terms of courses


Prerequisites for other students in terms of courses

Introduction to statistics (including multivariate probability, Bayes formula, and maximum likelihood estimators). Linear algebra and matrices I-II (including multivariate calculus). TKT10002 Introduction to Programming and TKT10003 Advanced Course in Programming (i.e., good programming skills in some language and the ability to quickly acquire the basics of a new environment (R or python/numpy/scipy)).

Recommended preceding courses

DATA11001 Introduction to Data Science and DATA15001 Introduction to Artificial Intelligence

Courses in the Machine Learning module

  • Defines and is able to explain basic concepts in machine learning (e.g. training data, feature, model selection, loss function, training error, test error, overfitting)
  • Recognises various machine learning problems and methods suitable for them: supervised vs unsupervised learning, discriminative vs generative learning paradigm, symbolic vs numeric data
  • Knows the basics of a programming environment (such as R or python/numpy/scipy) suitable for machine learning applications
  • Is able to implement at least one distance-based, one linear, and one generative classification method, and apply these to solving simple classification problems
  • Is able to implement and apply linear regression to solve simple regression problems
  • Explains the assumptions behind the machine learning methods presented in the course
  • Implements testing and cross-validation methods, and is able to apply them to evaluate the performance of machine learning methods and to perform model selection
  • Comprehends the most important clustering formalisms (distance measures, k-means clustering, hierarchical clustering)
  • Explains the idea of the k-means clustering algorithm and is able to implement it
  • Is able to implement a method for hierarchical clustering and can interpret its results

First semester (Autumn)

Typically 2nd period

  • statistical learning, models and data, evaluating performance, overfitting, bias-variance tradeoff
  • linear regression
  • classification: logistic regression, linear and quadratic discriminant analysis, naive Bayes, nearest neighbour classifier, decision trees, support vector machine
  • clustering (flat and hierarchical); k-means, agglomerative clustering
  • resampling methods (cross-validation, bootstrap), ensemble methods (bagging, random forests)

Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani: An Introduction to Statistical Learning with Applications in R, Springer, 2013.

Parts of the textbook that are required are specified on the course web page.

The course will involve weekly exercises that include both programming and other kinds of problems ("pen and paper").

Assessment and grading are based on completed exercises and a course exam. Possible other criteria will be specified on the course web page.

R tutorials on Thu 1.11. at 16-18 in D123 and on Fri 2.11. at 12-14 in D123.

  • Contact teaching
  • Possible attendance requirements are specified each year on the course web page
  • Completion is based on exercises and one or more exams. Possible other methods of completion will be announced on the course web page.