
Registration
10.12.2018 at 09:00 to 13.1.2019 at 23:59


The Master's Programme in Data Science is responsible for the course.

The course belongs to Core Courses, Data Science Seminar.

The seminar is available to students from other degree programmes. The number of participants is limited. When registering, students are asked to describe briefly (in 1-2 sentences) why they would like to take the course. Registration closes on 13 January 2019. If there are more than 12 registrations, participants will be selected based on their reason for taking the course, their study background, and the order of registration.

Prerequisites in terms of knowledge

Students are assumed to know the basic principles of machine learning, especially supervised learning methods. They should be able to understand scientific texts in this field and to write scientific reports.

Prerequisites for students in the Data Science programme, in terms of courses

DATA11002 Introduction to Machine Learning.

Prerequisites for other students in terms of courses

The course Introduction to Machine Learning, DATA11002, or equivalent knowledge, and a Bachelor's degree.

Recommended preceding courses


Specialization studies in Data Science, especially in Machine Learning and Statistical Data Science. Studies in natural sciences, especially in computational methods.

Students will obtain an overview of the data-rich natural sciences: how their datasets are analysed and how machine learning methods can be integrated into this analysis, covering both the current state of the art and open problems.

Students will learn to find and analyse existing scientific literature on a specific topic, write a scientific report and present the results orally. They will also learn to act as peer reviewers for the work of others.

The student should have completed an applicable Bachelor's degree before attending this seminar. The course Introduction to Machine Learning, DATA11002, should be completed before participating in the seminar, or the student should have equivalent knowledge from other sources.

The seminar is offered in the Spring term 2019, periods III-IV.

Many of the natural sciences are data rich. The data can come either from measurements, such as large-scale atmospheric measurements or physics experiments, or from computational simulations. Supervised machine learning methods are often used to process the data. For example, regression functions, including those based on deep learning, are used to calibrate measurements, and classifiers are used to classify physics events.

Several useful toolboxes and methods exist for processing data and implementing advanced supervised learning schemes. In practice, however, the situation is often far from satisfactory. Interesting questions include:

  • The data often contains artefacts which may affect the analysis and which should therefore be taken into account already when preprocessing the data. The expert user’s (here, the scientist’s) knowledge is crucial. Which methods can be used for such preprocessing?
  • How can the expert user find relevant and surprising features of the data, and compare datasets?
  • What are good principles and practices for implementing supervised learning methods when analysing real datasets? How should the expert’s knowledge and the results of learning theory be taken into account?
  • Powerful supervised learning methods are often essentially “black boxes”, i.e., it is difficult to understand their logic. Consequently, a supervised learning method may perform extremely well in the training phase while its behaviour in real applications is unpredictable. One possible reason is that the method has learned to exploit artefacts in the training data. How can the expert user understand the principles by which supervised learning methods work? How can we control the statistical errors and concept drift of supervised learning methods; in other words, can we trust our predictions?
  • Which tools are best suited for a particular purpose?

The course includes an introduction to the topic and covers themes such as those described above. It will cover topics related to the atmospheric sciences but will not be limited to them; the methods and techniques discussed are general and applicable across multiple domains. The final content will be decided together with the students.

There will be one or more introductory lectures. Students are asked to write brief review articles on assigned topics, each covering a couple of research articles. The reviews are then peer-reviewed by other students and revised based on the comments, after which each student gives a presentation. In addition to reviewing the state of the art, one purpose of the course is to identify unsolved problems that could be addressed in later research.

The reading list will be announced in the first lecture. It will depend on the exact topics covered in the course, which in turn depend partly on the students' interests.

Students will agree on a specific topic with the instructor, write a scientific report, deliver a scientific oral presentation on the topic and act as peer reviewers. Students are expected to participate actively in the seminar sessions.

Grading scale 1-5. The grade combines evaluations of the written report, the oral presentation, and other course participation.

Contact teaching.

Attendance at the seminar sessions is obligatory. Absence from at most two meetings is accepted but will affect grading. However, absence from the first meeting may result in dismissal from the course; if you have a good reason for not attending the first meeting, please contact the lecturer in advance.

Written report, oral presentation(s), acting as a peer for other students, and active attendance at the sessions.

Assoc. Prof. Kai Puolamäki