
Update 17 March 2020: The rest of this seminar course will be carried out remotely. Students will prepare video presentations and upload them to Moodle (https://moodle.helsinki.fi/course/view.php?id=37630). The schedule is the one agreed at the beginning of the course: presentation deadlines in weeks 12, 13, 14, 16 and 17; deadline for report reviews 17 April; deadline for final versions of the report 15 May. Please follow Moodle for more information and for instructions on how to prepare a "voice over PDF slides" video presentation using Zoom.


Timetable

Here is the course’s teaching schedule. Check the description for possible other schedules.

Date            Time            Location
Thu 16.1.2020   14:15 - 16:00
Thu 23.1.2020   14:15 - 16:00
Thu 30.1.2020   14:15 - 16:00
Thu 6.2.2020    14:15 - 16:00
Thu 13.2.2020   14:15 - 17:00
Thu 20.2.2020   14:15 - 16:00
Thu 27.2.2020   14:15 - 16:00
Thu 12.3.2020   14:15 - 16:00
Thu 19.3.2020   14:15 - 16:00
Thu 26.3.2020   14:15 - 16:00
Thu 2.4.2020    14:15 - 17:00
Thu 16.4.2020   14:15 - 17:00
Thu 23.4.2020   14:15 - 17:00
Thu 30.4.2020   14:15 - 16:00


Description

Master's Programme in Data Science is responsible for the course.

The course is available to students from other degree programmes based on separate admission. Priority will be given to students of the Master’s Programme in Data Science.

Academic Writing for Students in English-Medium Master's Degree Programmes 2 is to be completed concurrently with the seminar.

Prerequisite: Introduction to Machine Learning.

Distributed Data Infrastructures.

The student can find information on a given topic in scientific literature, write a scientific report, and deliver an oral presentation based on this material.

During or after the first year of MSc studies.

The seminar will cover machine learning methods that can learn from data distributed across a network. Distribution of the data poses several unique challenges, such as memory and communication needs, scalability, and efficiency. The data may be fragmented horizontally (different instances of data at different sites) or vertically (subsets of attributes stored at different sites). Several aspects of the methods will be covered, such as software architectures, optimisation problems, algorithms, and deep learning problems.
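As an informal illustration of the two kinds of fragmentation mentioned above, the following Python sketch splits a small made-up table across sites. The dataset, the attribute names, the function names, and the shared "id" key are assumptions chosen for illustration only; they are not part of the course material.

```python
# Toy dataset (hypothetical): each row is one instance, each column one attribute.
columns = ["id", "age", "income", "label"]
rows = [
    (1, 34, 52000, 0),
    (2, 45, 61000, 1),
    (3, 29, 48000, 0),
    (4, 52, 75000, 1),
]


def split_horizontally(rows, n_sites):
    """Horizontal fragmentation: each site holds different instances,
    but every site sees all attributes. Rows are assigned round-robin."""
    return [rows[i::n_sites] for i in range(n_sites)]


def split_vertically(rows, columns, attribute_groups, key="id"):
    """Vertical fragmentation: each site holds all instances, but only a
    subset of attributes. A shared key column (an assumption of this sketch)
    is kept at every site so that records can be matched across sites."""
    sites = []
    for group in attribute_groups:
        kept = [key] + [c for c in group if c != key]
        idx = [columns.index(c) for c in kept]
        sites.append((kept, [tuple(row[i] for i in idx) for row in rows]))
    return sites


horizontal = split_horizontally(rows, n_sites=2)
vertical = split_vertically(rows, columns, [["age"], ["income", "label"]])
```

In the horizontal case each site could train on its own complete instances, whereas in the vertical case no single site holds all the attributes of any instance, so the sites would need to cooperate even to evaluate a model on one record.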

The students can select their topic from alternatives provided under the following themes:

Software architectures and frameworks for distributed machine learning
Federated learning
Distributed learning with privacy constraints (e.g. cryptographic techniques, differential privacy)
Distributed deep learning (such as DistBelief)
Parallel Bayesian inference (e.g. parallel MCMC methods)
Vertically distributed data

Some topics will be more theoretical and others more computational.

The material consists mostly of relevant topical scientific articles; a list of articles is provided by the instructor.

The students will agree upon a specific topic with the instructor and write a scientific report and deliver a scientific oral presentation on the topic.

The students are expected to participate actively in the seminar sessions. The seminar also includes peer review of other students' reports and presentations.

Grading scale 1-5. The grading will combine evaluation of the written report, the oral presentation and other course participation.

Contact teaching.

Compulsory attendance at seminar sessions.

Written report, oral presentation, peer feedback on the presentations and reports of other students.

Post doc Antti Koskela

Assoc. Prof. Antti Honkela