Instruction

Name                              Cr  Method of study      Time
Distributed Data Infrastructures  5   General Examination  7.8.2020
Distributed Data Infrastructures  5   Online Examination   23.4.2020
Distributed Data Infrastructures  5   General Examination  29.1.2020
Distributed Data Infrastructures  5   Lecture Course       29.10.2019 - 12.12.2019
Distributed Data Infrastructures  5   General Examination  5.9.2019
Distributed Data Infrastructures  5   General Examination  9.8.2019
Distributed Data Infrastructures  5   General Examination  29.4.2019
Distributed Data Infrastructures  5   General Examination  30.1.2019
Distributed Data Infrastructures  5   Lecture Course       30.10.2018 - 13.12.2018
Distributed Data Infrastructures  5   General Examination  6.9.2018
Distributed Data Infrastructures  5   General Examination  10.8.2018
Distributed Data Infrastructures  5   General Examination  20.4.2018
Distributed Data Infrastructures  5   General Examination  26.1.2018
Distributed Data Infrastructures  5   Lecture Course       31.10.2017 - 14.12.2017

Target group

The Master's Programme in Data Science is responsible for the course.

The course belongs to the Data Science Methods / Basic Studies in Data Science module.

The course is available to students from other degree programmes.

Prerequisites

Prerequisites in terms of knowledge

Good programming skills, preferably in Python

Prerequisites for students in the Data Science programme, in terms of courses

None

Prerequisites for other students in terms of courses

A programming course

Recommended preceding courses

None

Learning outcomes

After the course, the student:

  • Knows different infrastructures and systems for large-scale data science processing
  • Can compare various infrastructures and their suitability for a particular problem
  • Can select the appropriate tools and environments for a particular problem
  • Can justify the system design choices behind existing data science infrastructures
  • Can implement or extend components of data processing infrastructures

Timing

Recommended time/stage of studies for completion: first year of Master's studies in Data Science

Term/teaching period when the course will be offered: autumn term, Period II

Contents

In this course we will study different distributed data processing infrastructures, such as MapReduce, Spark, Petuum, and GraphLab. We will cover their basic design and operation and discuss their differences and suitability for various types of data science problems. Through reading, class discussions, and practical exercises, you will get an overview of the various systems, gain experience in their use, and learn about their designs.

Activities and teaching methods in support of learning

The lectures cover material from research articles. Students are expected to read the assigned articles before each lecture so that they can take part in the class discussions.

The exercises focus mainly on using the various distributed data processing infrastructures in practice and applying them to concrete data science problems. Weekly exercise sessions provide time for discussing the problems and for Q&A.

Study materials

The literature consists of research articles and other online material, which will be provided during the course.

Assessment practices and criteria

Grading scale 0-5

The grade is a combination of the course exam, the mandatory course exercises, and any additional exercises given during the course. Most of the weight comes from the practical and written exercises on the data processing infrastructures covered in the course.

Recommended optional studies

Data Science Project

Completion methods

The course will consist of lectures, written exercises, programming exercises, and possibly other forms of teaching.

Active participation during the course, possibly including mandatory attendance, is required to pass the course.

The course can also be completed by taking a separate exam, based on self-study and possibly additional exercises.