
30.3.2020 at 09:00 - 19.4.2020 at 23:59



Master's Programme in Data Science is responsible for the course.

The course belongs to the Data Science Methods / Basic Studies in Data Science module.

The course is available to students from other degree programmes.

Prerequisites in terms of knowledge

Basics of data communications and distributed systems (TKT20004 Introduction to Data Communications, CSM13001 Distributed Systems). Big-O notation and basics of algorithmic complexity. Basics of reliability in distributed systems. Basics of algorithm design and machine learning. Basics of linear algebra. Skills in programming languages such as Scala and Python.

Prerequisites for students in the Data Science programme, in terms of courses

CSM13001 Distributed Systems and DATA11002 Introduction to Machine Learning

Prerequisites for other students in terms of courses

CSM13001 Distributed Systems and DATA11002 Introduction to Machine Learning

Recommended preceding courses


Check prerequisites and other courses in the Data Science programme.

  • Ability to compare different Big Data frameworks in a qualitative manner
  • Ability to assess the suitability of different systems to different use cases
  • Ability to compare different Big Data frameworks based on their design and implementation
  • Knowledge of key performance issues and the ability to analyze these systems
  • Ability to create both batch and streaming solutions
  • Ability to design and implement a solution that uses distributed algorithms for a large dataset
  • Ability to design distributed Big Data systems that build on existing frameworks for batch and streaming processing

Recommended time/stage of studies for completion: spring of the first (or second) year of Master's studies

Term/teaching period when the course will be offered: spring, typically 2nd period

  • Big Data Frameworks: definitions and systems
  • Internal operation and implementation of a Big Data framework (Spark)

  • Distributed algorithms for Big Data frameworks

  • Data Science applications

  • Course Lecture Materials
  • Scientific Articles

  • Interactive lectures
  • Hands-on exercise sessions

  • Assignments
  • Grading scale: 1-5
  • 5 exercise sets: 100 points
  • Exam: 30 points

Due to the current COVID-19 situation, general examinations in lecture halls are cancelled. Check the completion method on the course page or contact the teacher to ask about alternative completion methods.

General exams last 3 hours and 30 minutes. A renewal exam (marked with "(U)") is the first general exam after the course and also serves as a renewal of the course exam(s). In a renewal exam, the points a student has earned during the course are taken into account. Exams marked with "(HT)" are open only to students who have completed the obligatory projects or other exercises included in those courses. Exams marked with "(HT/U)" are renewal exams for students who have completed the obligatory projects during the course. General exams might cover a different area than the lectured course. Check the course web page and contact the responsible teacher if in doubt.

  • The course is offered in the form of contact teaching, so it is important to attend the lectures.

  • Final exam (60%) and course exercises (40%).