
23.12.2019 at 09:00 - 12.1.2020 at 23:59



Master's Programme in Data Science is responsible for the course.

The course belongs to the CSM14000 - Software Systems study track module.

The course is available to students from other degree programmes.

Prerequisites in terms of knowledge

Good programming skills. Basic data models such as the relational data model and semi-structured data models (e.g., JSON, XML).

Prerequisites for students in the Data Science programme, in terms of courses


Prerequisites for other students in terms of courses

TKT10002 Introduction to Programming

Recommended preceding courses


  • Transaction management and query optimisation
  • Big data framework
  • Distributed data framework
Here’s an overview of our goals for you in the course. After completing this course you should be able to:
* Describe the Big Data landscape including examples of real-world big data problems and the three key sources of Big Data: people, organizations, and sensors.
* Explain the V’s of Big Data and why each impacts the collection, monitoring, storage, analysis, and reporting.
* Summarize the features and value of core Hadoop stack components including the YARN resource and job management system, the HDFS file system and the MapReduce programming model.
* Understand different kinds of data models, including relational, semi-structured, graph, vector space and others.
* Describe streaming data and the challenges it presents.
* Understand the differences among a data lake, a data mart, and a data warehouse.
* Understand the differences between a DBMS and a BDMS.
* Describe the difference between ACID and BASE properties.
* Use SQL to formulate queries and views for relational databases.
* Write queries and create indexes in MongoDB.
* Explain the data integration problem and define integrated views and schema mappings.
* Understand Global-As-View and Local-As-View and the associated SQL formulations.
* Describe record linking, data exchange, and data fusion tasks.
* Gain hands-on experience with various systems, including Splunk, Hadoop, Gephi, PostgreSQL and MongoDB.

Recommended time/stage of studies for completion: autumn of the first or second year of Master's studies.

Term/teaching period when the course will be offered: autumn term, second period. The course will be offered every year.

  • Hadoop and MapReduce, HDFS
  • data models, relational databases and SQL
  • semi-structured data and JSON query with MongoDB
  • data streaming and data lake
  • data integration
  • hands-on experience with different systems, including Splunk, Hadoop, Gephi, PostgreSQL and MongoDB
Fundamentals of Database Systems (Chapters 5, 6, 7, 23, 24, 25)
Elmasri Ramez, Navathe Shamkant B.
2017, Seventh edition, Global edition.
The electronic version of this book is available in the university library.
The course consists of seven lectures, six exercises, seven hands-on sessions, and an exam.
Lectures: Attending lectures is not obligatory, but it is useful. The lecture slides, which cover the key facts, will be posted on the course's Moodle page.
Exercises: Students are required to solve ALL problems.
Hands-on sessions: Students are expected to complete the hands-on exercises before attending the session. The purpose of the session is to give students an opportunity to ask questions.

Grading is based on the sum of the marks from the exercises (max. 50 marks) and the exam (max. 50 marks). At least 51 marks are required to pass, which gives the lowest grade, 1; 91 marks or more gives the highest grade, 5.
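The grading rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `course_result` and the `(passed, grade)` return shape are hypothetical, and because the description does not give the boundaries for grades 2 to 4 (or how far above 51 marks grade 1 extends), the sketch returns `None` as the grade for any total between 52 and 90.

```python
from typing import Optional, Tuple

MAX_EXERCISE_MARKS = 50  # stated in the course description
MAX_EXAM_MARKS = 50      # stated in the course description
PASS_MARK = 51           # stated: required to pass, gives grade 1
TOP_MARK = 91            # stated: this total or more gives grade 5


def course_result(exercise_marks: int, exam_marks: int) -> Tuple[bool, Optional[int]]:
    """Return (passed, grade) for the given marks.

    The grade is None when the student failed, or when the total falls
    in the 52..90 range whose grade boundaries the description omits.
    """
    if not 0 <= exercise_marks <= MAX_EXERCISE_MARKS:
        raise ValueError("exercise marks must be between 0 and 50")
    if not 0 <= exam_marks <= MAX_EXAM_MARKS:
        raise ValueError("exam marks must be between 0 and 50")

    total = exercise_marks + exam_marks
    if total < PASS_MARK:
        return (False, None)   # below the pass mark
    if total >= TOP_MARK:
        return (True, 5)       # highest grade
    if total == PASS_MARK:
        return (True, 1)       # exactly the pass mark: lowest grade
    return (True, None)        # passed; exact grade not specified in the text
```

For example, `course_result(26, 25)` corresponds to a total of 51 marks and a pass with grade 1, while `course_result(25, 25)` is a fail. Note that the exam-eligibility conditions (submitting answers to all exercises) are separate from this arithmetic and are not modelled here.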

Course exam: The exam covers the lectures and all exercises. No notes or other materials are allowed in the exam.
Renewal exam: The renewal exam can be taken only by students who have submitted answers to all exercises.
Separate exams: Students need to submit answers to all six exercises before the separate exam.

General exams last 3 hours and 30 minutes. A renewal exam (marked with "(U)") is the first general exam after the course and also serves as a renewal of the course exam(s). In a renewal exam, the points a student has earned during the course are taken into account. Exams marked with "(HT)" are open only to students who have completed the obligatory projects or other exercises included in those courses. Exams marked with "(HT/U)" are renewal exams for students who have completed the obligatory projects during the course. General exams might cover a different area than the lectured course. Check the course web page and contact the responsible teacher if in doubt.

The course consists of lectures, six exercises, six hands-on sessions, and an exam.
Students need to submit the answers to all six exercises before attending any exam.

Jiaheng Lu