Big data course learning goals

We are in the era of “big data”. Data sets are growing rapidly because they are increasingly gathered by cheap and numerous information-sensing mobile devices, remote sensing, software logs, cameras, microphones, and wireless sensor networks. Most big data environments go beyond relational databases and traditional data warehouse platforms, and the increasing focus on big data is shaping new algorithms and techniques. This course discusses selected algorithms and systems for big data management, including the Hadoop and MapReduce frameworks, MongoDB databases, and data sketch algorithms.

Please enrol on the Moodle page. Most of the information for this course will be released on Moodle:

https://moodle.helsinki.fi/course/view.php?id=29529

Messages

Jiaheng Lu

Published, 15.8.2018 at 12:32

Hi,

Welcome to the Big Data Management course!

Please register on the Moodle page, where you will find complete information about this course, including slides, exercises, and reading materials.

Moodle page:

https://moodle.helsinki.fi/course/view.php?id=29529

We use the Piazza forum for interaction in this course. Please enrol on Piazza:

https://piazza.com/helsinki.fi/fall2018/data14002

Our first lecture will be held on Tue 4.9.2018, 10:15 - 12:00, in Exactum, D122. Please join us.

Best regards

Jiaheng, on behalf of the teaching team

Interaction

We use the Piazza forum for interaction in this course. Please enrol yourself. Thanks!

Timetable

Here is the course’s teaching schedule. Check the description for possible other schedules.

Date            Time            Location
Tue 4.9.2018    10:15 - 12:00
Mon 10.9.2018   14:15 - 16:00
Tue 11.9.2018   10:15 - 12:00
Mon 17.9.2018   14:15 - 16:00
Tue 18.9.2018   10:15 - 12:00
Mon 24.9.2018   14:15 - 16:00
Tue 25.9.2018   10:15 - 12:00
Mon 1.10.2018   14:15 - 16:00
Tue 2.10.2018   10:15 - 12:00
Mon 8.10.2018   14:15 - 16:00
Tue 9.10.2018   10:15 - 12:00
Mon 15.10.2018  14:15 - 16:00
Tue 16.10.2018  10:15 - 12:00

Description

The Master's Programme in Data Science is responsible for the course.

The course belongs to the CSM14000 - Software Systems study track module.

The course is available to students from other degree programmes.

Prerequisites in terms of knowledge

Good programming skills. Basic data models such as the relational data model and semi-structured data models (e.g., JSON, XML).

Prerequisites for students in the Data Science programme, in terms of courses

None

Prerequisites for other students in terms of courses

TKT10002 Introduction to Programming

Recommended preceding courses

None

  • Transaction management and query optimisation
  • Big data framework
  • Distributed data framework

Here’s an overview of our goals for you in the course. After completing this course you should be able to:
* Describe the Big Data landscape, including examples of real-world big data problems and the three key sources of Big Data: people, organizations, and sensors.
* Explain the V’s of Big Data and why each impacts data collection, monitoring, storage, analysis, and reporting.
* Summarize the features and value of core Hadoop stack components, including the YARN resource and job management system, the HDFS file system, and the MapReduce programming model.
* Understand different kinds of data models, including relational, semi-structured, graph, vector space, and others.
* Describe streaming data and the different challenges it presents.
* Understand the differences among a data lake, a data mart, and a data warehouse.
* Understand the differences between a DBMS and a BDMS.
* Describe the difference between ACID and BASE properties.
* Use SQL to formulate queries and views for relational databases.
* Write queries and create indexes in MongoDB.
* Explain the data integration problem and define integrated views and schema mappings.
* Understand Global-As-View and Local-As-View and the associated SQL formulation.
* Describe record linking, data exchange, and data fusion tasks.
* Gain hands-on experience with various systems, including Splunk, Hadoop, Gephi, PostgreSQL, and MongoDB.
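
As a rough illustration of the MapReduce programming model named in the goals above, the following is a minimal single-machine sketch in plain Python. It is not Hadoop code and not taken from the course materials; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample sentences are made up for this example. It only shows the three conceptual steps (map, group by key, reduce) on a word count.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (a real framework does this step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on an in-memory list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    docs = ["big data is big", "data lakes store raw data"]
    print(word_count(docs))
```

In a real Hadoop job the map and reduce functions would run in parallel on different nodes over data stored in HDFS, and the grouping step would be carried out by the framework rather than by user code.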

Recommended time/stage of studies for completion: autumn of the first or second year of Master's studies.

Term/teaching period when the course will be offered: autumn term, second period. The course is offered every year.

  • Hadoop and MapReduce, HDFS
  • data models, relational databases and SQL
  • semi-structured data and JSON query with MongoDB
  • data streaming and data lake
  • data integration
  • Hands-on experience with different systems, including Splunk, Hadoop, Gephi, PostgreSQL and MongoDB.

Fundamentals of Database Systems (Chapters 5, 6, 7, 23, 24, 25)
Elmasri Ramez, Navathe Shamkant B.
2017, Seventh edition, Global edition.
The electronic version of this book is available in the university library.

The course consists of seven lectures, six exercises, seven hands-on sessions, and an exam.
Lectures: Attending lectures is not obligatory, but it is useful. The lecture slides cover the key facts and will be posted on the Moodle page of the course.
Exercises: Students are required to solve ALL problems.
Hands-on sessions: Students are expected to do the hands-on exercises before attending the session. The purpose of the session is to give students the opportunity to ask questions.

The grading is based on the sum of the points from the exercises (max. 50 marks) and the exam (max. 50 marks). At least 51 marks are required to pass, which gives the lowest grade, 1; 91 marks or more give the highest grade, 5.

Course exam: The exam covers the lectures and all exercises. No notes or other materials are allowed in the exam.
Renewal exam: The renewal exam can be taken only by students who have submitted answers to all exercises.
Separate exams: Students need to submit answers to all six exercises before the separate exam.
The course consists of lectures, six exercises, six hands-on sessions, and an exam.
Students need to submit the answers to all six exercises before attending any exam.

Jiaheng Lu