Students will acquire familiarity with the basic concepts of data science. They will be able to distinguish between different kinds of data (e.g., statistical, structured, unstructured), and identify challenges related to big data (e.g., volume, velocity, veracity) and data governance (privacy, ethics, legal issues). The various goals of statistical modelling and machine learning, from exploration and data mining to validation and decision-making, and approaches suitable for them will be introduced. The students will be able to choose an appropriate machine learning setting (unsupervised, semi-supervised, supervised) and to evaluate the pros and cons of different approaches (for example, linear regression, deep learning, or decision trees). The students will be able to present the results of a data science project by means of reports and visualisations so that they can be used as a basis of operationalisation.

The course will include lectures, exercises, and a miniproject done in groups. The exercises are returned at the exercise sessions. Please make sure you have registered for an exercise group. Attending a group other than the one for which you have registered is allowed only with special permission (this also applies to group 99).

INFO FOR STUDENTS TAKING A SEPARATE EXAM (WITHOUT HAVING ATTENDED THE COURSE): In order to take an exam, you must complete a project. The project submission deadline is 10 days before the exam (the same as the registration deadline for separate exams). The project work follows the same guidelines as the course final project (will be available soon), except that no coaching or pitching is involved.

Messages

Teemu Roos

Published, 31.8.2018 at 13:57

Dear students in Introduction to Data Science,

Due to the festivities related to the opening of the academic year, the lecture on Monday September 3rd (10am) is CANCELLED. Apologies for the late notice.

The first lecture will be held on Tuesday September 4th at 4pm.

Looking forward to seeing you on Tuesday!

best wishes,
Teemu & Intro-to-DS team 2018

Interaction

We will use Piazza as the primary communication channel, where we will make announcements about the practicalities (exercises, etc.) and where you can ask questions concerning the course.

If you registered for the course by September 4th, you should have received an email with instructions for joining the group. If not, please contact the course staff.

Timetable

Here is the course’s teaching schedule. Check the description for possible other schedules.

Date            Time            Location
Tue 4.9.2018    16:15 - 18:00
Mon 10.9.2018   10:15 - 12:00
Tue 11.9.2018   16:15 - 18:00
Mon 17.9.2018   10:15 - 12:00
Tue 18.9.2018   16:15 - 18:00
Mon 24.9.2018   10:15 - 12:00
Tue 25.9.2018   16:15 - 18:00
Mon 1.10.2018   10:15 - 12:00
Tue 2.10.2018   16:15 - 18:00
Mon 8.10.2018   10:15 - 12:00
Tue 9.10.2018   16:15 - 18:00
Mon 15.10.2018  10:15 - 12:00
Tue 16.10.2018  16:15 - 18:00

Other teaching

10.09. - 15.10.2018 Mon 16.15-18.00, Ioanna Bouri, Teaching language: English
11.09. - 16.10.2018 Tue 12.15-14.00, Johannes Verwijnen, Teaching language: English
12.09. - 17.10.2018 Wed 12.15-14.00, Johannes Verwijnen, Teaching language: English
12.09. - 17.10.2018 Wed 16.15-18.00, Ioanna Bouri, Teaching language: English

Material

Tasks

Exercises

The exercises and other instructions are available on GitHub.

Description

The Master's Programme in Data Science is responsible for the course.

The course belongs to the Data Science methods / Basic Studies in Data Science module.

The course is available to students from other degree programmes.

Prerequisites in terms of knowledge

Programming skills in one or more languages (Python recommended; R will probably be helpful) and basic familiarity with command-line interfaces (such as the Linux/Unix shell).

Prerequisites for students in the Data Science programme, in terms of courses

None

Prerequisites for other students in terms of courses

None

Recommended preceding courses

None

The students will acquire familiarity with the basic concepts of data science. They will be able to describe the importance of data science in science, in society, and in business. Students will also be able to characterise the different roles of a data scientist and understand the skills those roles require.

The students will be able to distinguish between different kinds of data (e.g., statistical, structured, unstructured), and identify challenges related to big data (e.g., volume, velocity, veracity) and data governance (privacy, ethics, legal issues).

The students will learn to identify the problems and tasks involved in the life-cycle of a Data Science project, including data collection, data preprocessing, data management, data analysis, presentation, and operationalisation (end-user point of view). They will be able to store and access different kinds of data using suitable database and data management tools, implement conversions between different data formats, and control the accessibility of privacy-sensitive data.

The various goals of statistical modelling and machine learning, from exploration and data mining to validation and decision-making, and approaches suitable for them will be introduced. The students will be able to choose an appropriate machine learning setting (unsupervised, semi-supervised, supervised) and to evaluate the pros and cons of different approaches (for example, linear regression, deep learning, or decision trees).

The students will be able to present the results of a data science project by means of reports and visualisations so that they can be used as a basis of operationalisation.

Recommended time/stage of studies for completion: 1st semester

Term/teaching period when the course will be offered: may be offered in the autumn term, the spring term, or both; typically in the autumn, in period I.

Basic concepts of Data Science, including:

  • data science in science, society, business
  • different kinds of data (statistical, structured, unstructured, big data, ...)
  • jobs of a data scientist

The life-cycle of a Data Science project:

  • data collection
  • data preprocessing, "wrangling" (data formats, XML, statistical data, ...)
  • data management (databases, accessibility, sharing, governance, ethics, privacy)
  • exploratory data analysis: summary statistics
  • presentation, visualisation
  • operationalisation (end-user point of view)

The literature and other materials (both required and recommended) will be specified each year.

Students are required to complete exercises and projects.

The assessment and grading of the course is based on completed exercises, one or more exams, and/or projects.

The course requires some group work. The groups (3 students) will be formed during the first few weeks. If you already know that you'd like to team up with someone, it is convenient to sign up for the same exercise group.

  • The course will be offered in the form of contact teaching
  • Possible attendance requirements will be decided each year
  • The completion of the course is based on exercises, one or more exams, and/or projects