
Data science for real

Work together as a group to address real data science problems offered by research organizations and companies.

The project course focuses on solving a practical data science problem in groups of 3-4 people. You choose a topic offered by an external client (either a research group in another field or a company) that provides interesting real-world data and specifies a general goal for the project. You then work together to develop a prototype solution, covering all aspects of a typical data science project, from planning to implementation and dissemination. The course combines hands-on model development (algorithms, machine learning, etc.) with software implementation to produce a practical tool that works seamlessly with the data.


Timetable

Here is the course’s teaching schedule. Check the description for possible other schedules.

Date            Time            Location
Tue 14.1.2020   10:15 - 12:00
Tue 21.1.2020   10:15 - 12:00
Tue 28.1.2020   10:15 - 12:00
Tue 4.2.2020    10:15 - 12:00
Tue 11.2.2020   10:15 - 12:00
Tue 18.2.2020   10:15 - 12:00
Tue 25.2.2020   10:15 - 12:00
Tue 10.3.2020   10:15 - 12:00
Tue 17.3.2020   10:15 - 12:00
Tue 24.3.2020   10:15 - 12:00
Tue 31.3.2020   10:15 - 12:00
Tue 7.4.2020    10:15 - 12:00
Tue 21.4.2020   10:15 - 12:00
Tue 28.4.2020   10:15 - 12:00

Conduct of the course

The course will not have regular meetings. Instead, we will meet roughly six times during the spring, according to the following tentative timeline:
- January 21: Practicalities and selection of topics
- January 28: Best practices for data science product development
- February 4 and 11: Data science tools
- Early February: Project work starts, initial planning, meeting with the client
- Late March: Half-way review of projects
- Late April: Projects are ready

Besides the project, we will take a look at how data science projects are carried out and get familiar with practical tools used for building data science products.

Description

The Master's Programme in Data Science is responsible for the course.

The course belongs to the Data Science Methods module.

The course is primarily intended for students of the Master's Programme in Data Science. Other students can enrol in the course, but if it fills up, preference is given to Data Science students.

Prerequisites in terms of knowledge

Software development skills at a level sufficient for working as part of a larger software development team (good programming skills, version control, etc.), for example as obtained during a Bachelor's degree in Computer Science. Some background in modeling data is also needed: no specific set of algorithms is required, but you should be familiar with the basic process of learning models from data and evaluating their accuracy, and should know some practical models or algorithms that can be applied to such tasks.

Prerequisites for students in the Data Science programme, in terms of courses

DATA11002 Introduction to Machine Learning (or DATA12002 Probabilistic Graphical Models)

Prerequisites for other students in terms of courses

Good programming skills; DATA11001 Introduction to Data Science; at least one of: DATA11002 Introduction to Machine Learning, DATA20001 Deep Learning, DATA12002 Probabilistic Graphical Models

Recommended preceding courses

None

The project is about applying theoretical knowledge to solve practical problems, and hence all other courses in the programme support the course.

Other courses that support the further development of the competence provided by this course: Data Science Project II

The student is able to solve a practical data science challenge as part of a group, taking responsibility for individual elements of a bigger project while actively interacting with the group towards reaching a common goal. The student can identify and formalise a need or target for a data-driven service given a context (typically a data source or device that produces data), can choose suitable tools for solving the problem, and is able to deliver a functioning service that fills the need. The student is aware of the challenges associated with working on real data, recognises potential limitations and challenges of data science tools, and can find information for resolving them. The student can analyse practical data science tools and draw presentable conclusions about their usability, and is able to apply in practice the theoretical knowledge learned during other courses.

Recommended time/stage of studies for completion: first year spring or during second year

Term/teaching period when the course will be offered: offered during both spring and fall, covering periods I-II and III-IV

Application of data science skills in producing a practical data science product or service. The detailed content, such as algorithms and tools used for creating the solution, depends on the practical problem and domain chosen by the group.

The course material is provided as lecture notes, slides and links to external sources.

The course combines instruction by the lecturer, presentations by the students, and long-term group work. The details of the supervision of the group work will be determined case by case. The students will write a study diary analysing and reflecting on their learning during the course.

The grading scale is 1-5.

The grading is based on active participation in the group work, demonstrable individual contributions to the final result, the quality and complexity of the solution and its presentation, and the quality of the individual work carried out outside the group, such as a tool presentation and the study diary.

The course is completed as a group project. The group is jointly responsible for delivering a practical data science solution to a problem they have identified together. The group will also present the solution to the rest of the course. In addition, the course typically involves elements the student completes alone, such as analysing a particular tool and presenting it to the other course participants, as well as a study diary. The groups receive supervision from the teacher and possibly other instructors.