Self-Study PYTHON course

Data analysis with Python is a practical introduction to data analysis using a large number of programming exercises.

The course covers Python libraries such as NumPy, Pandas, Matplotlib and SciPy. These are used for data cleaning, grouping, creation of summary statistics, and for machine learning tasks such as linear regression, Naive Bayes, PCA, and clustering. The course ends with a project in which you apply these skills in a field of science of your choice.

The course can be taken remotely, except for the multiple-choice exam, which is taken in Examinarium.

Course materials and TMC assignments are available to everyone without officially enrolling in the course.

Cancellation: If you officially enrolled in the course through Open University but no longer plan to take it, you can follow these instructions to cancel your registration. The next opportunity to take this course is in Period IV, Spring 2020.



Veli Mäkinen

Published 29.1.2020 at 16:31

The Feedback section has now been updated and shows some statistics from the Moodle questionnaire. Sorry for the (possibly) wrong information in the message below: Open University gathers feedback through Moodle, and you may not have received an email about it.

Veli Mäkinen

Published 18.12.2019 at 10:23

Dear all,

the course has now been graded. See the Results section of the course page. WebOodi should also contain the grades in a couple of days. The final grade is the average of the exam grade and the project grade.

If you finished the project but missed the exam, or wish to retake it to upgrade your grade, the exam has been re-opened in Examinarium until the end of January 2020.

If you did the exam and have the project ready, but failed to meet the deadline, please consult the teachers for options.

In other cases, a new edition of the course takes place in Period IV, Spring 2020, where you can basically continue from where you left off.

See Moodle for the peer-reviews of your project. Almost everyone got the grade proposed in their self-review, with a few exceptions. Special thanks for the many thorough reviews, which helped a lot in our final assessment in cases where the proposed grades differed widely. A few students forgot to return their reviews, and that was taken into account in the project grading (-1).

So far, 72 students have passed this 5 ECTS edition of the course and 138 students have passed the TMC assignments (worth 4 ECTS; the certificate is downloadable from TMC).

Any feedback that helps improve the course is welcome. Don't forget to answer the anonymous feedback survey (you should receive an email about it soon).

Have a relaxing Christmas break!

Veli and Saska

Veli Mäkinen

Published 5.9.2019 at 8:03

A new project option on fossil data analysis has been added. See the Tasks section. It is targeted at the December deadline, but it has also been added as an option for next Monday's deadline in case someone is eager to try it out.


A Telegram chat room for the course has been opened. We recommend that you use the channel either through a web browser or the Telegram desktop application.


The discussion channel is based on peer support. The teachers of the course (Saska Dönges and Veli Mäkinen) participate in the discussion on a voluntary basis, if time permits.


You can proceed with the TMC assignments at your own pace and take the exam once you have passed all parts of the TMC assignments. However, the project work is scheduled because of the peer-review.

- TMC exercises: Start any time during period 19.6.2019-1.11.2019*
- Exam: 12.8.2019-16.12.2019, after passing TMC assignments
- Project work: Start any time during period 1.8.2019**-1.12.2019*, after passing TMC assignments. There are deadlines to choose from for returning the project work for peer-review (see below)

* This is not a strict limit, but a recommendation regarding the approaching deadline.
** Projects may become available already before 1.8.2019, if we finish updating them earlier.

Deadlines for submission of project work for peer-review are as follows (you may choose one of the options):
- Option 1: submission deadline Monday 9.9.2019 23:59, peer-review deadline Monday 16.9.2019 23:59 (see Moodle)
- Option 2: submission deadline Monday 9.12.2019 23:59, peer-review deadline Monday 16.12.2019 23:59 (see Moodle)

More choices may be added later, if there are enough students.

For your own scheduling, take into account that reaching 80% of the weekly assignments may take between 5 and 20 hours depending on your background.

If you miss the December deadline, there is another chance during Spring 2020: a scheduled 7-week version of the course will be offered starting in March 2020. That version will include more instruction (locally at Kumpula campus) and less strict limits for the TMC assignments. If you find that reaching the 80% limit takes too much time, you might want to consider joining the Spring version instead.


The majority of the course materials are maintained on a GitHub page. These contain instructions on the exercises and project work (TMC server). The general conduct of the course is maintained on this course page, including the timetable, instructions on project submission and its peer-review (Moodle), and the exam (Examinarium).

Course materials have been created by Jarkko Toivonen following an initial planning phase with contributions from several people. In particular, each project work is based on materials by different people.

This course instance is instructed by Saska Dönges (course materials, TMC assignments) and Veli Mäkinen (general conduct of the course).


Final project

Select one of the project options listed below, follow the instructions, and return your solutions to Moodle in Jupyter notebook format. Then conduct peer-review: You should evaluate and give feedback on your own report and two other reports.

Note that before doing this final project, you should have passed all 6 parts of the TMC assignments. The projects on sequence analysis and on regression analysis are integrated into the TMC system, and you will see further instructions (including a template Jupyter notebook) only after passing the required assignments. For the project on fossil data analysis, all the instructions are in the pdf below, and you can start from an empty Jupyter notebook.

The detailed instructions on peer-review and evaluation criteria will be visible in Moodle after the submission deadline. For the project on sequence analysis and regression analysis you can also follow the links below to see the evaluation criteria.

For the fossil data analysis project, evaluation is done in two categories as follows:

1. Give a grade 0…5 on the correctness of solutions, and provide constructive comments where you find places for improvement.

0: Less than half of the assignments solved satisfactorily
1: At least half of the assignments solved satisfactorily
2: At least half of the assignments solved pretty correctly
3: 65% of the assignments solved pretty correctly
4: 80% of the assignments solved pretty correctly
5: All but 1 assignment solved almost perfectly

To assess the percentage of correctness, you may give fractional points for a serious (but failed) attempt and 1 point for an essentially correct answer, and then divide the total points by the maximum points.
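As a rough illustration of this point counting, the sketch below sums the per-assignment points and divides by the maximum. The function name `correctness_fraction` and the example scores are hypothetical, and how much partial credit a failed attempt earns is left to the reviewer.

```python
def correctness_fraction(points):
    """Return total points divided by maximum points (1 point per assignment)."""
    if not points:
        raise ValueError("at least one assignment is needed")
    for p in points:
        if not 0 <= p <= 1:
            raise ValueError("each assignment is worth between 0 and 1 point")
    return sum(points) / len(points)


# Hypothetical review of five assignments: three essentially correct,
# one serious but failed attempt (0.5) and one left unsolved (0).
scores = [1, 1, 0.5, 1, 0]
print(f"{correctness_fraction(scores):.0%} of maximum points")
```

The resulting percentage is then compared against the grade descriptions in category 1.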

2. Give a grade 0…5 on the clarity of writing and code, and provide constructive comments where you find places for improvement.

0: Writing and code are not at a sufficient level in the solved assignments
1: Writing and code are mostly at a sufficient level in the solved assignments
2: Writing and code are mostly at a satisfactory level in the solved assignments
3: Writing and code are mostly at a good level in the solved assignments
4: Writing and code are mostly at a very good level in the solved assignments
5: Writing and code are mostly at an excellent level in the solved assignments

In each category, in addition to textual feedback, also give a grade in the range 0, 1, …, 5. The final grade for the project work will be the weighted average over the two categories, where category 1 has weight 2 and category 2 has weight 1. You must get at least grade 1 for the project work.
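The weighted average can be written out as a small sketch. The function name `project_grade` is hypothetical, and the result is left unrounded because the text does not say how fractional values are rounded.

```python
def project_grade(correctness, clarity):
    """Weighted average of the two category grades (weights 2 and 1)."""
    for grade in (correctness, clarity):
        if grade not in range(6):
            raise ValueError("category grades must be integers from 0 to 5")
    return (2 * correctness + 1 * clarity) / 3


# Hypothetical review: grade 4 for correctness, grade 5 for clarity.
print(project_grade(4, 5))  # (2*4 + 1*5) / 3
```

Because correctness carries twice the weight, a one-step change in category 1 moves the result twice as much as a one-step change in category 2.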

Information for new Master's degree students

Are you starting as a student in one of the University of Helsinki Master's Degree Programmes (e.g. Computer Science, Data Science, or Life Science Informatics) in Autumn 2019?

If so, this course is good preparation for your studies. You can start with the TMC assignments any time during the summer. The deadlines for the exam and project work have been designed so that you can complete them as soon as you have your study right sorted out at the end of August. That is, please delay officially enrolling in this course until you have your University of Helsinki student number.

Certificate of studies

See Conduct of the course section for how to receive an informal pdf certificate for passing the TMC assignments.

If you wish to have the credits entered in the University of Helsinki’s student records, you must first register for the course at the Open University. The ECTS-credits are available for those who have a Finnish identity number and an online banking ID, and for International students at the University of Helsinki.

All studies you complete are entered into the University of Helsinki Oodi system within four to six weeks of the date of completion. Failed grades are also entered into Oodi.

Once the results have been recorded in WebOodi, you will receive a notification in your email. You can then request a transcript of studies, which serves as an official certificate of completed studies.

Conduct of the course

The course is divided into 6 parts of automatically assessed exercises (using the TMC system). One needs to pass 80% of each part to proceed to the next part. After these have been completed, one can proceed to the final evaluation, which consists of a multiple-choice exam (in Examinarium) and peer-reviewed project work (using Moodle). The exam directly tests the knowledge gained, while the project tests the ability to apply the learned skills in a selected field of science.

Course materials and TMC assignments are available to everyone without officially enrolling in the course. These form the massive open online part of the course (MOOC), and we provide a certificate for passing this 4 ECTS part of the course. Peer-review of the project work and the exam are available only to enrolled students, as these are used for grading the course towards 5 ECTS credits. Only completion of the full course (5 ECTS with grade) will be registered as official credits at the University of Helsinki.

To pass the course (with grade 1), one needs to pass each part of TMC assignments (80%), exam (50%), project (50%), and take part in peer-review of the project. Grading (for grades 2-5) is the weighted average of the grades from the exam and the project (including peer-review). Exam and project have been designed so that the success in TMC assignments should reflect well the expected grade.

For the exam, the grading scale is 120 points = 1, 140 points = 2, 160 points = 3, 180 points = 4, and 200 points = 5, where 240 points is the maximum. The course exam is done in Examinarium. You have to be physically present and have your student card with you. No additional material can be used in the exam. There are 40 multiple-choice questions chosen randomly from a larger set of questions. You have 55 minutes. The questions are distributed among five categories as follows:
Basics (13 questions)
NumPy (8)
Visualization (3)
Pandas (12)
Machine learning (4)
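The exam grading scale can be read as a lookup from points to grade. The sketch below assumes that each listed score is the minimum needed for that grade and uses 0 to mean "not passed"; both are interpretations, and the names `GRADE_THRESHOLDS` and `exam_grade` are hypothetical.

```python
# Minimum points for each grade, highest grade first (assumed interpretation
# of the scale: 120 = 1, 140 = 2, 160 = 3, 180 = 4, 200 = 5).
GRADE_THRESHOLDS = [(200, 5), (180, 4), (160, 3), (140, 2), (120, 1)]
MAX_POINTS = 240


def exam_grade(points):
    """Map exam points to a grade 1..5, or 0 if the exam was not passed."""
    if points > MAX_POINTS:
        raise ValueError(f"points cannot exceed {MAX_POINTS}")
    for threshold, grade in GRADE_THRESHOLDS:
        if points >= threshold:
            return grade
    return 0  # below 120 points (50% of the maximum)


for pts in (119, 120, 175, 240):
    print(pts, "->", exam_grade(pts))
```

Listing the thresholds from the highest grade down lets the loop return at the first match, so no upper bounds need to be stored.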

The project work consists of tasks similar to those in the weekly TMC assignments, but this time they build a coherent story around a selected field of science. The project topics currently offered are sequence analysis (using dict for Markov chains etc.), medical data analysis (linear regression using statsmodels etc.), and fossil data analysis. You can choose any one of these as your final project. The first two projects can also be superficially tested in the TMC system, but there is more freedom than in the weekly assignments, and human interpretation is required to assess the quality of the solutions.

For this purpose, the project solutions are collected in Jupyter Notebook format and submitted to Moodle for peer-review. For the first two project options we provide a template containing placeholders for discussion around the code, so that the outcome looks much like the course notes. For the project on fossil data analysis we currently have no template, so you can structure your work more freely. In addition to peer-review, the project work is also self-assessed. The grade of the project is determined by an overall assessment of the project and the peer-review work.

Examinarium: See the end of Description section of this page for detailed instructions on the exam.


Over 800 students registered for the course. 781 students finished at least one assignment in the TMC system, 421 proceeded to part 2, 303 to part 3, 239 to part 4, 207 to part 5, and 164 to part 6. 138 students passed all parts of the TMC assignments (at least 80% done in each part) and received a certificate for completing the 4 ECTS part of the course.
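To see where students dropped off, the counts above can be turned into stage-to-stage retention rates. This is only an illustrative sketch; the stage labels and the helper `retention` are hypothetical names, and the numbers are copied from the paragraph above.

```python
# Number of students reaching each stage, as reported above.
stages = [
    ("at least one assignment", 781),
    ("part 2", 421),
    ("part 3", 303),
    ("part 4", 239),
    ("part 5", 207),
    ("part 6", 164),
    ("all parts passed", 138),
]


def retention(counts):
    """Fraction of students retained from each stage to the next."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


counts = [n for _, n in stages]
for (name, _), rate in zip(stages[1:], retention(counts)):
    print(f"{name}: {rate:.0%} of the previous stage")
```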

Out of those passing the TMC assignments, 75 students passed the project work and the exam, receiving the official 5 ECTS mark for the course through Open University.

Most students got a high grade from the project, but the exam did not go as well as on the previous (scheduled) course instance. This is quite understandable, as in this non-scheduled version of the course there can be a long break between finishing the assignments and taking the exam.

Feedback was gathered through Moodle using the standard questionnaire of the Open University. Summary statistics are shown below. The textual feedback was very constructive, pointing out places for improvement in the material and the TMC tests.

This course was developed with funding from the University of Helsinki "digiloikka" project. This instance was, and the forthcoming editions will mostly be, offered as self-study versions "as is". For this instance, the most significant investment was in improving the TMC tests. In the future, the exam questions need some revising to better suit the non-scheduled version. The feedback will be taken into account if and when the course material is updated.

Number of responses per statement (fully agree / partly agree / partly disagree / fully disagree):

- Learning objectives of the course were clear to me: 17 / 12 / 3 / 0
- Teaching, study methods and assignments supported my learning on the course: 16 / 14 / 2 / 1
- Instructions for the learning assignments were clear and easy to understand: 7 / 16 / 8 / 2
- The evaluation criteria of the course were clearly presented: 18 / 13 / 1 / 0
- I received enough feedback on my learning: 5 / 17 / 9 / 1
- Interaction with other students supported my learning: 6 / 10 / 7 / 4
- The teacher was sufficiently present during the course: 7 / 10 / 5 / 2
- The course schedule worked for me: 30 / 1 / 0 / 0
- The workload of the course corresponded with the credits (1 cr = ca. 27 h): 11 / 9 / 7 / 4
- The technology of the online learning platform worked well on the course: 17 / 11 / 3 / 0

Registration and study fee

Data Analysis with Python is a massive open online course (MOOC), which means it is available to everyone free of charge.

If you wish to have the credits entered in the University of Helsinki’s student records, you must register for the course at the Open University by 9.12.2019 at the latest. The exact registration time is shown on the Register button.

Please note:

  • The ECTS credits are available to people who have a Finnish social security number (= personal identity number).
  • Course materials and TMC assignments are available for everyone without officially enrolling on the course: These form the 4 ECTS massive open online part of the course (MOOC). Peer-review of the project work and exam are available only for enrolled students, as these are used for grading the course towards 5 ECTS credits.
  • Students and international students at the University of Helsinki can enrol on the course with their University of Helsinki username.

Open University reserves the right to make changes to the study programme.


This course is a massive open online course (MOOC), which means it is available to everyone. For further information about the course, please see the MOOC learning environment.

Programming skills and basic knowledge of probability calculus and linear algebra.

The compulsory basic level courses in Bachelor's Programme in Computer Science form a sufficient background.

-Which other courses are recommended to be taken in addition to this course?

  • Introduction to Data Science

-Which other courses support the further development of the competence provided by this course?

  • Introduction to Machine Learning
  • Biological Sequence Analysis

After completing the course, the student:

  • Can confidently write basic-level Python programs without constantly consulting language/library documentation
  • Can apply efficient and elegant Pythonic idioms to solve problems
  • Knows the different phases of data analysis pipeline
  • Knows the fundamental data types array, Series and DataFrame
  • Can clean data to form consistent Series and DataFrames without anomalies
  • Can select subsets, transform, reshape and combine data
  • Can extract summary statistics from data (min, max, mean, median, standard deviation)
  • Knows the main types of machine learning (supervised learning: regression and classification, unsupervised learning: clustering, dimensionality reduction, (density estimation))
  • Knows the estimator API of Scikit-Learn (choose model class, choose hyperparameters, form feature matrix and target vector, fit model, transform data or predict labels or responses)
  • Can form feature matrix and target vector suitable for Scikit-Learn's model fitting algorithms
  • Can visualize data as simple plots or histograms
  • Can apply basic data analysis skills to a simple project on an application field
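The estimator API steps named in the list above (choose model class, choose hyperparameters, form feature matrix and target vector, fit, predict) can be sketched as follows. The data is synthetic and made up for this illustration, and linear regression is just one possible model class; this is not taken from the course materials.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for this sketch): y = 2x + 1 without noise.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# 1. Choose a model class and 2. its hyperparameters.
model = LinearRegression(fit_intercept=True)

# 3. Form the feature matrix of shape (n_samples, n_features);
#    the target vector y already has shape (n_samples,).
X = x.reshape(-1, 1)

# 4. Fit the model to the data.
model.fit(X, y)

# 5. Predict responses for new inputs.
X_new = np.array([[10.0], [11.0]])
y_pred = model.predict(X_new)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("predictions:", y_pred)
```

Other Scikit-Learn estimators follow the same pattern; unsupervised models such as PCA use `transform` in place of `predict`.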

The course takes a practical approach to the different phases of the data analysis pipeline: fetching and cleaning data; reshaping, subsetting, grouping, and combining data; and using aggregation, machine learning and data visualization to extract knowledge from data.

  • Libraries: NumPy, Pandas, Scikit-learn, (Matplotlib)
  • Interactive study materials: Jupyter notebook
  • Automatic checking of exercises: Test My Code framework
  • Basics of Python language
  • NumPy
    • Creation and indexing of arrays
    • Array concatenation and splitting
    • Fast computation using universal functions
    • Summary statistics
    • Broadcasting
    • Matrix operations and basic linear algebra
  • Pandas
    • Creation and indexing of Series and DataFrames
    • Handling missing data
    • Concatenation of Series and DataFrames
    • Grouping and aggregating
    • Merging DataFrames
  • Gentle introduction to machine learning through Scikit-learn library
    • Linear regression
    • Naive Bayes classification
    • Principal component analysis
    • k-means clustering
  • Project on applying the learned skills on an application field

-What kind of literature and other materials are read during the course (reading list)?

The material is integrated into the MOOC instructions.

-Which works are set reading and which are recommended as supplementary reading?

Jake VanderPlas, Python Data Science Handbook, O'Reilly (2016)

The book is freely available in electronic form from

MOOC includes automatic assessment of programming exercises

The grading scale is 1...5.

The final project, the peer-review work related to it and the exam are assessed.

Contact information:

This course (5 cr) consists of 3 parts:

Online learning environment Moodle opens on June 19, 2019.

How to get the Moodle-link and course key?
The day after registration, log in to this study programme with your University of Helsinki username.
You will receive more information on the username after registration.


Course grading is based on a multiple-choice Examinarium exam and project work submitted in Moodle. See the deadlines for the project work under "Timetable" in the navigation bar.

Examinarium exams are electronic exams taken on a computer in certain Examinarium rooms at the University. The exam is supervised via recording camera equipment installed in the rooms. You can book the time and the room of the exam yourself. Currently it is possible to take the exam in several Examinarium rooms at the University of Helsinki and it is anticipated that in spring 2020 this service will be available in several universities in Finland.

You can start your project work and book a time for the exam once you have completed the required amount of automatically assessed programming exercises in the TMC system. You can take the Examinarium exam on 12.8.2019 at the earliest and on 16.12.2019 at the latest!

How to take the Examinarium exam:
1. Register for the course via Open University (the Register button on this page).
2. Book the time for the Examinarium exam. Remember to register for the exam in good time. This way you'll get the time and Examinarium room you prefer (earliest one day after registering for the course).

Read the Examinarium instructions carefully before taking the exam.

Veli Mäkinen

The course is part of the subject studies in Computer Science.