Interaction
A Telegram chat room for the course has been opened. We recommend that you use the channel either through a web browser or the Telegram desktop application.
You can reach the channel through https://t.me/tkt_dap. The browser version can be reached through https://web.telegram.org/.
The discussion channel is based on peer support. The teachers of the course (Saska Dönges and Veli Mäkinen) participate in the discussion on a voluntary basis when time permits.
Material
The majority of the course materials are maintained on a GitHub page. These contain instructions on the exercises and the project work (TMC server). The general conduct of the course, including the timetable, instructions on project submission and its peer review (Moodle), and the exam (Examinarium), is described on this course page.
The course materials have been created by Jarkko Toivonen following an initial planning phase with contributions from several people. In particular, each project work is based on materials by different people.
This course instance is instructed by Saska Dönges (course materials, TMC assignments) and Veli Mäkinen (general conduct of the course).
Tasks
Final project
Select one of the project options listed below, follow the instructions, and submit your solution to Moodle in Jupyter notebook format. Then conduct the peer review: evaluate and give feedback on your own report and on two other reports.
Note that before starting the final project, you should have passed all 6 parts of the TMC assignments. The projects on sequence analysis and on regression analysis are integrated into the TMC system, and you will see further instructions (including a template Jupyter notebook) only after passing the required assignments. For the project on fossil data analysis, all the instructions are in the pdf below, and you can start from an empty Jupyter notebook to conduct the project.
The detailed instructions on peer review and the evaluation criteria will be visible in Moodle after the submission deadline. For the projects on sequence analysis and regression analysis, you can also follow the links below to see the evaluation criteria.
For the fossil data analysis project, evaluation is done in two categories as follows:
1. Give a grade 0…5 on the correctness of solutions, and provide constructive comments where you find places for improvement.
0: Less than half of assignments solved satisfactorily
1: At least half of assignments solved satisfactorily
2: At least half of assignments solved pretty correctly
3: 65% of assignments solved pretty correctly
4: 80% of assignments solved pretty correctly
5: All but 1 assignment solved almost perfectly
To assess the percentage of correctness, you may give fractional points for a serious (but failed) attempt and 1 point for an essentially correct answer, and then divide the total points by the maximum points.
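The percentage rule above can be written out as a short Python sketch. It is not an official grading tool: the function name and the point values in the example call are hypothetical, and how much fractional credit a serious attempt earns is left to the reviewer.

```python
def correctness_percentage(points):
    """Return the share of assignments solved, in percent.

    `points` holds one value per assignment: 1.0 for an essentially
    correct answer, a fraction for a serious but failed attempt,
    and 0.0 for a missing attempt.
    """
    if not points:
        raise ValueError("at least one assignment is needed")
    for p in points:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each assignment is worth between 0 and 1 points")
    # Each assignment is worth at most 1 point, so the maximum equals the count.
    return 100.0 * sum(points) / len(points)


# Hypothetical report with five assignments: three correct, one half-credit attempt.
print(correctness_percentage([1, 1, 0.5, 1, 0]))
```

The resulting percentage can then be compared with the thresholds in the grade list above.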
2. Give a grade 0…5 on the clarity of writing and code, and provide constructive comments where you find places for improvement.
0: Writing and code are not at sufficient level in the solved assignments
1: Writing and code are mostly at sufficient level in the solved assignments
2: Writing and code are mostly at satisfactory level in the solved assignments
3: Writing and code are mostly at good level in the solved assignments
4: Writing and code are mostly at very good level in the solved assignments
5: Writing and code are mostly at excellent level in the solved assignments
In each category, in addition to textual feedback, also give a grade in the range 0, 1, …, 5. The final grade for the project work is the weighted average over the two categories, where category 1 has weight 2 and category 2 has weight 1. You must get at least grade 1 for the project work.
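A minimal Python sketch of the weighted average described above, assuming both category grades are numbers from 0 to 5. The page does not say whether the result is rounded, so this hypothetical helper returns the unrounded value.

```python
def project_grade(correctness, clarity):
    """Weighted average of the two fossil-project categories.

    Category 1 (correctness) has weight 2 and category 2
    (clarity of writing and code) has weight 1.
    Rounding is not specified on the course page, so none is applied.
    """
    for g in (correctness, clarity):
        if not 0 <= g <= 5:
            raise ValueError("grades must be in the range 0..5")
    return (2 * correctness + clarity) / 3


grade = project_grade(4, 1)
print(grade)
print(grade >= 1)  # the project work must reach at least grade 1
```

With weights 2 and 1, a strong correctness grade can compensate for weaker writing, but not the other way around.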
Information for new Master's degree students
Are you starting as a student in one of the University of Helsinki Master's Degree Programmes (e.g. Computer Science, Data Science, or Life Science Informatics) in Autumn 2019?
If so, this course works as good preparation for your studies. You can start with the TMC assignments any time during the summer. The deadlines for the exam and project work have been designed so that you can complete them as soon as your study right has been sorted out at the end of August. Therefore, please postpone officially enrolling in this course until you have your University of Helsinki student number.
Certificate of studies
See Conduct of the course section for how to receive an informal pdf certificate for passing the TMC assignments.
If you wish to have the credits entered in the University of Helsinki’s student records, you must first register for the course at the Open University. ECTS credits are available to those who have a Finnish identity number and an online banking ID, and to international students at the University of Helsinki.
All studies you complete are entered into the University of Helsinki Oodi system within four to six weeks of the date of completion. Failed grades are also entered into Oodi.
Once the results have been recorded in WebOodi, you will receive a notification in your helsinki.fi email. You can then request a transcript of studies, which serves as an official certificate of completed studies.
Conduct of the course
The course is divided into 6 parts of automatically assessed exercises (using the TMC system). One needs to pass 80% of each part to proceed to the next part. After all parts have been successfully completed, one can proceed to the final evaluation, which consists of a multiple-choice exam (in Examinarium) and a peer-reviewed project work (using Moodle). The exam directly tests the knowledge gained, while the project tests the ability to apply the learned skills in a selected field of science.
Course materials and TMC assignments are available to everyone without officially enrolling in the course: these form the massive open online part of the course (MOOC), and we provide a certificate for passing this 4 ECTS part of the course. Peer review of the project work and the exam are available only to enrolled students, as these are used for grading the course towards 5 ECTS credits. Only completion of the full course (5 ECTS with grade) is registered as official credits at the University of Helsinki.
To pass the course (with grade 1), one needs to pass each part of the TMC assignments (80%), the exam (50%), and the project (50%), and to take part in the peer review of the project. Grades 2-5 are determined by the weighted average of the grades from the exam and the project (including peer review). The exam and the project have been designed so that success in the TMC assignments should reflect the expected grade well.
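The minimum requirements for grade 1 can be expressed as a simple check. This is a hypothetical self-assessment helper, not part of the course infrastructure; in particular, representing the project requirement as a share between 0 and 1 is an assumption, since the page only states "project (50%)". For the exam, 50% of the 240-point maximum is 120 points.

```python
def passes_course(tmc_part_shares, exam_points, project_share, did_peer_review):
    """Return True if the minimum requirements for grade 1 are met.

    tmc_part_shares: completed share (0..1) of each of the 6 TMC parts
    exam_points:     points from the Examinarium exam (maximum 240)
    project_share:   share (0..1) of the project judged correct (assumption)
    did_peer_review: whether the student took part in the peer review
    """
    if len(tmc_part_shares) != 6:
        raise ValueError("the course has 6 TMC parts")
    tmc_ok = all(share >= 0.8 for share in tmc_part_shares)
    exam_ok = exam_points >= 0.5 * 240  # 50% of the maximum, i.e. 120 points
    project_ok = project_share >= 0.5
    return tmc_ok and exam_ok and project_ok and did_peer_review


print(passes_course([0.9] * 6, 130, 0.6, True))
```

All four conditions are combined with `and`, mirroring the rule that every requirement must be met on its own.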
For the exam, the grading scale is 120 points = 1, 140 points = 2, 160 points = 3, 180 points = 4, and 200 points = 5, where 240 points is the maximum. The course exam is taken in Examinarium. You have to be physically present and have your student card with you. No additional material can be used in the exam. There are 40 multiple-choice questions chosen randomly from a larger set of questions, and you have 55 minutes to answer them. The questions are distributed among five categories as follows:
Basics (13 questions)
NumPy (8)
Visualization (3)
Pandas (12)
Machine learning (4)
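As a sanity check of the numbers above, the following Python sketch maps exam points to a grade using the stated limits and confirms that the category counts add up to 40 questions. The names `GRADE_LIMITS`, `QUESTIONS`, and `exam_grade` are illustrative only, and points below 120 are reported as grade 0 (failed).

```python
from bisect import bisect_right

# Lower point limits for grades 1..5 (maximum 240 points).
GRADE_LIMITS = [120, 140, 160, 180, 200]

# Number of questions per category.
QUESTIONS = {"Basics": 13, "NumPy": 8, "Visualization": 3,
             "Pandas": 12, "Machine learning": 4}


def exam_grade(points):
    """Map exam points (0..240) to a grade 0..5, where 0 means failed."""
    if not 0 <= points <= 240:
        raise ValueError("exam points must be between 0 and 240")
    # bisect_right counts how many limits are less than or equal to `points`.
    return bisect_right(GRADE_LIMITS, points)


assert sum(QUESTIONS.values()) == 40
print(exam_grade(119), exam_grade(120), exam_grade(175), exam_grade(240))
```

Using `bisect_right` makes a score exactly on a limit (e.g. 120) count as reaching that grade.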
The project work consists of tasks similar to those in the weekly TMC assignments, but this time the tasks build a coherent story around a selected field of science. The project topics currently offered are sequence analysis (using dict for Markov chains etc.), medical data analysis (linear regression using statsmodels etc.), and fossil data analysis. You can choose any one of these as your final project. The first two project works can also be superficially tested in the TMC system, but there is more freedom than in the weekly assignments, and human interpretation is required to assess the quality of the solutions. For this purpose, the project solutions are collected in Jupyter Notebook format and submitted to Moodle for peer review. For the first two project options we provide a template containing placeholders for discussion around the code, so that the outcome looks much like the course notes. For the project on fossil data analysis we currently have no template, so you can structure your work more freely. In addition to peer review, the project work is also self-assessed. The grade of the project is determined by an overall assessment of the project and the peer-review work.
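To give an idea of what "using dict for Markov chains" can look like, here is a minimal sketch that estimates first-order transition probabilities from a sequence with nested dictionaries. It is an illustration under our own assumptions (a plain string as input, estimates from adjacent pair counts), not the project template or its reference solution; `transition_probabilities` is a hypothetical name.

```python
from collections import defaultdict


def transition_probabilities(sequence):
    """Estimate first-order Markov chain transition probabilities.

    Returns a nested dict: probs[a][b] is the estimated probability
    that symbol b directly follows symbol a in the sequence.
    """
    counts = defaultdict(lambda: defaultdict(int))
    # Count each adjacent pair (a, b) of symbols.
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    probs = {}
    for a, followers in counts.items():
        total = sum(followers.values())
        # Normalize the counts of each row so they sum to 1.
        probs[a] = {b: n / total for b, n in followers.items()}
    return probs


print(transition_probabilities("AACGTA"))
```

The outer dict is indexed by the current symbol and the inner dict by the next symbol, so each inner dict is one row of the transition matrix.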
Examinarium: See the end of Description section of this page for detailed instructions on the exam.
Feedback
There were over 800 students registered for the course. 781 students finished at least one assignment in the TMC system; 421 proceeded to part 2, 303 to part 3, 239 to part 4, 207 to part 5, and 164 to part 6. 138 students passed all parts of the TMC assignments (at least 80% done in each part) and received a certificate for completing the 4 ECTS part of the course.
Out of those passing the TMC assignments, 75 students passed the project work and the exam, receiving the official 5 ECTS mark for the course through Open University.
Most students got a high grade from the project, but the exam did not go as well as on the previous (scheduled) course instance. This is quite understandable, as in this non-scheduled version of the course there can be a long break between finishing the assignments and taking the exam.
Feedback was gathered through Moodle using the standard questionnaire of the Open University. Summary statistics are shown below. The textual feedback was very constructive, pointing out places for improvement in the material and the TMC tests.
This course was developed with funding from the University of Helsinki "digiloikka" project, and this instance was, and the forthcoming editions will be, offered mostly as self-study versions "as is". For this instance, the most significant investment was in improving the TMC tests. In the future, the exam questions need some revision to better suit the non-scheduled version. The feedback will be taken into account if/when the course material is updated.
| Statement | Fully agree | Partly agree | Partly disagree | Fully disagree |
|---|---|---|---|---|
| Learning objectives of the course were clear to me. | 17 | 12 | 3 | 0 |
| Teaching, study methods and assignments supported my learning on the course. | 16 | 14 | 2 | 1 |
| Instructions for the learning assignments were clear and easy to understand. | 7 | 16 | 8 | 2 |
| The evaluation criteria of the course was clearly presented. | 18 | 13 | 1 | 0 |
| I received enough feedback of my learning. | 5 | 17 | 9 | 1 |
| Interaction with other students supported my learning. | 6 | 10 | 7 | 4 |
| The teacher was sufficiently present during the course. | 7 | 10 | 5 | 2 |
| The course schedule worked for me. | 30 | 1 | 0 | 0 |
| The workload of the course corresponded with the credits (1 cr = ca. 27 h). | 11 | 9 | 7 | 4 |
| The technology of the online learning platform worked well on the course. | 17 | 11 | 3 | 0 |
Description
This course is a massive open online course (MOOC), which means it is available to everyone. For further information about the course, please see the MOOC learning environment.
Programming skills and basic knowledge of probability calculus and linear algebra.
The compulsory basic level courses in Bachelor's Programme in Science form a sufficient background.
- What other courses are recommended to be taken in addition to this course?
  - Introduction to Data Science
- Which other courses support the further development of the competence provided by this course?
  - Introduction to Machine Learning
  - Biological Sequence Analysis
- Can confidently write basic level Python programs without constantly consulting language/library documentation.
- Can apply efficient and elegant Pythonic idioms to solve problems
- Knows the different phases of data analysis pipeline
- Knows the fundamental data types array, Series and DataFrame
- Can clean data to form consistent Series and DataFrames without anomalies
- Can select subsets, transform, reshape and combine data
- Can extract summary statistics from data (min, max, mean, median, standard deviation)
- Knows the main types of machine learning (supervised learning: regression and classification, unsupervised learning: clustering, dimensionality reduction, (density estimation))
- Knows the estimator API of Scikit-Learn (choose model class, choose hyperparameters, form feature matrix and target vector, fit model, transform data or predict labels or responses)
- Can form feature matrix and target vector suitable for Scikit-Learn's model fitting algorithms
- Can visualize data as simple plots or histograms
- Can apply basic data analysis skills to a simple project on an application field
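The estimator API steps listed above can be illustrated with scikit-learn's `LinearRegression`. The toy data below is made up so that the fitted line is exactly y = 2x + 1; transformers such as `PCA` follow the same pattern but call `transform` instead of `predict`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (made up for illustration): y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# 1. Choose a model class.  2. Choose hyperparameters.
model = LinearRegression(fit_intercept=True)

# 3. Form the feature matrix (n_samples x n_features) and the target vector.
X = x[:, np.newaxis]  # shape (5, 1)

# 4. Fit the model to the data.
model.fit(X, y)

# 5. Predict responses for new inputs, again given as a 2D feature matrix.
x_new = np.array([[5.0], [10.0]])
print(model.coef_, model.intercept_)
print(model.predict(x_new))
```

The reshape in step 3 is the usual stumbling block: scikit-learn expects a 2D feature matrix even when there is only one feature.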
The course takes a practical approach to the different phases of the data analysis pipeline: fetching and cleaning data; reshaping, subsetting, grouping, and combining data; and using aggregation, machine learning, and data visualization to extract knowledge from data.
- Libraries: NumPy, Pandas, Scikit-learn, (Matplotlib)
- Interactive study materials: Jupyter notebook
- Automatic checking of exercises: Test My Code framework
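A compact pandas sketch of several pipeline phases mentioned above: cleaning, subsetting, grouping with aggregation, and combining. The station measurements and city names are hypothetical and only serve to make the example runnable.

```python
import numpy as np
import pandas as pd

# Hypothetical raw measurements with one missing value.
df = pd.DataFrame({
    "station": ["A", "A", "B", "B", "B"],
    "temperature": [12.5, np.nan, 9.0, 11.0, 10.0],
})

# Cleaning: drop rows with a missing temperature.
clean = df.dropna(subset=["temperature"])

# Subsetting: keep only rows above a threshold.
warm = clean[clean["temperature"] > 9.5]

# Grouping and aggregation: summary statistics per station.
summary = clean.groupby("station")["temperature"].agg(["min", "max", "mean"])

# Combining: merge the summary with (hypothetical) station metadata.
stations = pd.DataFrame({"station": ["A", "B"], "city": ["Helsinki", "Espoo"]})
combined = summary.reset_index().merge(stations, on="station")
print(combined)
```

`reset_index` turns the group keys back into an ordinary column so that the result can be merged on it.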
- Basics of the Python language
- NumPy
  - Creation and indexing of arrays
  - Array concatenation and splitting
  - Fast computation using universal functions
  - Summary statistics
  - Broadcasting
  - Matrix operations and basic linear algebra
- Pandas
  - Creation and indexing of Series and DataFrames
  - Handling missing data
  - Concatenation of Series and DataFrames
  - Grouping and aggregating
  - Merging DataFrames
- Gentle introduction to machine learning through the Scikit-learn library
  - Linear regression
  - Naive Bayes classification
  - Principal component analysis
  - k-means clustering
- Project on applying the learned skills to an application field
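A short NumPy sketch touching several of the listed topics: array creation and indexing, concatenation and splitting, universal functions, summary statistics, broadcasting, and matrix multiplication. The array values are arbitrary.

```python
import numpy as np

# Creation and indexing
a = np.arange(6).reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]
first_row = a[0]                      # [0, 1, 2]

# Concatenation and splitting
b = np.concatenate([a, a], axis=0)    # shape (4, 3)
top, bottom = np.split(b, 2, axis=0)  # two arrays of shape (2, 3)

# Universal functions operate element-wise without Python loops
squares = np.square(a)

# Summary statistics along an axis
col_means = a.mean(axis=0)            # [1.5, 2.5, 3.5]

# Broadcasting: shape (2, 3) minus shape (3,) subtracts the means from every row
centered = a - col_means

# Matrix operations: product of a with its transpose, shape (2, 2)
gram = a @ a.T
print(gram)
```

After centering, each column sums to zero, which is a quick way to verify that broadcasting applied the means along the intended axis.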
- What kind of literature and other materials are read during the course (reading list)?
  The material is integrated into the MOOC instructions.
- Which works are set reading and which are recommended as supplementary reading?
  Jake VanderPlas, Python Data Science Handbook, O'Reilly (2016)
The book is freely available in electronic form from https://jakevdp.github.io/PythonDataScienceHandbook/
MOOC includes automatic assessment of programming exercises
The grading scale is 1...5.
The final project, the peer-review work related to it and the exam are assessed.
Contact information:
- Questions regarding the learning environment: mooc@cs.helsinki.fi
- Questions about registering at the Open University: avoinyo-tietojenkasittelytiede@helsinki.fi
- You can find frequently asked questions on the course at the MOOC learning environment
- If you have questions about the content of the course, please contact the teacher in charge of the course, Veli Mäkinen (veli.makinen@helsinki.fi)
This course (5 cr) consists of 3 parts:
- Course materials and automatically assessed assignments in the MOOC learning environment
- Project work in Moodle
- Examinarium exam (in Helsinki)
Online learning environment Moodle opens on June 19, 2019.
How to get the Moodle link and course key?
On the day after registration, log in to this study programme with your University of Helsinki username.
You will receive more information on the username after registration.
EXAMINARIUM
Course grading is based on a multiple-choice Examinarium exam and a project work submitted in Moodle. See the deadlines for the project works under the navigation bar "Timetable".
Examinarium exams are electronic exams taken on a computer in certain Examinarium rooms at the University. The exam is supervised via recording camera equipment installed in the rooms. You can book the time and the room of the exam yourself. Currently it is possible to take the exam in several Examinarium rooms at the University of Helsinki and it is anticipated that in spring 2020 this service will be available in several universities in Finland.
You can start your project work and book the time for the exam once the required amount of automatically assessed programming exercises has been completed in the TMC system. You can take the Examinarium exam at the earliest on 12.8.2019 and at the latest on 16.12.2019!
How to take the Examinarium exam:
1. Register for the course via Open University (the Register button on this page).
2. Book the time for the Examinarium exam. Remember to book in good time so that you get the time and Examinarium room you prefer (booking is possible at the earliest one day after registering for the course).
Read the Examinarium instructions carefully before taking the exam.
Veli Mäkinen
The course is part of the subject studies in Computer Science.