SELF-STUDY PYTHON COURSE

Data analysis with Python is a practical introduction to data analysis using a large number of programming exercises.

The course covers Python libraries such as NumPy, Pandas, Matplotlib and SciPy. These are used for data cleaning, grouping, creation of summary statistics, and for machine learning tasks such as linear regression, Naive Bayes, PCA, and clustering.

The course can be taken remotely.

Course materials and TMC assignments are available for everyone without officially enrolling on the course.

Cancellation: If you officially enrolled on the course through the Open University but no longer plan to take it, follow these instructions to cancel your registration: https://www.helsinki.fi/en/open-university/studying/beginning-your-studi...

NOTE: There is nothing of interest in Moodle until the project work is published. You will gain access to it later.

Moodle
Log in to see the Moodle course key.

Messages

Emilia Oikarinen

Published 27.5.2020 at 10:27

Welcome to the course!

The course study material is already available online. After the course officially starts on Monday (June 1, 2020), you can start working on the programming assignments and submit them in the TMC system. Note that no official registration through the Open University is needed at the beginning, and nothing happens in Moodle before the project work and exams. You'll be prompted to register later during the course.

BR,

Emilia & Saska

Interaction

A Telegram chat room for the course has been opened. We recommend that you use the channel either through a web browser or the Telegram desktop application.

You can reach the channel through https://t.me/tkt_dap. The browser version can be reached through https://web.telegram.org/.

The discussion channel is based on peer support. The teachers of the course (Saska Dönges and Emilia Oikarinen) participate in the discussion on a voluntary basis if time permits.

Schedule

You can proceed with the TMC assignments at your own pace. You can take the exam once you have passed all parts of the TMC assignments. The project work also follows a schedule because of the peer-review. See below for the project work and exam schedules.

Scheduling:
- TMC exercises: start any time during the period 1.6.2020-1.3.2021*
- Exam in Moodle: after passing TMC assignments, choose one of the exam times provided below
- Project work: start any time after passing TMC assignments, choose one of the deadlines for returning the project work for peer-review

For your own scheduling, take into account that reaching 80% of each part in the assignments may take between 5 and 20 hours, depending on your background.

(*) This is not a strict limit (i.e., registration is possible until April 2021), but a recommendation regarding the approaching deadline.

NOTE: All deadlines in this course use Helsinki local time (UTC+2 between October 25, 2020 and March 28, 2021; UTC+3 otherwise).

Deadlines for project work (choose one):

SUMMER 2020
- Submission deadline Monday 17.8.2020 (23:59)
- Peer-review deadline Monday 24.8.2020 (23:59)
EARLY AUTUMN 2020
- Submission deadline Monday 28.9.2020 (23:59)
- Peer-review deadline Monday 5.10.2020 (23:59)
LATE AUTUMN 2020
- Submission deadline Monday 7.12.2020 (23:59)
- Peer-review deadline Monday 14.12.2020 (23:59)
SPRING 2021
- Submission deadline Friday 9.4.2021 (23:59)
- Peer-review deadline Friday 16.4.2021 (23:59)

Exam schedule:

The multiple-choice exam is available in Moodle as follows (choose one):

SUMMER: during 10.-24.8.2020 (23:59)
EARLY AUTUMN: during 21.9.-5.10.2020 (23:59)
LATE AUTUMN: during 30.11.-14.12.2020 (23:59)
SPRING: during 10.-16.4.2021 (23:59)
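
Since all deadlines above are given in Helsinki local time, participants in other time zones may want to convert them to UTC. The following is a minimal sketch that applies only the offset rule stated in the note above (UTC+2 between October 25, 2020 and March 28, 2021; UTC+3 otherwise). The function name `helsinki_to_utc` is made up for this illustration, and the exact hour of the clock change is ignored, which does not matter for deadlines at 23:59.

```python
from datetime import datetime, timedelta, timezone

# Winter-time window as stated in the course note (dates only; the exact
# switch-over hour is deliberately ignored in this sketch).
WINTER_START = datetime(2020, 10, 25)
WINTER_END = datetime(2021, 3, 28)


def helsinki_to_utc(local: datetime) -> datetime:
    """Convert a naive Helsinki local time to a timezone-aware UTC datetime."""
    offset_hours = 2 if WINTER_START <= local < WINTER_END else 3
    helsinki = timezone(timedelta(hours=offset_hours))
    return local.replace(tzinfo=helsinki).astimezone(timezone.utc)


# SUMMER 2020 and LATE AUTUMN 2020 project submission deadlines
summer_deadline = helsinki_to_utc(datetime(2020, 8, 17, 23, 59))
winter_deadline = helsinki_to_utc(datetime(2020, 12, 7, 23, 59))
print(summer_deadline.isoformat())  # 2020-08-17T20:59:00+00:00
print(winter_deadline.isoformat())  # 2020-12-07T21:59:00+00:00
```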

Materials

Assignments

TMC assignments

Start the course by solving the TMC assignments.

Final project

After completing the TMC assignments, you can proceed to the final project. Select one of the project options listed below, follow the instructions, and return your solutions to Moodle in Jupyter notebook format. Then carry out the peer-review: evaluate and give feedback on your own report and two other reports.

The projects on sequence analysis and on regression analysis are integrated into the TMC system, and the course material includes detailed instructions on these. For the project on fossil data analysis, all the instructions are in the PDF file provided below, and you can start from an empty Jupyter notebook to conduct the project.

The instructions on peer-review will be visible in Moodle after the submission deadline. The evaluation criteria are as follows:

1. Give a grade 0…5 on the correctness of solutions, and provide constructive comments where you find places for improvement.

0: Less than half of assignments solved satisfactorily
1: At least half of assignments solved satisfactorily
2: At least half of assignments solved pretty correctly
3: 65% of assignments solved pretty correctly
4: 80% of assignments solved pretty correctly
5: All but 1 assignment solved almost perfectly

To assess the percentage of correctness, you may give fractional points for a serious (but failed) attempt and 1 point for an essentially correct answer, and then divide the total points by the maximum points.
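
As an illustration of this calculation, here is a small sketch. It assumes one point per assignment as the maximum; the function name `correctness_percentage` and the example points are hypothetical and not part of the course instructions.

```python
def correctness_percentage(points):
    """Return total points divided by maximum points, as a percentage.

    `points` holds one entry per assignment: 1 for an essentially correct
    answer, a fraction for a serious but failed attempt, 0 otherwise.
    """
    maximum = 1.0 * len(points)  # assumption: each assignment is worth 1 point
    return 100.0 * sum(points) / maximum


# Hypothetical review of a report with eight assignments
points = [1, 1, 0.5, 1, 0, 1, 0.5, 1]
print(correctness_percentage(points))  # 75.0
```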

2. Give a grade 0…5 on the clarity of writing and code, and provide constructive comments where you find places for improvement.

0: Writing and code are not at sufficient level in the solved assignments
1: Writing and code are mostly at sufficient level in the solved assignments
2: Writing and code are mostly at satisfactory level in the solved assignments
3: Writing and code are mostly at good level in the solved assignments
4: Writing and code are mostly at very good level in the solved assignments
5: Writing and code are mostly at excellent level in the solved assignments

In each category, in addition to textual feedback, also give a grade from 0, 1, …, 5.

The final grade for the project work will be the weighted average over the two categories, where category 1 has weight 2, and category 2 has weight 1.

You must get at least grade 1 for the project work.
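
The weighted average described above can be sketched as follows. The function name `project_grade` is made up for this example, and the source does not say whether or how the result is rounded, so the sketch leaves it unrounded.

```python
def project_grade(correctness: float, clarity: float) -> float:
    """Weighted average: correctness (category 1) has weight 2, clarity (category 2) has weight 1."""
    return (2 * correctness + 1 * clarity) / (2 + 1)


# Hypothetical example: grade 4 for correctness, grade 3 for clarity
grade = project_grade(correctness=4, clarity=3)
print(f"{grade:.2f}")  # 3.67
print(grade >= 1)      # True: meets the minimum grade for the project work
```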

Jarkko Toivonen, Indre Zliobaite
Jarkko Toivonen, Antti Honkela
Jarkko Toivonen, Veli Mäkinen

Information for new Master's degree students at University of Helsinki

Are you starting as a student in one of the University of Helsinki Master's Degree Programmes (e.g., Computer Science, Data Science, or Life Science Informatics) in Autumn 2020?

If so, this course is good preparation for your studies. You can start with the TMC assignments any time during the summer. The EARLY AUTUMN deadlines for the exam and project work have been designed so that you can complete them as soon as you have your study right sorted out at the end of August. That is, please delay officially enrolling on this course until you have your University of Helsinki student number.

Completing the course

Course materials and assignments in the Test My Code (TMC) system are available to everyone without officially enrolling on the course. The automatically assessed exercises in the TMC system are divided into 6 parts, and one needs to pass 80% of the exercises in each part to proceed to the next part. These exercises, together with the course material, form the massive open online course (MOOC) part of the course, and a certificate for passing this 4 ECTS part is provided through the TMC website (under the user page).

After successfully completing the TMC exercises, one can proceed to the final evaluation, which consists of a multiple-choice exam (in Moodle) and peer-reviewed project work (using Moodle). The exam directly tests the knowledge gained, while the project tests the ability to apply the learned skills in a selected field of science. The exam and project work are used for grading the course towards 5 ECTS credits.

NOTE: Peer-review of the project work and the exam are available only to enrolled students. One will be prompted to register through the Open University after having progressed sufficiently with the TMC assignments (i.e., after passing the first three parts of the TMC exercises). More details of the Open University registration requirements are provided in Section 'Registration and fee'.

NOTE: Only a fully completed course (5 ECTS with grade) will be registered as official credits at the University of Helsinki.

To pass the course (with grade 1), one needs to pass each part of the TMC assignments (80%), the exam (50%), and the project (50%), and take part in the peer-review of the project. Grading (for grades 2-5) is the weighted average of the grades from the exam and the project (including peer-review). The exam and the project have been designed so that success in the TMC exercises should reflect the expected grade well.
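
The pass conditions above can be summarised in a short sketch. The function `passes_course`, its parameters, and the treatment of the thresholds as inclusive are assumptions made for illustration; the weights used for grades 2-5 are not given here, so the sketch only covers passing.

```python
def passes_course(tmc_part_fractions, exam_fraction, project_fraction, did_peer_review):
    """Check the stated pass conditions, with scores given as fractions in [0, 1].

    Assumption: thresholds are inclusive (exactly 80% or 50% counts as passed).
    """
    tmc_ok = all(fraction >= 0.8 for fraction in tmc_part_fractions)
    return tmc_ok and exam_fraction >= 0.5 and project_fraction >= 0.5 and did_peer_review


# Hypothetical student: six TMC parts, exam, project, peer-review done
tmc_parts = [0.9, 0.85, 0.8, 1.0, 0.95, 0.8]
print(passes_course(tmc_parts, exam_fraction=0.6, project_fraction=0.7, did_peer_review=True))   # True
print(passes_course(tmc_parts, exam_fraction=0.4, project_fraction=0.7, did_peer_review=True))   # False
```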

Registration and fee

Free of charge.

Registration instructions

No preregistration. You will find registration instructions in the course material after you have completed the first 3 weeks of exercises.

Course materials and TMC assignments are available for everyone without officially enrolling on the course. These form the massive open online part of the course (MOOC). Peer-review of the project work and exam are available only for enrolled students, as these are used for grading the course towards 5 ECTS credits.

If you wish to have the credits entered in the University of Helsinki’s student records, you must register for the course through the Open University by 8.4.2021 at the latest. You will find registration instructions in the course material after you have completed assignments in the first three parts of the TMC learning environment. The instructions are in the course material at the end of week 3. You will be able to access the TMC learning environment once the course starts. Note also that there are deadlines for the project work and exam.

1. Study the course materials, sign up for the TMC learning environment, and complete the assignments as instructed.
2. After completing the first three parts of the assignments in the TMC learning environment, you will find instructions for the Open University course registration at the end of week 3.
3. After the Open University course registration, you gain access to Moodle, where the peer-review of the project work and exam take place.

Please note:

  • If you wish to have the ECTS entered in the University of Helsinki’s student records, you must register for the course at the Open University.
  • Students and international students at the University of Helsinki can enrol on the course with their University of Helsinki username.
  • If you do not have a Finnish personal identity code, please contact the University of Helsinki Admission Services in order to register for the course.

During your studies

Practical instructions for studying

Arrangements for students in need of special support

Open University reserves the right to make changes to the study programme.

Description

The course is available to students from other degree programmes and to non-degree students through Open University. All course material and exercises are open to anyone.

Prerequisites: programming skills and basic knowledge of probability calculus and linear algebra.

The compulsory basic level courses in Bachelor's Programme in Computer Science form a sufficient background.

-Which other courses are recommended to be taken in addition to this course?

  • Introduction to Data Science

-Which other courses support the further development of the competence provided by this course?

  • Introduction to Machine Learning
  • Biological Sequence Analysis

Learning outcomes:

  • Can confidently write basic-level Python programs without constantly consulting language or library documentation
  • Can apply efficient and elegant Pythonic idioms to solve problems
  • Knows the different phases of the data analysis pipeline
  • Knows the fundamental data types array, Series and DataFrame
  • Can clean data to form consistent Series and DataFrames without anomalies
  • Can select subsets, transform, reshape and combine data
  • Can extract summary statistics from data (min, max, mean, median, standard deviation)
  • Knows the main types of machine learning (supervised learning: regression and classification, unsupervised learning: clustering, dimensionality reduction, (density estimation))
  • Knows the estimator API of Scikit-Learn (choose model class, choose hyperparameters, form feature matrix and target vector, fit model, transform data or predict labels or responses)
  • Can form feature matrix and target vector suitable for Scikit-Learn's model fitting algorithms
  • Can visualize data as simple plots or histograms
  • Can apply basic data analysis skills to a simple project in an application field
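
The Scikit-learn estimator workflow named in the list above (choose a model class, choose hyperparameters, form the feature matrix and target vector, fit the model, predict) might look roughly like this. This is only a sketch on synthetic data invented for the example, using linear regression as the model class; it is not taken from the course material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data made up for illustration: y = 2x + 1 exactly
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0          # target vector, shape (n_samples,)

# 1. Choose the model class and its hyperparameters
model = LinearRegression(fit_intercept=True)

# 2. Form the feature matrix with shape (n_samples, n_features)
X = x.reshape(-1, 1)

# 3. Fit the model to the data
model.fit(X, y)

# 4. Predict responses for new inputs
X_new = np.array([[10.0], [11.0]])
predictions = model.predict(X_new)

print(model.coef_, model.intercept_)  # slope close to 2, intercept close to 1
print(predictions)                    # values close to 21 and 23
```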

The course takes a practical approach to the different phases of the data analysis pipeline: fetching and cleaning, reshaping, subsetting, grouping, and combining data; and using aggregation, machine learning, and data visualization to extract knowledge from data.

  • Libraries: NumPy, Pandas, Scikit-learn, (Matplotlib)
  • Interactive study materials: Jupyter notebook
  • Automatic checking of exercises: Test My Code framework

Contents:

  • Basics of the Python language
  • NumPy
    • Creation and indexing of arrays
    • Array concatenation and splitting
    • Fast computation using universal functions
    • Summary statistics
    • Broadcasting
    • Matrix operations and basic linear algebra
  • Pandas
    • Creation and indexing of Series and DataFrames
    • Handling missing data
    • Concatenation of Series and DataFrames
    • Grouping and aggregating
    • Merging DataFrames
  • Gentle introduction to machine learning through the Scikit-learn library
    • Linear regression
    • Naive Bayes classification
    • Principal component analysis
    • k-means clustering
  • Project on applying the learned skills in an application field
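
To give a flavour of a few of the topics listed above (NumPy broadcasting, handling missing data, grouping and aggregating, and merging DataFrames), here is a brief sketch. All data, column names, and labels are invented for the example and are not from the course exercises.

```python
import numpy as np
import pandas as pd

# Broadcasting: the column means (shape (2,)) are subtracted from every row of a (3, 2) array
a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
centered = a - a.mean(axis=0)

# Made-up measurements with one missing value
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, np.nan, 3.0, 5.0],
})

# Handling missing data: fill the gap with the mean of the observed values (3.0)
df["value"] = df["value"].fillna(df["value"].mean())

# Grouping and aggregating: mean value per group
group_means = df.groupby("group")["value"].mean()

# Merging: attach a label to each row through the shared "group" column
labels = pd.DataFrame({"group": ["a", "b"], "label": ["first", "second"]})
merged = df.merge(labels, on="group")

print(centered)
print(group_means)
print(merged)
```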

-What kind of literature and other materials are read during the course (reading list)?

The material is integrated into the MOOC instructions.

-Which works are set reading and which are recommended as supplementary reading?

Jake VanderPlas, Python Data Science Handbook, O'Reilly (2016)

The book is freely available in electronic form from https://jakevdp.github.io/PythonDataScienceHandbook/

The MOOC includes automatic assessment of programming exercises.

The grading scale is 1...5.

The final project, the peer-review work related to it and the exam are assessed.

Contact information:

The course is completed in three stages:
1. studying the course material and completing the assignments in the online learning environment (TMC),
2. peer-reviewed project work in Moodle, and
3. a multiple-choice exam in Moodle.

1. The first part of the course is completed in the TMC learning environment. The course website contains the material and instructions necessary for completing the course. The materials are available to everyone without enrolling on the course through the Open University.

2. The second part of the course consists of peer-reviewed project work in Moodle. To be able to access Moodle, you will need to enroll on the course through the Open University. In order to enroll, you will need to meet one of the following criteria:
A. You have a University of Helsinki user ID.
B. You have a user ID at a HAKA federation member institution.
C. You have a Finnish personal identity number.
D. You are able to visit the University of Helsinki Admission Services in Helsinki and verify your identity.

3. The last part of the course is a multiple-choice exam in Moodle.

MOODLE

The Moodle online learning environment opens on June 6, 2020. You will be able to log in to the course's Moodle after completing the course assignments in the learning environment (TMC) and registering on the course.

How to get the Moodle link and course key?
You will get the Moodle link and the course key by email after you have completed the Open University course registration.

Veli Mäkinen

The course is part of the subject studies in Computer Science.