Teaching

Name | Credits | Study mode | Time | Location | Organiser
Data Analysis with Python, MOOC, summer 2019 (flexible start) | 5 Cr | Course | 19.6.2019 - 31.12.2019 | Helsinki | Open University
Data Analysis with Python, MOOC, spring 2020 | 5 Cr | Course | 9.3.2020 - 11.5.2020 | Helsinki | Open University

Target group

The Master's Programme in Computer Science is responsible for the course.

The course is

  • an optional course in the Other Studies module of the Master's Programme in Computer Science
  • an optional course in the new Bachelor's Programme in Science

The course is available to students from other degree programmes.

Previous studies or prior knowledge

Programming skills and basic knowledge of probability calculus and linear algebra.

The compulsory basic-level courses in the Bachelor's Programme in Computer Science form a sufficient background.

Learning outcomes

  • Can confidently write basic-level Python programs without constantly consulting language/library documentation
  • Can apply efficient and elegant Pythonic idioms to solve problems
  • Knows the different phases of the data analysis pipeline
  • Knows the fundamental data types array, Series and DataFrame
  • Can clean data to form consistent Series and DataFrames without anomalies
  • Can select subsets, transform, reshape and combine data
  • Can extract summary statistics from data (min, max, mean, median, standard deviation)
  • Knows the main types of machine learning (supervised learning: regression and classification, unsupervised learning: clustering, dimensionality reduction, (density estimation))
  • Knows the estimator API of Scikit-Learn (choose model class, choose hyperparameters, form feature matrix and target vector, fit model, transform data or predict labels or responses)
  • Can form feature matrix and target vector suitable for Scikit-Learn's model fitting algorithms
  • Can visualize data as simple plots or histograms
  • Can apply basic data analysis skills to a simple project in an application field
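
The estimator API steps listed above can be sketched roughly as follows. This is a minimal illustration rather than course material: the synthetic data (y = 2x + 1) and the choice of LinearRegression are assumptions made only to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (an assumption for illustration): y = 2x + 1.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Steps 1 and 2: choose a model class and its hyperparameters.
model = LinearRegression(fit_intercept=True)

# Step 3: form the feature matrix (n_samples, n_features)
# and the target vector (n_samples,).
X = x.reshape(-1, 1)

# Step 4: fit the model to the data.
model.fit(X, y)

# Step 5: predict responses for new inputs.
X_new = np.array([[10.0], [11.0]])
y_pred = model.predict(X_new)
print(model.coef_, model.intercept_, y_pred)
```

Other scikit-learn estimators follow the same pattern; unsupervised ones typically expose transform or predict without a target vector.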

Content

The course takes a practical approach to the different phases of the data analysis pipeline: fetching and cleaning data; reshaping, subsetting, grouping, and combining data; and using aggregation, machine learning, and data visualization to extract knowledge from data.

  • Libraries: Numpy, Pandas, Scikit-learn, (Matplotlib)
  • Interactive study materials: Jupyter notebook
  • Automatic checking of exercises: Test My Code framework
  • Basics of Python language
  • Numpy
    • Creation and indexing of arrays
    • Array concatenation and splitting
    • Fast computation using universal functions
    • Summary statistics
    • Broadcasting
    • Matrix operations and basic linear algebra
  • Pandas
    • Creation and indexing of Series and DataFrames
    • Handling missing data
    • Concatenation of Series and DataFrames
    • Grouping and aggregating
    • Merging DataFrames
  • Gentle introduction to machine learning through the Scikit-learn library
    • Linear regression
    • Naive Bayes classification
    • Principal component analysis
    • k-means clustering
  • Project on applying the learned skills in an application field
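
A few of the Pandas and NumPy topics above (handling missing data, grouping and aggregating, merging, broadcasting) might be combined as in the sketch below. The station names, cities, and temperature values are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements; one temperature value is missing.
readings = pd.DataFrame({
    "station": ["A", "A", "B", "B"],
    "temperature": [10.0, np.nan, 14.0, 16.0],
})
# Hypothetical lookup table of station locations.
stations = pd.DataFrame({
    "station": ["A", "B"],
    "city": ["Helsinki", "Turku"],
})

# Handling missing data: drop rows without a temperature.
clean = readings.dropna(subset=["temperature"])

# Grouping and aggregating: mean temperature per station.
means = clean.groupby("station", as_index=False)["temperature"].mean()

# Merging DataFrames: attach the city name to each station.
summary = means.merge(stations, on="station")

# Broadcasting: the scalar mean is subtracted from every array element.
centered = summary["temperature"].to_numpy() - summary["temperature"].mean()
print(summary)
print(centered)
```

Dropping rows is only one way to handle missing values; filling them (for example with fillna) is an alternative when rows cannot be discarded.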

Activities and teaching methods supporting learning

The MOOC includes automatic assessment of programming exercises.

Study materials

-What kind of literature and other materials are read during the course (reading list)?

Material is integrated into the MOOC instructions.

-Which works are set reading and which are recommended as supplementary reading?

Jake VanderPlas, Python Data Science Handbook, O'Reilly (2016)

The book is freely available in electronic form from https://jakevdp.github.io/PythonDataScienceHandbook/

Assessment methods and criteria

The grading scale is 1 to 5.

The final project, the peer-review work related to it and the exam are assessed.

Recommended optional studies

-Which other courses are recommended to be taken in addition to this course?

  • Introduction to Data Science

-Which other courses support the further development of the competence provided by this course?

  • Introduction to Machine Learning
  • Biological Sequence Analysis

Relations to other study units

The course is part of the subject studies in Computer Science.

Implementation

The course is offered as a MOOC. After passing 80% of the course material through online exercises, one can proceed to the project work and the exam. The project work is submitted to Moodle, peer-reviewed, and then evaluated. The exam consists of multiple-choice questions.