Open Data Science (DataCamp, R, RStudio, GitHub)

Open Data - Open Science - Data science

Our era of data - larger than ever and complex like chaos - requires several skills from statisticians and other data scientists.

We must discover the patterns hidden behind numbers in matrices and arrays. We are not afraid of coding, recoding, programming, or modelling. We want to visualize, analyze, interpret, understand, and communicate. These are the core themes of Open Data Science (Open Data - Open Science - Data Science). And this course is THE course for learning these skills.

General learning objective:
After completing this course you will understand the principles and advantages of using open research tools with open data and understand the possibilities of reproducible research. You will know how to use R, RStudio, RMarkdown, and GitHub for these tasks and also know how to learn more about these open software tools. You will also know how to apply certain statistical methods of data science, that is, data-driven statistics.

9.10.2018 at 00:01 - 13.11.2018 at 23:59


Kimmo Vehkalahti

Published, 23.10.2018 at 18:21

Dear all (over 130 doctoral, master's, bachelor's, and exchange students + postdocs, professors, alumni et al. from all over the four campuses of UH + some other places),

I have just opened the MOOC platform - one week before we actually begin.

PLEASE, use the HAKA-login to access the MOOC platform at:

and then enroll yourself on the course.

See you next Tuesday in the 1st class of this 1st class course - or virtually on the MOOC!

With best wishes,
- Kimmo (still working at a distance of 3333 km from HEL; getting back next weekend) :)

PS. Do not worry about WebOodi: it may complain that the course is fully booked...

Kimmo Vehkalahti

Published, 16.10.2018 at 17:03

16 Oct 2018 at 5 pm, in Helsinki, Finland:

In two weeks, we will begin the 1st class of this 1st class course - so welcome!

Before that, I would ask you to read my THINK OPEN blog, published yesterday:

Please check the advice about prerequisites. Thanks!

See you all soon,
- Kimmo


On this course, we use the learning platform for MOOCs (Massive Open Online Courses) of the University of Helsinki:


You have about one week to complete each set of exercises: the DataCamp exercises, which are not graded, and the RStudio exercises, which are graded by peer review.

Deadlines are strict: late submissions cannot be accepted.

In addition to short introductory lectures, the classes are free-form workshops where the students work together and the teachers give advice when needed.

Tue 30.10.2018, 16:00 - 18:00
Tue 6.11.2018, 16:00 - 18:00
Tue 13.11.2018, 16:00 - 18:00
Tue 20.11.2018, 16:00 - 18:00
Tue 27.11.2018, 16:00 - 18:00
Tue 4.12.2018, 16:00 - 18:00
Tue 11.12.2018, 16:00 - 18:00


Welcome to the course! (This "legendary" video was done - very quickly - for the very first IODS in January 2017...)


1. Tools and methods for open and reproducible research:
R, RStudio, RMarkdown, GitHub
2. Regression and model validation:
Data wrangling
Simple regression
Multiple regression
Regression diagnostics
3. Logistic regression:
Regression for binary outcomes
Training and testing a (predictive) model
4. Clustering and classification:
Datasets in R
Discriminant analysis (DA)
K-means clustering (KMC)
5. Dimensionality reduction techniques:
Principal component analysis (PCA)
Multiple correspondence analysis (MCA)
6. Analysis of longitudinal data:
Graphical displays and summary measures
Linear mixed effects models

Conduct of the course

Each week, exercises are completed using DataCamp, RStudio, and GitHub. The course grade consists of the points from the weekly RStudio exercises.

DataCamp exercises are completed on the DataCamp learning platform. RStudio exercises are completed on your own computer, moved onto the web (GitHub) and then submitted and peer-reviewed in the weekly Workshop (=the name of the peer-review tool of the MOOC platform).

More details on the MOOC platform (WILL OPEN IN LATE OCTOBER)


We got fantastic, constructive and thoughtful feedback from the students of the 1st IODS both throughout and after the course. Still feeling humble and grateful! Since then, we have tried to keep the best things and make the good things even better.

Here are a few samples from the anonymous feedback given by postdocs and PhD, Master's, Bachelor's and exchange students from all over the University of Helsinki (the 1st run was piloted mainly within our own University, as we were building the course at the same time, on a tight weekly schedule).

"I really enjoyed this course, to be honest this is the best course that I had in Helsinki. Combining both DataCamp and Rstudio exercise was amazing idea, it helped me alot. Even though I have been using R since couple of years but during this course I learned more sophisticated ways of programming."

"I have given my feedback during the course, hence now all I can say that thank you for the one of the best courses that I have taken in the university, and I have taken a lot of courses (> 300 credits).. THANK YOU! 5/5."

"First of all I want to thank you all about this course which has been the funniest and most interesting ever. This was my first touch to R, GitHub and Slack. I never thought that I would get this excited about something, but I did. I noticed that the R environment is an endless world and its not as difficult as I thought at first. I will definitely continue to learn codes and statistics."

"I think I did learn a lot and made a huge progress from 0 prior knowledge about coding and R and very shaky memory in statistics. I feel that now I can use what I’ve learnt to develop my skills further because I know how to look for information and have some idea how things work. I really liked the Data Camp part. It was super beneficial. Don’t think I would have learnt anything without it. So thanks a lot for an overall great course!"

"This has been an AMAZING course. I liked the content especially the linear regression, PCA and LD parts because I have been using these without really appreciating the whys completely. The hardest part for me has been interpreting the results and commenting on the graphs and figures - but now I have a better idea after looking at the selected samples you guys suggest after the submission."

"Now I feel that this was the best course, in which I have ever been participating, because: 1. The time schedule was very flexible, and I had an opportunity to work according my time schedule. 2. I like very much the interactive DataCamp exercises part, where I have got an idea how to start, and only after that to continue with the RStudio exercises. 3. The video lectures helped me a lot because this suits a lot to my way of studying: first to read and watch theoretical lectures, and after that continuing with the practical exercises part. 4. I like the idea for peer reviews, because in this way we had an opportunity to compare (to some extension) what we have done with the work of the others."

"Thank you for the course. It was amazing! I’m working on my second master’s degree (first one is from econ.) and I have to say that this was a best course ever. I liked specially the incredible way to combine new tools and massive amount of practice and work."


Doctoral students of HYMY

See the video (02:28) at

The name of this course refers to THREE BIG THEMES: 1) Open Data, 2) Open Science, and 3) Data Science. These themes are summarized briefly as follows:

1) Open Data

There are more and more open data sets available. Utilizing and sharing data is an essential skill for researchers in all fields. During this course we use open data sets from different sources and learn to prepare them for different analyses. You will explore, analyse and interpret data from real world applications.

2) Open Science

Science strives to be open. Repeating or reproducing the results is a common aim in any branch of science, but it is not always easy or simple. Sharing data is not enough for reproducibility. What is also needed is the use of openly available software tools and methods, as well as sharing your code and results. You will learn these skills during this course, using state-of-the-art tools.

3) Data Science

Data Science is the name for the data-driven world of Statistics. Nowadays, finding or collecting data is not a problem. Instead, the challenges are in extracting knowledge and discovering the patterns behind the data. This requires skills in coding, programming, and modelling, as well as visualizing and analysing. You will meet all these topics on this course.

We are quite excited about this course! So come along! Together we’ll guide you through these themes.

Welcome to the course! :)


