Open Data Science (DataCamp, R, RStudio, GitHub)

Open Data - Open Science - Data science

Our era of data - larger than ever and complex like chaos - requires several skills from statisticians and other data scientists.

We must discover the patterns hidden behind numbers in matrices and arrays. We are not afraid of coding, recoding, programming, or modelling. We want to visualize, analyze, interpret, understand, and communicate. These are the core themes of Open Data Science (Open Data - Open Science - Data Science). And this course is THE course for learning these skills.

General learning objective:
After completing this course you will understand the principles and advantages of using open research tools with open data, as well as the possibilities of reproducible research. You will know how to use R, RStudio, RMarkdown, and GitHub for these tasks, and how to learn more about these open software tools. You will also know how to apply certain statistical methods of data science, that is, data-driven statistics.

Enrol
23.9.2017 at 09:00 - 6.12.2017 at 23:59

Interaction

On this course, we use the learning platform for MOOCs (Massive Open Online Courses) of the University of Helsinki:
http://mooc.helsinki.fi

The course and the platform were opened on 31st October 2017 at 2pm.

From now on, everything happens on the MOOC platform (+ the other platforms) and in the weekly class (and elsewhere).

This "landing page" will be neither needed nor updated during the rest of 2017.

Timetable

There is about one week for completing each exercise (DataCamp exercises that are automatically graded and RStudio exercises that are graded using peer-review).

Deadlines are strict: late submissions cannot be accepted.

In addition to short introductory lectures, the classes are free-form workshops where the students work together and the teachers give advice when needed.

Video

Welcome to the course! (This "legendary" video was made, very quickly, for the very first IODS in January 2017...)

Contents

1 Tools and methods for open and reproducible research
    R, RStudio, RMarkdown, GitHub
2 Regression and model validation
3 Logistic regression
4 Clustering and classification
    Discriminant analysis (DA)
    K-means clustering (KMC)
5 Dimensionality reduction techniques
    Principal component analysis (PCA)
    Multiple correspondence analysis (MCA)
6 Final assignment

Conduct of the course

Each week exercises are completed using DataCamp, RStudio, and GitHub. The course grade consists of

1) Points from DataCamp exercises (weekly)
2) Points from RStudio exercises (weekly)
3) Points from Final assignment.

DataCamp exercises are completed and automatically evaluated on the DataCamp learning platform. RStudio exercises are completed on your own computer, published on the web (GitHub), and then submitted and peer-reviewed in the weekly Workshop (the name of the peer-review tool on the MOOC platform).

More details on the MOOC platform (WILL OPEN IN LATE OCTOBER 2017)

Feedback

We got fantastic, constructive and thoughtful feedback from the students of the 1st IODS both throughout and after the course. Still feeling humble and grateful! For the 2nd run we will keep the best things, make the good things even better, and open the course for a wider audience (MOOC).

Here are a few samples from the anonymous feedback given by post docs, PhD, Master's, Bachelor's and Exchange students from all over the University of Helsinki (the 1st run was done and piloted mainly within our own University, as we built the course at the same time, in a tight weekly schedule).

"I really enjoyed this course, to be honest this is the best course that I had in Helsinki. Combining both DataCamp and Rstudio exercise was amazing idea, it helped me alot. Even though I have been using R since couple of years but during this course I learned more sophisticated ways of programming."

"I have given my feedback during the course, hence now all I can say that thank you for the one of the best courses that I have taken in the university, and I have taken a lot of courses (> 300 credits).. THANK YOU! 5/5."

"First of all I want to thank you all about this course which has been the funniest and most interesting ever. This was my first touch to R, GitHub and Slack. I never thought that I would get this excited about something, but I did. I noticed that the R environment is an endless world and its not as difficult as I thought at first. I will definitely continue to learn codes and statistics."

"I think I did learn a lot and made a huge progress from 0 prior knowledge about coding and R and very shaky memory in statistics. I feel that now I can use what I’ve learnt to develop my skills further because I know how to look for information and have some idea how things work. I really liked the Data Camp part. It was super beneficial. Don’t think I would have learnt anything without it. So thanks a lot for an overall great course!"

"This has been an AMAZING course. I liked the content especially the linear regression, PCA and LD parts because I have been using these without really appreciating the whys completely. The hardest part for me has been interpreting the results and commenting on the graphs and figures - but now I have a better idea after looking at the selected samples you guys suggest after the submission."

"Now I feel that this was the best course, in which I have ever been participating, because: 1. The time schedule was very flexible, and I had an opportunity to work according my time schedule. 2. I like very much the interactive DataCamp exercises part, where I have got an idea how to start, and only after that to continue with the RStudio exercises. 3. The video lectures helped me a lot because this suits a lot to my way of studying: first to read and watch theoretical lectures, and after that continuing with the practical exercises part. 4. I like the idea for peer reviews, because in this way we had an opportunity to compare (to some extension) what we have done with the work of the others."

"Thank you for the course. It was amazing! I’m working on my second master’s degree (first one is from econ.) and I have to say that this was a best course ever. I liked specially the incredible way to combine new tools and massive amount of practice and work."

Description

Doctoral students of HYMY

See the video (02:28) at https://vimeo.com/195829801

The name of this course refers to THREE BIG THEMES: 1) Open Data, 2) Open Science, and 3) Data Science. These themes are summarized briefly as follows:

1) Open Data

There are more and more open data sets available. Utilizing and sharing data is an essential skill for researchers in all fields. During this course we use open data sets from different sources and learn to prepare them for different analyses. You will explore, analyse and interpret data from real-world applications.

2) Open Science

Science strives to be open. Repeating or reproducing results is a common aim in any branch of science, but it is not always easy or simple. Sharing data is not enough for reproducibility. What is also needed is the use of openly available software tools and methods, as well as sharing your code and results. You will learn these skills during this course, using state-of-the-art tools.

3) Data Science

Data Science is the name for the data-driven world of Statistics. Nowadays, finding or collecting data is not a problem. Instead, the challenges lie in extracting knowledge and discovering the patterns behind the data. This requires skills in coding, programming, and modelling, as well as visualizing and analysing. You will face all these topics on this course.

We are quite excited about this course! So come along! Together we’ll guide you through these themes.

Welcome to the course! :)

Time: Tuesdays, 31.10.–12.12., 14:00–16:00

Place: Unioninkatu 35, Aud 116

