Quick Overview

Column 1

Outline

R is rapidly becoming the standard platform for data manipulation, visualization and analysis and has a number of advantages over other statistical software packages. A wide community of users contribute to R, resulting in an enormous coverage of statistical procedures, including many that are not available in any other statistical program. Furthermore, it is highly flexible for programming and scripting purposes, for example when manipulating data or creating professional plots. However, R lacks standard GUI menus, as in SPSS for example, from which to choose what statistical test to perform or which graph to create. As a consequence, R is more challenging to master. Therefore, this course offers an elaborate introduction into statistical programming in R. Students learn to operate R, make plots, fit, assess and interpret a variety of basic statistical models and do advanced statistical programming and data manipulation.

The course deals with the following topics:

  1. Basic to advanced programming skills: data generation, manipulation, pipelines, summaries and plotting.
  2. Fitting statistical models: estimation, prediction and testing.
  3. Basic statistical learning techniques.
  4. Crossvalidation.
  5. Clustering

The course starts at an intermediate level. At the end of the course, participants will master advanced programming and scripting skills with R. Please follow these exercises if you have doubts about your R skills.

Prerequisites: Participants are requested to bring their own laptop for lab meetings.

Column 2

Daily schedule

When? What?
09.30 09.45 Opening
09.45 10.30 Lecture A
10.30 11.15 Exercise A
11.15 11.30 Discussion A
break
11.45 12.45 Lecture B
break
13.30 14.15 Exercise B
14.15 14.30 Discussion B
break
14.45 15.30 Lecture C
15.30 16.00 Exercise C
16.00 16.15 Discussion C
16.15 16.30 Closing

Prepare your machine

Column 1

Preparing your machine for the course

Dear all,

This March you will participate in the Statistical programming with R workshop. To realize a steeper learning curve, we will use some functionality that is not part of the base installation for R. The below steps guide you through installing both R as well as the necessary additions. I look forward to see you all,

Gerko Vink

System requirements

Bring a laptop computer to the course and make sure that you have full write access and administrator rights to the machine. We will explore programming and compiling in this course. This means that you need full access to your machine. Some corporate laptops come with limited access for their users, we therefore advice you to bring a personal laptop computer, if you have one.

1. Install R

R can be obtained here. We won’t use R directly in the course, but rather call R through RStudio. Therefore it needs to be installed.

2. Install RStudio Desktop

Rstudio is an Integrated Development Environment (IDE). It can be obtained as stand-alone software here. The free and open source RStudio Desktop version is sufficient.

3. Start RStudio and install the following packages.

Execute the following lines of code in the console window:

install.packages(c("ggplot2", "tidyverse", "dplyr", "magrittr", "micemd", "jomo", "pan", 
                 "lme4", "knitr", "rmarkdown", "plotly", "ggplot2", "shiny", 
                 "devtools", "boot", "class", "car", "MASS", "ggplot2movies", 
                 "ISLR", "DAAG", "mice"), 
                 dependencies = TRUE)

If you are not sure where to execute code, use the following figure to identify the console:

HTML5 Icon

Just copy and paste the installation command and press the return key. When asked

Do you want to install from sources the package which needs 
compilation? (Yes/no/cancel)

type Yes in the console and press the return key.

Column 2

What if the steps to the left do not work for me?

If all fails and you have insufficient rights to your machine, the following web-based service will offer a solution.

  1. Open a free account on rstudio.cloud. You can run your own cloud-based RStudio environment there.

Naturally, you will need internet access for these services to be accessed.

Prepare yourself

Column 1

Prerequisites

Participants are assumed to be familiar with basic scripting in R. If you are in doubt about your skill set, please go over the materials in this preperatory course.

All the best,

Gerko Vink

Materials

Column 1

The course materials

I usually adapt the course as we go. To ensure that you work with the latest iteration of the course materials, I advice all course participants to access the materials online.

All lectures are in html format. Practicals are walkthrough files that guide you through the exercises. Impractical files contain the exercises without walkthrough, explanations and solutions.

Column 2

Useful References

The above links are useful references that connect to today’s materials. The above ggplot2 link details pretty much all you can do with package ggplot2.

To continue

Column 1

What to do after the course: Data Science

The above references are (currently) available for free in these links. I deem them very useful and I would highly recommend them.

Machine learning

See the following online books for some very detailed and hands-on guides on predictive modeling in R:

Inference in contemporary computing

The following book is an absolute must-read if valid inferences is one of your interests:

And this course may be useful