We use the following packages in this Practical:

library(dplyr)     # for data manipulation
library(magrittr)  # for pipes
library(ggplot2)   # for visualization
library(mice)      # for the boys data

Exercises


Pipes


  1. Use a pipe to do the following:
  • draw 1000 values from a normal distribution with mean = 5 and sd = 1, i.e. \(N(5, 1)\),
  • create a matrix where the first 500 values are the first column and the second 500 values are the second column,
  • make a scatterplot of these two columns
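
One possible approach to these three steps is sketched below. The seed and the object name `xy` are assumptions made for illustration; the exercise itself does not prescribe them.

```r
library(magrittr)

set.seed(123)  # assumption: a fixed seed, only so the sketch is reproducible

# matrix() fills column by column, so with ncol = 2 the first 500 draws
# become column 1 and the last 500 draws become column 2
xy <- rnorm(1000, mean = 5, sd = 1) %>%
  matrix(ncol = 2)

# plot() treats a two-column matrix as x (column 1) against y (column 2)
xy %>% plot()
```

The intermediate object is kept only so the matrix can be inspected; the whole chain could also be written as a single pipe ending in `plot()`.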

  1. Use a pipe to calculate the correlation matrix on the anscombe data set
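
A minimal sketch of how this could look: `anscombe` ships with base R and contains only numeric columns, so it can be piped straight into `cor()`. The name `cor_mat` and the rounding step are illustrative choices.

```r
library(magrittr)

# pipe the whole data frame into cor() to get the full correlation matrix
cor_mat <- anscombe %>%
  cor()

# rounding is optional and only makes the printed matrix easier to read
cor_mat %>% round(2)
```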

  1. Now use a pipe to calculate the correlation for the pair (x4, y4) on the anscombe data set
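
For a single pair of variables, the regular pipe is awkward because `cor()` expects two vectors rather than a data frame. One option, sketched here, is the exposition pipe `%$%` from magrittr, which makes the columns of the left-hand side available by name. The name `r_x4y4` is an assumption for illustration.

```r
library(magrittr)

# %$% exposes the columns of anscombe, so x4 and y4 can be used directly
r_x4y4 <- anscombe %$% cor(x4, y4)
r_x4y4

# an alternative with the regular pipe: select both columns first,
# then read the off-diagonal element of the 2 x 2 correlation matrix
anscombe %>%
  subset(select = c(x4, y4)) %>%
  cor()
```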

The boys dataset is part of package mice. It is a subset of 748 Dutch boys taken from the Fourth Dutch Growth Study. Its columns record a variety of growth measures. Inspect the help for the boys dataset and familiarize yourself with its contents.

To learn more about the contents of the data, use one of the two following help commands:

help(boys)
?boys

  1. It seems that the boys data are sorted based on age. Verify this.
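
One way to check this is base R's `is.unsorted()`, which returns FALSE when a vector is in non-decreasing order. The sketch below assumes that age contains no missing values, because `is.unsorted()` returns NA otherwise. The name `age_is_unsorted` is illustrative.

```r
library(mice)      # provides the boys data
library(magrittr)

# FALSE here means that age never decreases from one row to the next
age_is_unsorted <- boys %$% is.unsorted(age)
age_is_unsorted
```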

  1. Use a pipe to calculate the correlation between hgt and wgt in the boys data set from package mice.
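
A possible sketch, again using the exposition pipe. Both hgt and wgt contain missing values in the boys data, so some choice for handling them is needed. Using the pairwise complete observations is an assumption made here, not something the exercise prescribes.

```r
library(mice)      # provides the boys data
library(magrittr)

# use = "pairwise.complete.obs" drops rows where hgt or wgt is missing
r_hgt_wgt <- boys %$% cor(hgt, wgt, use = "pairwise.complete.obs")
r_hgt_wgt
```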

  1. In the boys data set, hgt is recorded in centimeters. Use a pipe to transform hgt in the boys dataset to height in meters and verify the transformation
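
The transformation can be sketched with `mutate()` from dplyr. To make the verification easy, this sketch stores the result in a new column instead of overwriting hgt. The names `boys_m` and `hgt_m` are hypothetical and chosen for illustration only.

```r
library(mice)      # provides the boys data
library(dplyr)
library(magrittr)

# keep the original column and add height in meters next to it
boys_m <- boys %>%
  mutate(hgt_m = hgt / 100)

# visual check: the first rows of both columns side by side
boys_m %>%
  select(hgt, hgt_m) %>%
  head()

# numerical check: converting back should reproduce the original values
boys_m %$% all.equal(hgt_m * 100, hgt)
```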

  1. Use a pipe to plot the pair (hgt, wgt) two times: once for hgt in meters and once for hgt in centimeters. Make the points in the ‘centimeter’ plot red and in the ‘meter’ plot blue.
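
A sketch of one way to do this with base graphics. With magrittr, using the placeholder `.` as a named argument (`data = .`) sends the piped data frame there instead of to the first argument, which lets the formula interface of `plot()` be used inside a pipe. The object name `boys_meter` and the plot titles are illustrative.

```r
library(mice)      # provides the boys data
library(dplyr)
library(magrittr)

# centimeter version: hgt as recorded, points in red
boys %>%
  plot(wgt ~ hgt, data = ., col = "red", main = "hgt in centimeters")

# meter version: transform hgt first, points in blue
boys_meter <- boys %>%
  mutate(hgt = hgt / 100)

boys_meter %>%
  plot(wgt ~ hgt, data = ., col = "blue", main = "hgt in meters")
```

Apart from the scale of the horizontal axis, the two plots should look identical, since the transformation is linear.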

Visualization


  1. Function plot() is the core plotting function in R. Find out more about plot(): Try both the help in the help-pane and ?plot in the console. Look at the examples by running example(plot).

  1. Create a scatterplot between age and bmi in the mice::boys data set
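
A minimal ggplot2 sketch is shown below. Because `%>%` binds more tightly than `+`, the data can be piped into `ggplot()` and layers added afterwards. The name `p_scatter` is illustrative, and `na.rm = TRUE` is an optional choice that suppresses the warning about rows with missing bmi.

```r
library(ggplot2)
library(magrittr)

p_scatter <- mice::boys %>%
  ggplot(aes(x = age, y = bmi)) +
  geom_point(na.rm = TRUE)   # rows with missing bmi are dropped silently

p_scatter
```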

  1. Now recreate the plot with the following specifications:
  • If bmi < 18.5 use color = "light blue"
  • If bmi >= 18.5 & bmi < 25 use color = "light green"
  • If bmi >= 25 & bmi < 30 use color = "orange"
  • If bmi >= 30 use color = "red"

Hint: it may help to expand the data set with a new variable.
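
Following the hint, one possible sketch adds a class variable with `cut()` and maps it to the requested colors. The column name `bmi_class` and the four class labels are hypothetical. The sketch also assumes that values lying exactly on a boundary (18.5, 25 or 30) belong to the higher class, which is what `right = FALSE` implements.

```r
library(ggplot2)
library(dplyr)
library(magrittr)

# right = FALSE gives the intervals [0, 18.5), [18.5, 25), [25, 30), [30, Inf)
boys_bmi <- mice::boys %>%
  mutate(bmi_class = cut(bmi,
                         breaks = c(0, 18.5, 25, 30, Inf),
                         labels = c("underweight", "healthy",
                                    "overweight", "obese"),
                         right = FALSE))

p_bmi <- boys_bmi %>%
  dplyr::filter(!is.na(bmi)) %>%   # drop rows without bmi before plotting
  ggplot(aes(x = age, y = bmi, colour = bmi_class)) +
  geom_point() +
  scale_colour_manual(values = c(underweight = "light blue",
                                 healthy     = "light green",
                                 overweight  = "orange",
                                 obese       = "red"))

p_bmi
```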


  1. Create a histogram for age in the boys data set

  1. Create a bar chart for reg in the boys data set

  1. Create a box plot for hgt with different boxes for reg in the boys data set
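
The histogram, bar chart and box plot all follow the same ggplot2 pattern: pipe the data into `ggplot()`, map the variables, add one geom. The sketch below shows one way to write each. The object names and the number of bins are illustrative choices.

```r
library(ggplot2)
library(magrittr)

# histogram of age; bins = 30 is an arbitrary choice
p_hist <- mice::boys %>%
  ggplot(aes(x = age)) +
  geom_histogram(bins = 30)

# bar chart of reg: geom_bar() counts the rows per region
p_bar <- mice::boys %>%
  ggplot(aes(x = reg)) +
  geom_bar()

# box plot of hgt, one box per region
p_box <- mice::boys %>%
  ggplot(aes(x = reg, y = hgt)) +
  geom_boxplot(na.rm = TRUE)   # rows with missing hgt are dropped silently

p_hist
p_bar
p_box
```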

  1. Create a density plot for age with different curves for boys from the city and boys from rural areas (!city).
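
One possible sketch derives a two-level variable from reg first and then draws one density per level. The column name `area`, the label "rural" for every region other than city, and the transparency value are assumptions made for illustration. Rows with a missing region are dropped because they cannot be assigned to either group.

```r
library(ggplot2)
library(dplyr)
library(magrittr)

boys_area <- mice::boys %>%
  dplyr::filter(!is.na(reg)) %>%
  mutate(area = ifelse(reg == "city", "city", "rural"))

# alpha makes the overlapping densities both visible
p_dens <- boys_area %>%
  ggplot(aes(x = age, fill = area)) +
  geom_density(alpha = 0.4)

p_dens
```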

  1. Create a diverging bar chart for hgt in the boys data set, that displays for every age year that year’s mean height in deviations from the overall average hgt

In other words, recreate the following plot:
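
A chart of this kind could be built along the following lines. This is a sketch, not necessarily the code behind the original figure. It assumes that "age year" means completed years (`floor(age)`) and that deviations are expressed in centimeters. The names `age_year`, `mean_hgt`, `deviation` and `direction` are hypothetical.

```r
library(ggplot2)
library(dplyr)
library(magrittr)

boys_dev <- mice::boys %>%
  mutate(age_year = floor(age)) %>%                   # assumption: completed years
  group_by(age_year) %>%
  summarise(mean_hgt = mean(hgt, na.rm = TRUE)) %>%   # mean height per age year
  mutate(deviation = mean_hgt - mean(mice::boys$hgt, na.rm = TRUE),
         direction = ifelse(deviation < 0, "below average", "above average"))

# bars left of zero are below the overall mean, bars right of zero above it
p_div <- boys_dev %>%
  ggplot(aes(x = factor(age_year), y = deviation, fill = direction)) +
  geom_col() +
  coord_flip() +
  labs(x = "age (years)", y = "deviation from overall mean hgt (cm)")

p_div
```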


End of Practical


Useful References