Data Science and Predictive Machine Learning

Packages used in this lecture

library(magrittr) # pipes
library(dplyr)    # data manipulation
library(caret)    # flexible machine learning
library(DT)       # interactive tables
library(readr)    # reading in csv
library(rpart)    # trees
library(rpart.plot) # plotting trees
library(pROC)     # ROC curves
set.seed(123)     # for reproducibility
con <- url("https://www.gerkovink.com/erasmus/Day%202/Part%20D/titanic.csv")
titanic <- read_csv(con, show_col_types = FALSE)

So far

We have learned the following techniques:

  • linear regression
  • logistic regression
  • ridge regression
  • lasso regression
  • the elastic net
  • support vector machines

Trees

Decision trees

Decision trees form a collection of learning algorithms that can be applied to regression and classification problems.

  • The algorithms are called trees because they follow a tree-like structure

A decision tree is a series of nodes: a directed graph that starts at the base with a single root node and extends to many leaf nodes, which represent the categories or subdivisions of the parameter space that the tree can classify.

Another way to think of a decision tree is as a flow chart, where the flow starts at the root node and ends with a decision made at the leaves.

Splits are based on the predictor space. Removing branches from a grown tree is called pruning.

All nodes other than the root node have parent nodes.

Nodes explained

Classification and regression trees

Popularity

Decision trees are very popular. This popularity is due to:

Exploratory benefits: through decision trees we can identify important features and important interactions between features.

  • This may allow us to separate the signal from the noise.

Explanatory power: it is often straightforward to interpret decision trees, even with limited statistical expertise.

Robust: a decision tree is a non-parametric algorithm. It therefore imposes no restrictions or assumptions on the parameter space or on the relation between the predictors and the response.

Generic: a decision tree can make classifications no matter the measurement level of the predictors.

Time-saving: decision trees require less data editing than other methods, because they are insensitive to skewness, outliers, unobserved cells, etc.

  • trees do not solve the missingness in your data, they simply grow around it.
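To illustrate the last point, a minimal sketch of how rpart grows around missing values. The blanked-out Age values are an artificial step added here for demonstration; by default rpart routes rows with missing predictors down the tree via surrogate splits.

```r
# Minimal sketch: rpart grows around missing values via surrogate splits.
# We artificially blank out some Age values to demonstrate (hypothetical step).
demo <- titanic
demo$Age[sample(nrow(demo), 50)] <- NA
fit <- rpart(factor(Survived) ~ Age + Pclass + Sex, data = demo, method = "class")
fit$control$usesurrogate  # 2 (the default): use surrogates to route missing values
```

The rows with missing Age are not dropped; surrogate splits on correlated predictors decide which branch they follow.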

However

Classification and regression trees also have disadvantages:

Overfitting: Fitting a tree to the degree where it is tailored to your data may yield low training bias, but high test error.

  • putting constraints on the depth (or height) of the tree may help
  • pruning is another way to regularize a tree. By discarding branches we induce some bias, but reduce variance.

Discretization: For continuous responses, trees may be less efficient.

  • predictions must be separated into discrete categories, which results in a loss of information when applying the model to continuous values.

Feature engineering: Trees require the predictor space to be well-selected and defined.

That said

Fitting a tree

titanic %<>% 
  mutate(Pclass = factor(Pclass, labels = c("1st class", "2nd class", "3rd class")), 
         Survived = factor(Survived, labels = c("Died", "Survived")))
idx <- createDataPartition(titanic$Survived, p = .8, list = FALSE)
train <- titanic[idx, ]
test <- titanic[-idx, ]
tree <- rpart(Survived ~ Age + Pclass + Sex, data = train, method = "class")

inspecting a tree

rpart.plot(tree)

Pruning a tree?

prune(tree, cp = .02) %>% rpart.plot()

In order to prune a tree we need to select a complexity parameter cp: the minimum improvement in fit that a split must yield in order to be kept.
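To see what pruning costs (or buys) in practice, a minimal sketch comparing the unpruned and pruned trees on the held-out test set created above:

```r
# Minimal sketch: test-set accuracy before and after pruning at cp = .02
pred_full   <- predict(tree, newdata = test, type = "class")
pred_pruned <- predict(prune(tree, cp = .02), newdata = test, type = "class")
mean(pred_full == test$Survived)    # accuracy of the unpruned tree
mean(pred_pruned == test$Survived)  # accuracy of the pruned tree
```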

Choosing cp

tree <- train(Survived ~ Age + Pclass + Sex, 
              data = titanic, 
              method = "rpart",
              tuneGrid = expand.grid(cp = seq(0.001, 1, by = .001)))

Complexity parameter

Too low cp will overfit. Too high cp will underfit.

plot(tree)

tree$bestTune
##      cp
## 3 0.003
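The pROC package loaded at the top can turn the tuned model's predicted probabilities into an ROC curve. A minimal sketch; note that because the caret model above was tuned on the full titanic data (including the test rows), test-set performance will be optimistic.

```r
# Minimal sketch: ROC curve for the tuned caret model on the test set.
probs   <- predict(tree, newdata = test, type = "prob")[, "Survived"]
roc_obj <- roc(response = test$Survived, predictor = probs)
auc(roc_obj)   # area under the ROC curve
plot(roc_obj)
```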

Output

tree
## CART 
## 
## 887 samples
##   3 predictor
##   2 classes: 'Died', 'Survived' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 887, 887, 887, 887, 887, 887, ... 
## Resampling results across tuning parameters:
## 
##   cp     Accuracy   Kappa     
##   0.001  0.8006942  0.57482517
##   0.002  0.8027752  0.57825161
##   0.003  0.8042517  0.58167615
##   0.004  0.8030252  0.57835436
##   0.005  0.8041201  0.57914057
##   0.006  0.8019767  0.57500754
##   0.007  0.8024698  0.57600267
##   0.008  0.8010794  0.57270192
##   0.009  0.8014590  0.57460632
##   0.010  0.7985172  0.56723452
##   ...    ...        ...
##   0.373  0.7844291  0.53942780
##   0.374  0.7771452  0.51594322
##   ...    ...        ...
##   0.505  0.6140416  0.01850982
##   0.506  0.6089040  0.00000000
##   ...    ...        ...

(resampling table abridged: accuracy is constant over long runs of cp, and once cp is large enough to prune the tree back to the root node the accuracy settles at the majority-class rate with Kappa = 0)
##   0.874  0.6089040  0.00000000
##   0.875  0.6089040  0.00000000
##   0.876  0.6089040  0.00000000
##   0.877  0.6089040  0.00000000
##   0.878  0.6089040  0.00000000
##   0.879  0.6089040  0.00000000
##   0.880  0.6089040  0.00000000
##   0.881  0.6089040  0.00000000
##   0.882  0.6089040  0.00000000
##   0.883  0.6089040  0.00000000
##   0.884  0.6089040  0.00000000
##   0.885  0.6089040  0.00000000
##   0.886  0.6089040  0.00000000
##   0.887  0.6089040  0.00000000
##   0.888  0.6089040  0.00000000
##   0.889  0.6089040  0.00000000
##   0.890  0.6089040  0.00000000
##   0.891  0.6089040  0.00000000
##   0.892  0.6089040  0.00000000
##   0.893  0.6089040  0.00000000
##   0.894  0.6089040  0.00000000
##   0.895  0.6089040  0.00000000
##   0.896  0.6089040  0.00000000
##   0.897  0.6089040  0.00000000
##   0.898  0.6089040  0.00000000
##   0.899  0.6089040  0.00000000
##   0.900  0.6089040  0.00000000
##   0.901  0.6089040  0.00000000
##   0.902  0.6089040  0.00000000
##   0.903  0.6089040  0.00000000
##   0.904  0.6089040  0.00000000
##   0.905  0.6089040  0.00000000
##   0.906  0.6089040  0.00000000
##   0.907  0.6089040  0.00000000
##   0.908  0.6089040  0.00000000
##   0.909  0.6089040  0.00000000
##   0.910  0.6089040  0.00000000
##   0.911  0.6089040  0.00000000
##   0.912  0.6089040  0.00000000
##   0.913  0.6089040  0.00000000
##   0.914  0.6089040  0.00000000
##   0.915  0.6089040  0.00000000
##   0.916  0.6089040  0.00000000
##   0.917  0.6089040  0.00000000
##   0.918  0.6089040  0.00000000
##   0.919  0.6089040  0.00000000
##   0.920  0.6089040  0.00000000
##   0.921  0.6089040  0.00000000
##   0.922  0.6089040  0.00000000
##   0.923  0.6089040  0.00000000
##   0.924  0.6089040  0.00000000
##   0.925  0.6089040  0.00000000
##   0.926  0.6089040  0.00000000
##   0.927  0.6089040  0.00000000
##   0.928  0.6089040  0.00000000
##   0.929  0.6089040  0.00000000
##   0.930  0.6089040  0.00000000
##   0.931  0.6089040  0.00000000
##   0.932  0.6089040  0.00000000
##   0.933  0.6089040  0.00000000
##   0.934  0.6089040  0.00000000
##   0.935  0.6089040  0.00000000
##   0.936  0.6089040  0.00000000
##   0.937  0.6089040  0.00000000
##   0.938  0.6089040  0.00000000
##   0.939  0.6089040  0.00000000
##   0.940  0.6089040  0.00000000
##   0.941  0.6089040  0.00000000
##   0.942  0.6089040  0.00000000
##   0.943  0.6089040  0.00000000
##   0.944  0.6089040  0.00000000
##   0.945  0.6089040  0.00000000
##   0.946  0.6089040  0.00000000
##   0.947  0.6089040  0.00000000
##   0.948  0.6089040  0.00000000
##   0.949  0.6089040  0.00000000
##   0.950  0.6089040  0.00000000
##   0.951  0.6089040  0.00000000
##   0.952  0.6089040  0.00000000
##   0.953  0.6089040  0.00000000
##   0.954  0.6089040  0.00000000
##   0.955  0.6089040  0.00000000
##   0.956  0.6089040  0.00000000
##   0.957  0.6089040  0.00000000
##   0.958  0.6089040  0.00000000
##   0.959  0.6089040  0.00000000
##   0.960  0.6089040  0.00000000
##   0.961  0.6089040  0.00000000
##   0.962  0.6089040  0.00000000
##   0.963  0.6089040  0.00000000
##   0.964  0.6089040  0.00000000
##   0.965  0.6089040  0.00000000
##   0.966  0.6089040  0.00000000
##   0.967  0.6089040  0.00000000
##   0.968  0.6089040  0.00000000
##   0.969  0.6089040  0.00000000
##   0.970  0.6089040  0.00000000
##   0.971  0.6089040  0.00000000
##   0.972  0.6089040  0.00000000
##   0.973  0.6089040  0.00000000
##   0.974  0.6089040  0.00000000
##   0.975  0.6089040  0.00000000
##   0.976  0.6089040  0.00000000
##   0.977  0.6089040  0.00000000
##   0.978  0.6089040  0.00000000
##   0.979  0.6089040  0.00000000
##   0.980  0.6089040  0.00000000
##   0.981  0.6089040  0.00000000
##   0.982  0.6089040  0.00000000
##   0.983  0.6089040  0.00000000
##   0.984  0.6089040  0.00000000
##   0.985  0.6089040  0.00000000
##   0.986  0.6089040  0.00000000
##   0.987  0.6089040  0.00000000
##   0.988  0.6089040  0.00000000
##   0.989  0.6089040  0.00000000
##   0.990  0.6089040  0.00000000
##   0.991  0.6089040  0.00000000
##   0.992  0.6089040  0.00000000
##   0.993  0.6089040  0.00000000
##   0.994  0.6089040  0.00000000
##   0.995  0.6089040  0.00000000
##   0.996  0.6089040  0.00000000
##   0.997  0.6089040  0.00000000
##   0.998  0.6089040  0.00000000
##   0.999  0.6089040  0.00000000
##   1.000  0.6089040  0.00000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.003.

Bootstrap aggregating

Bootstrap aggregating

Aggregating trees over bootstrap samples has some advantages:

  1. Aggregated weak learners typically outperform a single learner over the entire set; the ensemble overfits less.
  2. Aggregating reduces variance in high-variance, low-bias weak-learner scenarios.
  3. Aggregation can be parallelized.
  4. Out-of-Bag cases can be used for validation.

Aggregating also has disadvantages:

  1. Many aggregations are computationally costly, especially for larger data sets.
  2. Interpretability of the ensemble may be lost.
  3. Which features are important if their contribution is averaged?
  4. Weak learners with high bias will still have high bias in the bagging, yielding poor Out-of-Bag performance.
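The aggregation idea itself is simple enough to sketch by hand: grow one tree per bootstrap sample and let the trees vote. A minimal illustration with rpart (the helper names bag_by_hand and predict_bag are ours, not caret's):

```r
library(rpart)

# fit one classification tree per bootstrap sample
bag_by_hand <- function(formula, data, B = 25) {
  lapply(seq_len(B), function(b) {
    idx <- sample(nrow(data), replace = TRUE)   # bootstrap sample
    rpart(formula, data = data[idx, ], method = "class")
  })
}

# majority vote over the B trees
predict_bag <- function(trees, newdata) {
  votes <- sapply(trees, function(tr) as.character(predict(tr, newdata, type = "class")))
  apply(votes, 1, function(v) names(which.max(table(v))))
}
```

caret's method = "treebag" does this (and more) for us; the sketch only shows where the variance reduction comes from: each tree sees a slightly different data set.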

Variable importance

varImp(tree)
## rpart variable importance
## 
##                   Overall
## Sexmale            100.00
## Pclass3rd class     72.05
## Age                 53.07
## Pclass2nd class     19.66
## `Pclass2nd class`    0.00
## `Pclass3rd class`    0.00

Variable importance

varImp(tree) %>% plot

Bagging

tr.ctrl <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
bag <- train(Survived ~ Age + Pclass + Sex, 
             data = titanic, 
             method = "treebag",
             trControl = tr.ctrl,
             nbagg = 200,  
             control = rpart.control(minsplit = 10, minbucket = 3, cp = 0.003))
  • minsplit: the minimum number of observations that must exist in a node in order for a split to be attempted.
  • minbucket: the minimum number of observations in any terminal leaf node.
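rpart.control simply collects these settings into a list; the defaults are minsplit = 20, minbucket = round(minsplit/3) and cp = 0.01, so the values above deliberately allow deeper trees:

```r
library(rpart)

# the control object passed to each bagged tree above
ctrl <- rpart.control(minsplit = 10, minbucket = 3, cp = 0.003)
ctrl$minsplit   # smallest node that may still be split
ctrl$minbucket  # smallest terminal leaf allowed
ctrl$cp         # minimum complexity improvement a split must achieve
```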

Bagging result

bag
## Bagged CART 
## 
## 887 samples
##   3 predictor
##   2 classes: 'Died', 'Survived' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 799, 798, 799, 798, 797, 798, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.8229653  0.6155562

Bagging variable importance

varImp(bag) %>% plot

However, in the bagging approach the trees are correlated, because every tree considers the same set of predictors.

Random forests

With random forests, the predictor space is limited at each split: only a random subset of \(m\) out of the \(p\) predictors is considered.

  • with bagging, the strongest predictors end up in every tree
    • trees are correlated
    • the variance of the ensemble stays high
  • with random forests every predictor has a fair chance
    • trees are decorrelated
    • the variance of the ensemble is lower
rf <- train(Survived ~ Age + Pclass + Sex, 
            data = titanic, 
            method = "rf", 
            trControl = tr.ctrl)

Optimizing the number of predictors \(m < p\) (mtry)

rf
## Random Forest 
## 
## 887 samples
##   3 predictor
##   2 classes: 'Died', 'Survived' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 797, 798, 798, 798, 799, 798, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##   2     0.8095579  0.5820553
##   3     0.8072222  0.5791266
##   4     0.8004545  0.5660295
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
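caret chose the candidate values mtry = 2, 3, 4 from its default grid; you can also supply the candidates yourself through tuneGrid. A sketch that mirrors the rf call above:

```r
# explicitly control which values of mtry caret evaluates
rf_grid <- train(Survived ~ Age + Pclass + Sex,
                 data = titanic,
                 method = "rf",
                 trControl = tr.ctrl,
                 tuneGrid = expand.grid(mtry = 1:4))
rf_grid$bestTune  # the winning value of mtry
```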

RF Importance

varImp(rf) %>% plot

Boosting

Boosting

boost <- train(Survived ~ Age + Pclass + Sex, 
               data = titanic, 
               method = "adaboost", 
               trControl = tr.ctrl)
  • Apply a weak classifier (e.g. stump) to training data;
  • Increase the weights for incorrect classifications, and repeat;
  • The ensemble classifier is a linear combination of the weak classifiers.
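The reweighting loop can be made concrete with a schematic AdaBoost implementation using rpart stumps (labels coded -1/+1; this illustrates the idea only and is not the implementation behind method = "adaboost"):

```r
library(rpart)

# schematic AdaBoost loop with decision stumps; y must be coded -1/+1
adaboost_sketch <- function(X, y, M = 50) {
  n <- nrow(X); w <- rep(1 / n, n)            # start with uniform weights
  stumps <- vector("list", M); alpha <- numeric(M)
  for (m in seq_len(M)) {
    d <- data.frame(X, y = factor(y))
    fit <- rpart(y ~ ., data = d, weights = w,
                 control = rpart.control(maxdepth = 1))       # a stump
    pred <- as.numeric(as.character(predict(fit, d, type = "class")))
    err <- sum(w * (pred != y)) / sum(w)
    err <- min(max(err, 1e-10), 1 - 1e-10)    # guard against degenerate stumps
    alpha[m] <- 0.5 * log((1 - err) / err)    # classifier weight
    w <- w * exp(alpha[m] * (pred != y))      # upweight the mistakes
    w <- w / sum(w)
    stumps[[m]] <- fit
  }
  list(stumps = stumps, alpha = alpha)        # ensemble = weighted vote
}
```

The ensemble prediction is the sign of the alpha-weighted sum of the stump predictions, i.e. exactly the linear combination described above.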

Boosting

boost
## AdaBoost Classification Trees 
## 
## 887 samples
##   3 predictor
##   2 classes: 'Died', 'Survived' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 798, 799, 798, 797, 798, 799, ... 
## Resampling results across tuning parameters:
## 
##   nIter  method         Accuracy   Kappa    
##    50    Adaboost.M1    0.7812876  0.5438129
##    50    Real adaboost  0.7835476  0.5466679
##   100    Adaboost.M1    0.7836114  0.5505397
##   100    Real adaboost  0.7812751  0.5423179
##   150    Adaboost.M1    0.7768315  0.5382832
##   150    Real adaboost  0.7812751  0.5425039
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nIter = 100 and method = Adaboost.M1.

Boosting performance

plot(boost)

Disclaimer

I have introduced you to the core machine learning techniques in R.

These techniques come together in the caret package.





See the caret documentation for all techniques that can be trained with caret.
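You can also query that catalogue from R itself; getModelInfo() and modelLookup() are part of caret:

```r
library(caret)

# every method name that train() accepts ("rpart", "treebag", "rf", "adaboost", ...)
head(names(getModelInfo()))

# the tuning parameters of a specific method
modelLookup("rf")
```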





If you have any questions in the future, drop me a line.