We use the following packages:

library(tidyverse)
library(magrittr)
library(caret)
library(kernlab)
library(MLeval)
library(pROC)

Introduction

Let’s take the titanic data that we used before and fit the following four models on a training version (70% of cases) of that data set.

  1. A logistic regression model
  2. A linear kernel SVM
  3. A polynomial kernel SVM
  4. A radial kernel SVM

Finally, compare the performance of all 4 techniques on the test version (30% of not yet used cases) of that data set.


Grab the data

We can use the following code block to directly load the data in our workspace:

con <- url("https://www.gerkovink.com/erasmus/Day%202/Part%20D/titanic.csv")
titanic <- read_csv(con)
## Rows: 887 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Name, Sex
## dbl (6): Survived, Pclass, Age, Siblings/Spouses Aboard, Parents/Children Ab...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Prepare the data

We need to take care of some columns that are not well-coded. Let’s make all the measurement levels as they are supposed to be. That means factors into factors, ordered factors into ordered factors, etc.

titanic %<>% 
  mutate(Pclass   = factor(Pclass, 
                         ordered = TRUE, 
                         labels = c("1st class", "2nd class", "3rd class")), 
         Survived = factor(Survived, 
                           labels = c("Died", "Survived")))

str(titanic)
## spec_tbl_df [887 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Survived               : Factor w/ 2 levels "Died","Survived": 1 2 2 2 1 1 1 1 2 2 ...
##  $ Pclass                 : Ord.factor w/ 3 levels "1st class"<"2nd class"<..: 3 1 3 1 3 3 1 3 3 2 ...
##  $ Name                   : chr [1:887] "Mr. Owen Harris Braund" "Mrs. John Bradley (Florence Briggs Thayer) Cumings" "Miss. Laina Heikkinen" "Mrs. Jacques Heath (Lily May Peel) Futrelle" ...
##  $ Sex                    : chr [1:887] "male" "female" "female" "female" ...
##  $ Age                    : num [1:887] 22 38 26 35 35 27 54 2 27 14 ...
##  $ Siblings/Spouses Aboard: num [1:887] 1 1 0 1 0 0 0 3 0 1 ...
##  $ Parents/Children Aboard: num [1:887] 0 0 0 0 0 0 0 1 2 0 ...
##  $ Fare                   : num [1:887] 7.25 71.28 7.92 53.1 8.05 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Survived = col_double(),
##   ..   Pclass = col_double(),
##   ..   Name = col_character(),
##   ..   Sex = col_character(),
##   ..   Age = col_double(),
##   ..   `Siblings/Spouses Aboard` = col_double(),
##   ..   `Parents/Children Aboard` = col_double(),
##   ..   Fare = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

The %<>% pipe returns the result of the pipe to the object.


Exercise

Fit the four above mentioned models on the training set and evaluate their performance on the test set.


End of Practical