We use the following packages:
library(tidyverse)
library(magrittr)
library(caret)
library(kernlab)
library(MLeval)
library(pROC)
Let’s take the titanic data that we used before and fit the following four models on a training version (70% of cases) of that data set.
Finally, compare the performance of all four techniques on the test version (the remaining 30% of cases, not used for fitting) of that data set.
We can use the following code block to load the data directly into our workspace:
con <- url("https://www.gerkovink.com/erasmus/Day%202/Part%20D/titanic.csv")
titanic <- read_csv(con)
## Rows: 887 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Name, Sex
## dbl (6): Survived, Pclass, Age, Siblings/Spouses Aboard, Parents/Children Ab...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
We need to take care of some columns that are not well-coded. Let’s give every column the measurement level it is supposed to have: categorical variables become factors, ordinal variables become ordered factors, etc.
titanic %<>%
  mutate(Pclass = factor(Pclass,
                         ordered = TRUE,
                         labels = c("1st class", "2nd class", "3rd class")),
         Survived = factor(Survived,
                           labels = c("Died", "Survived")))
str(titanic)
## spec_tbl_df [887 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Survived : Factor w/ 2 levels "Died","Survived": 1 2 2 2 1 1 1 1 2 2 ...
## $ Pclass : Ord.factor w/ 3 levels "1st class"<"2nd class"<..: 3 1 3 1 3 3 1 3 3 2 ...
## $ Name : chr [1:887] "Mr. Owen Harris Braund" "Mrs. John Bradley (Florence Briggs Thayer) Cumings" "Miss. Laina Heikkinen" "Mrs. Jacques Heath (Lily May Peel) Futrelle" ...
## $ Sex : chr [1:887] "male" "female" "female" "female" ...
## $ Age : num [1:887] 22 38 26 35 35 27 54 2 27 14 ...
## $ Siblings/Spouses Aboard: num [1:887] 1 1 0 1 0 0 0 3 0 1 ...
## $ Parents/Children Aboard: num [1:887] 0 0 0 0 0 0 0 1 2 0 ...
## $ Fare : num [1:887] 7.25 71.28 7.92 53.1 8.05 ...
## - attr(*, "spec")=
## .. cols(
## .. Survived = col_double(),
## .. Pclass = col_double(),
## .. Name = col_character(),
## .. Sex = col_character(),
## .. Age = col_double(),
## .. `Siblings/Spouses Aboard` = col_double(),
## .. `Parents/Children Aboard` = col_double(),
## .. Fare = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
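The recoding above relies on how `factor()` pairs labels with values: without an explicit `levels` argument, the labels are matched to the sorted unique values of the input. A minimal sketch of that behaviour on a small made-up vector (`pclass_raw` is a hypothetical stand-in, not part of the titanic data):

```r
# Hypothetical raw class codes, in the same 1/2/3 coding as Pclass
pclass_raw <- c(3, 1, 3, 2)

# Labels are assigned to the sorted unique values: 1, 2, 3
pclass <- factor(pclass_raw,
                 ordered = TRUE,
                 labels = c("1st class", "2nd class", "3rd class"))

as.character(pclass)     # the label each raw code was mapped to
levels(pclass)           # the level order, lowest to highest
pclass[1] > pclass[2]    # ordered factors support comparisons
```

If a code could be absent from the data, passing `levels = 1:3` explicitly would keep the labels from shifting onto the wrong values.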
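The %<>% pipe from magrittr is an assignment pipe: it pipes the object on its left-hand side into the expression on the right and then assigns the result back to that object.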
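As a small illustration of that pipe, the sketch below applies the same transformation twice, once with `%<>%` and once with an ordinary pipe plus an explicit assignment. The vectors `x` and `y` are made up for this example.

```r
library(magrittr)  # provides %>% and %<>%

x <- c(1, 4, 9)
y <- c(1, 4, 9)

# Assignment pipe: transform x and overwrite it in one step
x %<>% sqrt

# Equivalent long form: ordinary pipe plus explicit assignment
y <- y %>% sqrt

all.equal(x, y)  # both routes give the same result
```

Because `%<>%` overwrites its left-hand side, rerunning such a line applies the transformation again to the already modified object.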
Fit the four above-mentioned models on the training set and evaluate their performance on the test set.
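Before any model can be fitted, the cases have to be divided into a training and a test part. One possible way to obtain a 70/30 split is `createDataPartition()` from caret, which samples within the classes of the outcome. The sketch below is only an illustration on a made-up data frame (`toy`, `outcome` and `x` are hypothetical names, and the seed is arbitrary), not a prescribed solution:

```r
library(caret)  # provides createDataPartition()

set.seed(123)   # arbitrary seed, only to make this sketch reproducible

# Hypothetical toy data standing in for the titanic data
toy <- data.frame(
  outcome = factor(rep(c("Died", "Survived"), times = c(60, 40))),
  x       = rnorm(100)
)

# Row indices for roughly 70% of the cases, sampled within each outcome class
idx <- createDataPartition(toy$outcome, p = 0.7, list = FALSE)[, 1]

train <- toy[idx, ]   # used for fitting the models
test  <- toy[-idx, ]  # held back for the final comparison

table(train$outcome)
table(test$outcome)
```

Sampling within the outcome classes keeps the proportion of survivors similar in both parts, which makes the test performance easier to compare across models.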
End of Practical