In this practical, we will focus on ridge regression. One of the packages we are going to use is glmnet. For this, you will probably need to run install.packages("glmnet") before the library() calls below. GGally is also a new package that needs to be installed to access the ggpairs() function.
library(MASS)
library(magrittr)
library(dplyr)
library(GGally)
library(glmnet)
library(caret)
Before starting with the exercises, it is a good idea to set your seed, so that (1) your answers are reproducible and (2) you can compare your answers with the answers provided.
set.seed(123)
The mtcars data set (built into R's datasets package) contains fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Inspect the mtcars data set. Is everything properly coded? Let's start with studying the dimensionality of the mtcars data:
dim(mtcars)
## [1] 32 11
There are not too many dimensions: the data set holds 32 cases over 11 columns. This is tiny if we'd like to predict one column based on the others.
The reason why it is wise to start with the dimensionality of the data is that, for large data sets, the output of str() and summary() would be too large to study conveniently. Now, let's look at the structure of the data:
mtcars %>% str
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
The columns vs
and am
seem dichotomous. A call to summary()
verifies this:
mtcars %>% summary
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
The actual nature of these columns can also be retrieved from the help file. See e.g. ?mtcars
.
Let’s recode the vs
and am
columns into factors. To avoid confusion later on, I’ll name the recoded data set mtc
.
mtc <- mtcars %>%
mutate(vs = factor(vs, labels = c("v-shaped", "straight")),
am = factor(am, labels = c("automatic", "manual")))
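The same recode can be sketched and verified without dplyr, using only base R (a self-contained illustration that rebuilds mtc from mtcars):

```r
# Rebuild the recode with base R and confirm the factor levels
mtc <- within(mtcars, {
  vs <- factor(vs, labels = c("v-shaped", "straight"))
  am <- factor(am, labels = c("automatic", "manual"))
})
levels(mtc$vs)   # "v-shaped" "straight"
table(mtc$am)    # 19 automatic, 13 manual
```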
The measurement level of these columns has now been properly set:
mtc %>% str
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : Factor w/ 2 levels "v-shaped","straight": 1 1 2 2 1 2 1 2 2 2 ...
## $ am : Factor w/ 2 levels "automatic","manual": 2 2 2 1 1 1 1 1 1 1 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
I always like GGally
’s function ggpairs()
to inspect the multivariate structure.
ggpairs(mtc)
It is clear that many features have quite substantial interrelations: the correlations between the first six columns are quite high. The boxplots for am and vs also demonstrate clear effects with the first few variables. It is also apparent that for some variable combinations there are very few observations.
Fit a linear model with hp as the response and all other features as the predictors. Try to use the exposition (%$%) pipe. This is straightforward:
fit <- mtc %$%
lm(hp ~ ., data = .)
fit %>% summary
##
## Call:
## lm(formula = hp ~ ., data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.681 -15.558 0.799 18.106 34.718
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 79.0484 184.5041 0.428 0.67270
## mpg -2.0631 2.0906 -0.987 0.33496
## cyl 8.2037 10.0861 0.813 0.42513
## disp 0.4390 0.1492 2.942 0.00778 **
## drat -4.6185 16.0829 -0.287 0.77680
## wt -27.6600 19.2704 -1.435 0.16591
## qsec -1.7844 7.3639 -0.242 0.81089
## vsstraight 25.8129 19.8512 1.300 0.20758
## ammanual 9.4863 20.7599 0.457 0.65240
## gear 7.2164 14.6160 0.494 0.62662
## carb 18.7487 7.0288 2.667 0.01441 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.97 on 21 degrees of freedom
## Multiple R-squared: 0.9028, Adjusted R-squared: 0.8565
## F-statistic: 19.5 on 10 and 21 DF, p-value: 1.898e-08
The model seems to fit the data quite well, as indicated by the high R-squared (and the similar adjusted R-squared); yet, few of the predictors are significant. In short: the observed outcome seems to be quite well represented by this model.
Fit a ridge regression and store the result in an object called ridge. We use glmnet to fit the ridge regression.
y <- mtc$hp
x <- model.matrix(fit)
ridge <- glmnet(x = x, y = y,
family = "gaussian",
alpha = 0,
standardize = TRUE)
glmnet requires the data to be specified in terms of a response \(y\) and a predictor space \(\mathbf{X}\). The predictor space \(\mathbf{X}\) needs to be a design matrix, meaning that all factors are coded as dummies, etc. The easiest way to obtain such a design matrix is with the model.matrix() function. Since I already created a fitted linear model, I might as well grab the design matrix from that object. The convenience here is that the lm() function has already realized the factor expansion into dummies for us.
The remaining two function arguments in the glmnet() call are vital for the process of ridge estimation. First, alpha = 0 indicates that ridge regression's \(L_2\) regularization should be used. Second, standardize = TRUE is the default flag indicating that the predictor space x should be standardized by glmnet() during estimation; standardize = FALSE would indicate that glmnet() should not perform standardization. Why is this standardization important? Let's imagine that some features are measured in completely different units than other features. Then, the features that would yield large OLS coefficients will be penalized more than those that yield smaller OLS coefficients. In our example:
fit %>% coef
## (Intercept) mpg cyl disp drat wt
## 79.0483879 -2.0630545 8.2037204 0.4390024 -4.6185488 -27.6600472
## qsec vsstraight ammanual gear carb
## -1.7843654 25.8128774 9.4862914 7.2164047 18.7486691
the coefficients for wt, vsstraight and carb will be shrunk more than e.g. disp. This would not be fair, because engine displacement disp is measured in cubic inches (yielding high numbers) and wt is measured in units of 1000 lbs. The unit scales of these variables differ and, hence, their coefficients are not comparable. Standardization puts all features on the same scale (\(\sim \mathcal{N}(0, 1)\)) such that their relative importance with respect to the response variable can be compared.
Naturally, the coefficients are backtransformed in the final model, such that they are in the scale of the original predictor values.
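The back-transformation is easy to sketch with a plain lm() rather than glmnet's internal code: if a predictor is standardized before fitting, dividing its coefficient by the predictor's original standard deviation returns it to the original scale. A minimal base-R illustration:

```r
# Illustration with a simple linear model: standardize one predictor,
# fit, then back-transform the slope to the original scale
x_raw <- mtcars$disp
x_std <- (x_raw - mean(x_raw)) / sd(x_raw)

b_raw <- coef(lm(mtcars$hp ~ x_raw))["x_raw"]
b_std <- coef(lm(mtcars$hp ~ x_std))["x_std"]

# dividing the standardized slope by sd(x_raw) recovers the raw-scale slope
all.equal(unname(b_std / sd(x_raw)), unname(b_raw))  # TRUE
```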
Inspect the ridge object and the coef() thereof. Let's start with the object ridge:
ridge
##
## Call: glmnet(x = x, y = y, family = "gaussian", alpha = 0, standardize = TRUE)
##
## Df %Dev Lambda
## 1 10 0.00 56180
## 2 10 1.10 51190
## 3 10 1.21 46640
## 4 10 1.32 42500
## 5 10 1.45 38720
## 6 10 1.59 35280
## 7 10 1.74 32150
## 8 10 1.91 29290
## 9 10 2.09 26690
## 10 10 2.29 24320
## 11 10 2.51 22160
## 12 10 2.74 20190
## 13 10 3.00 18400
## 14 10 3.29 16760
## 15 10 3.60 15270
## 16 10 3.93 13920
## 17 10 4.30 12680
## 18 10 4.70 11550
## 19 10 5.14 10530
## 20 10 5.61 9591
## 21 10 6.12 8739
## 22 10 6.68 7963
## 23 10 7.29 7255
## 24 10 7.94 6611
## 25 10 8.65 6024
## 26 10 9.41 5488
## 27 10 10.24 5001
## 28 10 11.13 4557
## 29 10 12.08 4152
## 30 10 13.10 3783
## 31 10 14.20 3447
## 32 10 15.37 3141
## 33 10 16.62 2862
## 34 10 17.95 2607
## 35 10 19.36 2376
## 36 10 20.86 2165
## 37 10 22.43 1972
## 38 10 24.08 1797
## 39 10 25.82 1638
## 40 10 27.63 1492
## 41 10 29.51 1360
## 42 10 31.47 1239
## 43 10 33.48 1129
## 44 10 35.55 1028
## 45 10 37.67 937
## 46 10 39.83 854
## 47 10 42.02 778
## 48 10 44.23 709
## 49 10 46.44 646
## 50 10 48.65 588
## 51 10 50.85 536
## 52 10 53.02 489
## 53 10 55.16 445
## 54 10 57.24 406
## 55 10 59.28 370
## 56 10 61.24 337
## 57 10 63.13 307
## 58 10 64.95 280
## 59 10 66.68 255
## 60 10 68.32 232
## 61 10 69.87 212
## 62 10 71.32 193
## 63 10 72.69 176
## 64 10 73.96 160
## 65 10 75.14 146
## 66 10 76.23 133
## 67 10 77.24 121
## 68 10 78.17 110
## 69 10 79.02 100
## 70 10 79.80 92
## 71 10 80.52 83
## 72 10 81.17 76
## 73 10 81.77 69
## 74 10 82.31 63
## 75 10 82.81 58
## 76 10 83.27 52
## 77 10 83.70 48
## 78 10 84.08 44
## 79 10 84.44 40
## 80 10 84.78 36
## 81 10 85.09 33
## 82 10 85.38 30
## 83 10 85.65 27
## 84 10 85.91 25
## 85 10 86.16 23
## 86 10 86.38 21
## 87 10 86.61 19
## 88 10 86.82 17
## 89 10 87.01 16
## 90 10 87.21 14
## 91 10 87.39 13
## 92 10 87.57 12
## 93 10 87.73 11
## 94 10 87.90 10
## 95 10 88.05 9
## 96 10 88.20 8
## 97 10 88.34 7
## 98 10 88.47 7
## 99 10 88.60 6
## 100 10 88.72 6
The output shows three columns over 100 options for \(\lambda\): the degrees of freedom Df, the percentage of deviance explained %Dev, and the value of Lambda itself. Function glmnet() fits the model for 100 values of lambda by default. The function will stop early when %Dev does not change sufficiently from one lambda to the next.
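The \(\lambda\) grid and the corresponding deviance ratios can also be extracted from the fitted object directly. A self-contained sketch (refitting on the raw mtcars for brevity, assuming glmnet is installed):

```r
library(glmnet)
y <- mtcars$hp
x <- model.matrix(hp ~ ., data = mtcars)[, -1]
ridge <- glmnet(x = x, y = y, family = "gaussian",
                alpha = 0, standardize = TRUE)

length(ridge$lambda)   # at most 100 values
head(ridge$lambda)     # the largest penalties come first
head(ridge$dev.ratio)  # corresponding fraction of deviance explained
```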
Let’s look at the coefficients.
ridge %>% coef()
## 12 x 100 sparse Matrix of class "dgCMatrix"
## [[ suppressing 100 column names 's0', 's1', 's2' ... ]]
##
## (Intercept) 1.466875e+02 1.472529e+02 1.473075e+02 1.473673e+02
## (Intercept) . . . .
## mpg -8.918920e-36 -1.155421e-02 -1.267149e-02 -1.389585e-02
## cyl 3.228109e-35 4.182744e-02 4.587301e-02 5.030647e-02
## disp 4.419724e-37 5.726389e-04 6.280213e-04 6.887128e-04
## drat -5.812650e-35 -7.513747e-02 -8.238584e-02 -9.032526e-02
## wt 4.662631e-35 6.036284e-02 6.619567e-02 7.258658e-02
## qsec -2.744816e-35 -3.562883e-02 -3.908167e-02 -4.286691e-02
## vsstraight -9.935867e-35 -1.287315e-01 -1.411815e-01 -1.548249e-01
## ammanual -3.375455e-35 -4.348827e-02 -4.766809e-02 -5.224324e-02
## gear -1.179948e-35 -1.507575e-02 -1.651120e-02 -1.807963e-02
## carb 3.214994e-35 4.173691e-02 4.578222e-02 5.021709e-02
##
## (Intercept) 1.474329e+02 1.475047e+02 1.475834e+02 1.476695e+02
## (Intercept) . . . .
## mpg -1.523733e-02 -1.670692e-02 -1.831654e-02 -2.007921e-02
## cyl 5.516426e-02 6.048617e-02 6.631554e-02 7.269954e-02
## disp 7.552123e-04 8.280642e-04 9.078614e-04 9.952495e-04
## drat -9.901993e-02 -1.085397e-01 -1.189605e-01 -1.303648e-01
## wt 7.958783e-02 8.725635e-02 9.565413e-02 1.048486e-01
## qsec -4.701610e-02 -5.156370e-02 -5.654731e-02 -6.200796e-02
## vsstraight -1.697738e-01 -1.861506e-01 -2.040886e-01 -2.237327e-01
## ammanual -5.724982e-02 -6.272697e-02 -6.871705e-02 -7.526586e-02
## gear -1.979264e-02 -2.166263e-02 -2.370290e-02 -2.592766e-02
## carb 5.507850e-02 6.040686e-02 6.624629e-02 7.264492e-02
##
## (Intercept) 147.763769583 147.866921292 147.97977190 148.103195779
## (Intercept) . . . .
## mpg -0.022009062 -0.024121470 -0.02643311 -0.028962077
## cyl 0.079689509 0.087341234 0.09571529 0.104877399
## disp 0.001090931 0.001195668 0.00131029 0.001435696
## drat -0.142841780 -0.156488409 -0.17140934 -0.187717676
## wt 0.114912937 0.125926736 0.13797620 0.151154724
## qsec -0.067990389 -0.074543327 -0.08171984 -0.089577679
## vsstraight -0.245240936 -0.268784618 -0.29454996 -0.322738883
## ammanual -0.082422809 -0.090241173 -0.09877822 -0.108095386
## gear -0.028352020 -0.030991994 -0.03386450 -0.036987297
## carb 0.079655217 0.087334390 0.09574472 0.104953998
##
## (Intercept) 148.238138647 148.385621817 148.546751999 148.722704928
## (Intercept) . . . .
## mpg -0.031727952 -0.034751924 -0.038057437 -0.041668215
## cyl 0.114898745 0.125856365 0.137834720 0.150921545
## disp 0.001572859 0.001722834 0.001886766 0.002065869
## drat -0.205535349 -0.224993612 -0.246234221 -0.269406933
## wt 0.165563324 0.181311106 0.198515834 0.217303934
## qsec -0.098179592 -0.107593744 -0.117894181 -0.129160877
## vsstraight -0.353570081 -0.387280154 -0.424124572 -0.464379136
## ammanual -0.118258447 -0.129337595 -0.141407292 -0.154547079
## gear -0.040378934 -0.044058638 -0.048046162 -0.052361650
## carb 0.115035916 0.126070539 0.138144743 0.151352735
##
## (Intercept) 148.914757835 149.124274960 149.352714695 149.601631771
## (Intercept) . . . .
## mpg -0.045611252 -0.049915089 -0.054610320 -0.059729662
## cyl 0.165214510 0.180817607 0.197842389 0.216408246
## disp 0.002261471 0.002474995 0.002707965 0.002962012
## drat -0.294673866 -0.322207194 -0.352189869 -0.384815600
## wt 0.237811532 0.260184432 0.284578536 0.311160084
## qsec -0.141480643 -0.154947231 -0.169661821 -0.185733434
## vsstraight -0.508340504 -0.556327500 -0.608681934 -0.665769375
## ammanual -0.168840385 -0.184375068 -0.201242967 -0.219539498
## gear -0.057025306 -0.062057122 -0.067476456 -0.073301524
## carb 0.165796493 0.181586294 0.198841214 0.217689620
##
## (Intercept) 149.872678504 150.167604853 150.488257008 150.836574179
## (Intercept) . . . .
## mpg -0.065308007 -0.071382459 -0.077992341 -0.085179179
## cyl 0.236642623 0.258681179 0.282667856 0.308754853
## disp 0.003238877 0.003540414 0.003868588 0.004225478
## drat -0.420288670 -0.458823526 -0.500644127 -0.545982978
## wt 0.340105807 0.371602947 0.405849138 0.443052091
## qsec -0.203279319 -0.222425329 -0.243306253 -0.266066099
## vsstraight -0.727979746 -0.795727701 -0.869452714 -0.949618816
## ammanual -0.239363071 -0.260814324 -0.283995114 -0.309007260
## gear -0.079548777 -0.086232135 -0.093362072 -0.100944515
## carb 0.238269659 0.260729713 0.285228826 0.311937073
##
## (Intercept) 151.214583239 151.624390808 152.068172378 152.548158005
## (Intercept) . . . .
## mpg -0.092986639 -0.101460433 -0.110648162 -0.120599106
## cyl 0.337102459 0.367878748 0.401259085 0.437425428
## disp 0.004613273 0.005034268 0.005490857 0.005985524
## drat -0.595079812 -0.648179856 -0.705531624 -0.767384171
## wt 0.483429064 0.527206071 0.574616785 0.625901093
## qsec -0.290858316 -0.317845926 -0.347201561 -0.379107365
## vsstraight -1.036713906 -1.131248533 -1.233754082 -1.344780224
## ammanual -0.335950965 -0.364922902 -0.396013915 -0.429306293
## gear -0.108979550 -0.117459894 -0.126369126 -0.135679648
## carb 0.341035864 0.372718154 0.407188551 0.444663275
##
## (Intercept) 153.066614130 153.625821050 154.228045619 154.875508774
## (Intercept) . . . .
## mpg -0.131363944 -0.142994396 -0.155542786 -0.169061510
## cyl 0.476565388 0.518871027 0.564537348 0.613760469
## disp 0.006520831 0.007099401 0.007723898 0.008397004
## drat -0.833983765 -0.905569894 -0.982370590 -1.064597020
## wt 0.681303264 0.741069672 0.805446047 0.874674217
## qsec -0.413754751 -0.451343972 -0.492083494 -0.536189122
## vsstraight -1.464891557 -1.594663299 -1.734675929 -1.885508670
## ammanual -0.464870583 -0.502761924 -0.543015864 -0.585643683
## gear -0.145350356 -0.155324011 -0.165524310 -0.175852655
## carb 0.485369962 0.529547274 0.577444280 0.629319587
##
## (Intercept) 155.570347591 156.314571661 157.1100137 157.95827492 158.86066448
## (Intercept) . . . . .
## mpg -0.183602401 -0.199216008 -0.2159508 -0.23385202 -0.25296110
## cyl 0.666735448 0.723653749 0.7847003 0.85005042 0.91986585
## disp 0.009121388 0.009899673 0.0107344 0.01162799 0.01258268
## drat -1.152437325 -1.246049737 -1.3455550 -1.45102813 -1.56248983
## wt 0.948988304 1.028610385 1.1137456 1.20457679 1.30125858
## qsec -0.583882865 -0.635391507 -0.6909448 -0.75077360 -0.81510695
## vsstraight -2.047731707 -2.221897056 -2.4085280 -2.60810727 -2.82106335
## ammanual -0.630627219 -0.677913254 -0.7274075 -0.77896853 -0.83240114
## gear -0.186184633 -0.196366265 -0.2062101 -0.21549096 -0.22394231
## carb 0.685440175 0.746079921 0.8115178 0.88203557 0.95791543
##
## (Intercept) 159.81811021 160.8311856 161.89990556 163.02374078 164.20152973
## (Intercept) . . . . .
## mpg -0.27331128 -0.2949375 -0.31785975 -0.34209161 -0.36763680
## cyl 0.99429254 1.0734530 1.15744551 1.24633823 1.34016540
## disp 0.01360057 0.0146833 0.01583234 0.01704876 0.01833318
## drat -1.67991187 -1.8031569 -1.93203928 -2.06627439 -2.20548140
## wt 1.40391684 1.5126220 1.62741036 1.74826031 1.87509060
## qsec -0.88417110 -0.9581810 -1.03734385 -1.12185065 -1.21187332
## vsstraight -3.04775854 -3.2884633 -3.54335274 -3.81248005 -4.09576059
## ammanual -0.88745093 -0.9437961 -1.00104587 -1.05873374 -1.11631580
## gear -0.23125115 -0.2370573 -0.24094690 -0.24245214 -0.24104958
## carb 1.03943678 1.1268730 1.22048767 1.32053059 1.42723321
##
## (Intercept) 165.43141454 166.71078412 168.03622714 169.40349792 170.80749780
## (Intercept) . . . . .
## mpg -0.39448811 -0.42262638 -0.45201978 -0.48262323 -0.51437807
## cyl 1.43892332 1.54256684 1.65100644 1.76410600 1.88168158
## disp 0.01968576 0.02110615 0.02259347 0.02414629 0.02576265
## drat -2.34917759 -2.49677466 -2.64757754 -2.80078598 -2.95549944
## wt 2.00775473 2.14603643 2.28964634 2.43822012 2.59131827
## qsec -1.30756010 -1.40903101 -1.51637323 -1.62963648 -1.74882859
## vsstraight -4.39295340 -4.70364329 -5.02722401 -5.36288283 -5.70958728
## ammanual -1.17316971 -1.22859627 -1.28182361 -1.33201447 -1.37827680
## gear -0.23616026 -0.22715134 -0.21333965 -0.19399731 -0.16835964
## carb 1.54080416 1.66142440 1.78924246 1.92436963 2.06687544
##
## (Intercept) 172.24227472 173.70104280 175.17622359 176.6595093 178.14194788
## (Intercept) . . . . .
## mpg -0.54721208 -0.58103978 -0.61576307 -0.6512722 -0.68744704
## cyl 2.00350119 2.12928576 2.25871132 2.3914124 2.52698688
## disp 0.02744002 0.02917541 0.03096534 0.0328060 0.03469325
## drat -3.11072551 -3.26539205 -3.41836297 -3.5684576 -3.71447308
## wt 2.74842788 2.90896650 3.07228799 3.2376906 3.40442683
## qsec -1.87391140 -2.00479710 -2.14134520 -2.2833603 -2.43059061
## vsstraight -6.06607466 -6.43084464 -6.80215578 -7.1780260 -7.55623762
## ammanual -1.41967780 -1.45526151 -1.48406952 -1.5051647 -1.51765678
## gear -0.13563534 -0.09501909 -0.04570629 0.0130903 0.08212249
## carb 2.21678335 2.37406698 2.53864694 2.7103883 2.88909925
##
## (Intercept) 179.6140487 181.06590616 182.48733747 183.86803044 185.19769432
## (Intercept) . . . . .
## mpg -0.7241585 -0.76127021 -0.79864063 -0.83612474 -0.87357629
## cyl 2.6650015 2.80499904 2.94650635 3.08904307 3.23213163
## disp 0.0366228 0.03859027 0.04059138 0.04262205 0.04467859
## drat -3.8552098 -3.98949858 -4.11622901 -4.23437865 -4.34304096
## wt 3.5717153 3.73875337 3.90473153 4.06884706 4.23031787
## qsec -2.5827281 -2.73940904 -2.90021674 -3.06468495 -3.23230303
## vsstraight -7.9343467 -8.30969725 -8.67943899 -9.04054940 -9.38985813
## ammanual -1.5207299 -1.51366965 -1.49588918 -1.46695287 -1.42659520
## gear 0.1620877 0.25361006 0.35722224 0.47334926 0.60229509
## carb 3.0745299 3.26637326 3.46426613 3.66779170 3.87648269
##
## (Intercept) 186.46620739 187.66375368 188.77592851 189.80262748 190.73216640
## (Intercept) . . . . .
## mpg -0.91084998 -0.94780384 -0.98462612 -1.02069427 -1.05608906
## cyl 3.37530796 3.51813308 3.66143703 3.80271164 3.94254949
## disp 0.04675784 0.04885734 0.05098625 0.05312257 0.05527569
## drat -4.44145132 -4.52900935 -4.60585355 -4.67018092 -4.72264347
## wt 4.38839444 4.54236936 4.69101170 4.83437186 4.97163620
## qsec -3.40252215 -3.57476256 -3.74827883 -3.92263045 -4.09712960
## vsstraight -9.72407313 -10.03980723 -10.33210468 -10.59989701 -10.83870257
## ammanual -1.37473361 -1.31147332 -1.23547268 -1.14988495 -1.05423666
## gear 0.74423325 0.89920190 1.06770954 1.24834316 1.44121560
## carb 4.08982526 4.30726356 4.52790832 4.75165737 4.97765285
##
## (Intercept) 191.55740418 192.27205646 192.87068175 193.3486372 193.70201849
## (Intercept) . . . . .
## mpg -1.09069715 -1.12441386 -1.15714575 -1.1888133 -1.21935425
## cyl 4.08066812 4.21684822 4.35094621 4.4829049 4.61276145
## disp 0.05744651 0.05963714 0.06185094 0.0640926 0.06636808
## drat -4.76332476 -4.79250509 -4.81065005 -4.8183854 -4.81645677
## wt 5.10228636 5.22585710 5.34192155 5.4500689 5.54987446
## qsec -4.27116187 -4.44413131 -4.61546822 -4.7846347 -4.95112797
## vsstraight -11.04509113 -11.21569516 -11.34722443 -11.4364772 -11.48035180
## ammanual -0.94934355 -0.83614126 -0.71563818 -0.5888631 -0.45681260
## gear 1.64580058 1.86146804 2.08750676 2.3231461 2.56757256
## carb 5.20524630 5.43378140 5.66259849 5.8910395 6.11845342
##
## (Intercept) 193.92109327 194.02191854 193.99443867 193.83803899 193.55250744
## (Intercept) . . . . .
## mpg -1.24958628 -1.27788644 -1.30490303 -1.33062218 -1.35504969
## cyl 4.74226023 4.86795813 4.99181549 5.11416174 5.23541158
## disp 0.06868099 0.07103425 0.07344181 0.07591337 0.07845923
## drat -4.80231755 -4.78166007 -4.75381589 -4.71975077 -4.68036600
## wt 5.63773425 5.71873939 5.79011555 5.85128880 5.90155881
## qsec -5.11376309 -5.27347084 -5.42934363 -5.58107857 -5.72840684
## vsstraight -11.47302080 -11.41796733 -11.30981867 -11.14626443 -10.92520896
## ammanual -0.31738503 -0.17801718 -0.03650641 0.10649780 0.25054535
## gear 2.81940382 3.07777793 3.34200895 3.61120714 3.88448289
## carb 6.34395738 6.56769239 6.78871127 7.00648413 7.22052318
##
## (Intercept) 193.13822267 192.6235899 191.97468050 191.20396732 190.31482894
## (Intercept) . . . . .
## mpg -1.37821255 -1.4008257 -1.42127520 -1.44048786 -1.45850504
## cyl 5.35603341 5.4750119 5.59408917 5.71384674 5.83469984
## disp 0.08108964 0.0837654 0.08657918 0.08950903 0.09256335
## drat -4.63640053 -4.5807940 -4.52925388 -4.47544037 -4.41960724
## wt 5.94007224 5.9632788 5.97688283 5.97616013 5.95994817
## qsec -5.87108814 -6.0086674 -6.14204537 -6.27041561 -6.39366861
## vsstraight -10.64485290 -10.3068884 -9.90732793 -9.44571180 -8.92205281
## ammanual 0.39534665 0.5381364 0.68087323 0.82389969 0.96721226
## gear 4.16089806 4.4347382 4.71317849 4.99197107 5.27000787
## carb 7.43040383 7.6368820 7.83802531 8.03416632 8.22521310
##
## (Intercept) 189.37257716 188.27470426 187.0687536 185.7595365 184.3519195
## (Intercept) . . . . .
## mpg -1.47534918 -1.49059909 -1.5047988 -1.5179336 -1.5300333
## cyl 5.95246313 6.07499424 6.1998551 6.3271725 6.4570548
## disp 0.09568011 0.09900362 0.1024771 0.1061069 0.1098975
## drat -4.35573114 -4.29876645 -4.2405424 -4.1813575 -4.1211935
## wt 5.92841217 5.88042664 5.8132188 5.7257870 5.6169480
## qsec -6.51277361 -6.62636283 -6.7345252 -6.8372005 -6.9342614
## vsstraight -8.34612170 -7.70301477 -7.0004500 -6.2407065 -5.4265766
## ammanual 1.10220811 1.24363242 1.3861696 1.5297716 1.6744514
## gear 5.53991500 5.81326972 6.0828195 6.3475418 6.6062949
## carb 8.41308865 8.59421180 8.7703459 8.9417301 9.1086946
##
## (Intercept) 182.9558484 181.3598369 179.6836123 178.0286143 176.1687057
## (Intercept) . . . . .
## mpg -1.5394633 -1.5491794 -1.5578918 -1.5634790 -1.5702765
## cyl 6.5805339 6.7155438 6.8525651 6.9825160 7.1250080
## disp 0.1137919 0.1179382 0.1222511 0.1267023 0.1313963
## drat -4.0637809 -4.0063802 -3.9480316 -3.8982249 -3.8430090
## wt 5.4966454 5.3437245 5.1664717 4.9777560 4.7484732
## qsec -7.0295272 -7.1152723 -7.1949698 -7.2729658 -7.3390116
## vsstraight -4.5761659 -3.6617589 -2.7048756 -1.7222137 -0.6883898
## ammanual 1.8047943 1.9531098 2.1018573 2.2371760 2.3929331
## gear 6.8544835 7.1008229 7.3378455 7.5648486 7.7848551
## carb 9.2735649 9.4323208 9.5881199 9.7425576 9.8929095
##
## (Intercept) 174.2374998 172.3242459 170.1250784 167.9465417 165.769300
## (Intercept) . . . . .
## mpg -1.5760497 -1.5785075 -1.5859257 -1.5890722 -1.589213
## cyl 7.2684806 7.4041494 7.5607698 7.7070944 7.845097
## disp 0.1362636 0.1413014 0.1465318 0.1519511 0.157590
## drat -3.7882349 -3.7471821 -3.6774904 -3.6284684 -3.599619
## wt 4.4927363 4.2244414 3.8922704 3.5514111 3.194440
## qsec -7.3981154 -7.4546376 -7.4913174 -7.5256794 -7.555251
## vsstraight 0.3730632 1.4481843 2.5627107 3.6790341 4.799234
## ammanual 2.5490449 2.6942526 2.8706812 3.0338030 3.193619
## gear 7.9937086 8.1937352 8.3758693 8.5520025 8.721551
## carb 10.0419624 10.1906801 10.3379391 10.4856741 10.634105
##
## (Intercept) 163.4428247 161.0919828 158.6210382 156.1186655 153.5193462
## (Intercept) . . . . .
## mpg -1.5924349 -1.5919122 -1.5946362 -1.5941659 -1.5969457
## cyl 7.9915465 8.1285241 8.2710823 8.4035649 8.5389892
## disp 0.1633429 0.1693553 0.1754512 0.1818056 0.1882199
## drat -3.5487641 -3.5257348 -3.4805817 -3.4625579 -3.4239166
## wt 2.7873116 2.3662043 1.8958211 1.4084889 0.8742336
## qsec -7.5689594 -7.5771660 -7.5691322 -7.5538718 -7.5223027
## vsstraight 5.9235270 7.0437772 8.1532667 9.2494730 10.3222755
## ammanual 3.3652046 3.5343900 3.7125781 3.8899779 4.0740702
## gear 8.8691874 9.0153281 9.1375263 9.2578672 9.3540732
## carb 10.7846319 10.9362100 11.0913373 11.2486117 11.4103177
##
## (Intercept) 150.8886940 148.1876123 145.4197146
## (Intercept) . . .
## mpg -1.5971154 -1.6004930 -1.6039712
## cyl 8.6634640 8.7882784 8.9076274
## disp 0.1948860 0.2015858 0.2084335
## drat -3.4114081 -3.3800171 -3.3550844
## wt 0.3206107 -0.2763911 -0.9026725
## qsec -7.4820869 -7.4259810 -7.3573035
## vsstraight 11.3731550 12.3898058 13.3744893
## ammanual 4.2584282 4.4469797 4.6382009
## gear 9.4478774 9.5179478 9.5782001
## carb 11.5751705 11.7449431 11.9185074
We can see that the resulting output holds the coefficients of 12 parameters for each of the 100 values of lambda that we saw already in the output of the ridge object. The extra, twelfth parameter is the result of the model matrix that we entered as the predictor space x: it holds an intercept (i.e. a column of 1's), while glmnet() by default adds an intercept too. The redundant extra intercept is set to zero.
Inspect the summary() of the ridge regression. There is no summary output, at least not as we know it. The reason is that the model is not fit to generate one specific scenario: for different values of the shrinkage/penalization parameter \(\lambda\), different results are obtained.
Plot the ridge object twice: once with the deviance on the x-axis and once with the log of lambda on the x-axis. Hint: see ?plot.glmnet. Let's first plot the trace of the coefficients against the natural logarithm of lambda.
plot(ridge, xvar = "lambda")
We can see that with increasing lambda, the parameters shrink towards zero. Note that \(\ln(\lambda) = 10\) would correspond to a lambda of \(e^{10} \approx 22026\).
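A quick check of that conversion from the log-lambda axis back to the lambda scale:

```r
# ln(lambda) = 10 on the plot's x-axis corresponds to
exp(10)   # approximately 22026.47
```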
plot(ridge, xvar = "dev")
We can see that with an increasing fraction of explained deviance, the parameters move away from zero.
The lesson is that there is an optimal value for \(\lambda\) somewhere in between: large enough to shrink the coefficients, but not so large that predictive performance suffers. We can find it through cross-validation:
ridge <- cv.glmnet(x = x[, -1], y = y,
family = "gaussian",
alpha = 0,
standardize = TRUE)
I choose to remove the intercept by excluding the first column ([, -1]): glmnet() will add an intercept by default, and otherwise I would end up with the redundant parameter that we saw before.
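As an alternative sketch, the same design matrix (minus the intercept) can be built straight from a formula, without first fitting a linear model. Using the raw mtcars here for brevity; on mtc the factor columns would expand into dummies:

```r
# model.matrix() on a formula builds the design matrix and adds an
# intercept column of 1's; [, -1] drops that redundant intercept
x_alt <- model.matrix(hp ~ ., data = mtcars)[, -1]
dim(x_alt)   # 32 rows, 10 predictor columns
```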
Store the cross-validated ridge regression in an object called cv.ridge.
cv.ridge <- cv.glmnet(x = x[, -1], y = y,
family = "gaussian",
alpha = 0,
standardize = TRUE)
Inspect the cv.ridge object and run the object through plot().
cv.ridge
##
## Call: cv.glmnet(x = x[, -1], y = y, family = "gaussian", alpha = 0, standardize = TRUE)
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 20.66 86 1041 351.5 10
## 1se 145.78 65 1384 583.0 10
There are two optimal values for \(\lambda\) given, both of which yield 10 nonzero coefficients:
plot(cv.ridge)
The vertical dashed lines show the locations of \(\lambda_{min}\) and \(\lambda_{1se}\).
The 1-standard-error-rule acknowledges the fact that the risk curves are estimated (and have error). It therefore favors parsimony. In other words: a simpler model that yields about the same predictive power as the one under \(\lambda_{min}\). The motivation for \(\lambda_{1se}\) originated in the 1984 book Classification and regression trees by Breiman, Friedman, Olshen and Stone and can be found in chapter 3, paragraph 4.3.
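Both values can be pulled from the cross-validation object programmatically. A self-contained sketch (refitting on the raw mtcars, assuming glmnet is installed; the exact numbers depend on the seed):

```r
library(glmnet)
set.seed(123)
y <- mtcars$hp
x <- model.matrix(hp ~ ., data = mtcars)[, -1]
cv.ridge <- cv.glmnet(x = x, y = y, family = "gaussian",
                      alpha = 0, standardize = TRUE)

cv.ridge$lambda.min   # lambda minimizing cross-validated MSE
cv.ridge$lambda.1se   # largest lambda within 1 SE of that minimum
```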
First we obtain the linear model predictions (fitted values)
pred.lm <- predict(fit)
and then we generate the predicted values for the ridge regression under the \(\lambda_{1se}\) model:
pred.ridge <- predict(cv.ridge, s = "lambda.1se", newx = x[, -1])
Using the caret
function postResample()
we obtain the following performance measures.
caret::postResample(pred.lm, y)
## RMSE Rsquared MAE
## 21.0392190 0.9027993 18.8429652
caret::postResample(pred.ridge, y)
## RMSE Rsquared MAE
## 33.6481538 0.8323334 23.0017267
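For reference, the three measures postResample() reports can be reproduced by hand. A base-R sketch with made-up observed/predicted vectors (the numbers below are hypothetical, not taken from the models above):

```r
obs  <- c(110, 110, 93, 175)   # hypothetical observed hp values
pred <- c(120, 100, 90, 180)   # hypothetical predictions

rmse <- sqrt(mean((pred - obs)^2))   # root mean squared error
mae  <- mean(abs(pred - obs))        # mean absolute error
rsq  <- cor(pred, obs)^2             # caret's Rsquared: squared correlation

c(RMSE = rmse, Rsquared = rsq, MAE = mae)
```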
The linear model performs better than the ridge regression. However, we do not know how well these models would do on new predictions. Simply verifying the predicted values on the data the model has been trained on would be faulty: the linear model tends to fit better in that scenario because it yields unbiased parameter estimates. In conclusion: for inference purposes, the linear model may be fine; for prediction purposes, however, it may overfit the data.
I have chosen to use 4 folds in this exercise because the effective sample size has decreased due to the train/test split. First, we create the train/test split.
idx <- createDataPartition(mtc$hp, p = .7, list = FALSE)
train <- mtc[idx, ]
test <- mtc[-idx, ]
Then we refit the linear model to the training cases. We store the predicted values on the test data in object pred
.
fit <- train %$% lm(hp ~ ., data = .)
pred <- predict(fit, newdata = test)
For ridge regression, we do the same. First we separate the training response into vector y and obtain the model matrix x from the fitted linear model (the one fitted on the training data, naturally).
y <- train$hp
x <- model.matrix(fit)[, -1] #exclude intercept
Then we run the cv.glmnet()
method to infer optimal values of lambda through cross-validation.
ridge <- cv.glmnet(x = x, y = y,
family = "gaussian",
alpha = 0,
standardize = TRUE,
nfolds = 4)
Finally we obtain the predicted values. To do so, I use a quick linear model on the test data to obtain the corresponding design matrix. I do this to avoid an annoying matrix property: if I converted the test object to a matrix directly, every column would turn into character class, because matrices can hold either numeric or character values, not both. predict.glmnet() requires the input for argument newx to be a matrix.
newx <- test %$% lm(hp ~ ., data = .) %>% model.matrix() %>% .[, -1]
ridge_min <- predict(ridge, s = "lambda.min", newx = newx)
ridge_1se <- predict(ridge, s = "lambda.1se", newx = newx)
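An alternative sketch that avoids fitting a throw-away lm(): call model.matrix() on the test data directly. To keep the block self-contained, a plain sample() split is used here as a stand-in for the createDataPartition() split above:

```r
set.seed(123)
idx  <- sample(nrow(mtcars), 22)   # stand-in for the ~70% train split
test <- mtcars[-idx, ]

# design matrix for the test cases, dropping the intercept column
newx_alt <- model.matrix(hp ~ ., data = test)[, -1]
dim(newx_alt)   # 10 test rows, 10 predictor columns
```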
Now, let’s compare the three solutions:
postResample(pred, test$hp)
## RMSE Rsquared MAE
## 36.5474077 0.7369976 32.7630202
postResample(ridge_min, test$hp)
## RMSE Rsquared MAE
## 26.6839013 0.8118065 25.2053081
postResample(ridge_1se, test$hp)
## RMSE Rsquared MAE
## 29.517895 0.853422 23.769065
We can see that both ridge predictions outperform the linear model predictions, with the more regularized \(\lambda_{1se}\) solution yielding the best predictive performance.
Important Note: we have used both a train/test split and a k-fold cross-validation procedure. A different seed value will result in different splits and cross-validation folds, and hence in different results. Don't forget that once you fix the seed, all results become seed dependent.
Re-running it all again yields:
## RMSE Rsquared MAE
## 30.8110303 0.8112899 26.2250707
## RMSE Rsquared MAE
## 28.1841095 0.8428196 22.6240256
## RMSE Rsquared MAE
## 29.9926533 0.8708709 22.9344999
End of Practical