mice: An approach to sensitivity analysis

This is the last practical in the series.
The focus of this document is on sensitivity analysis in the context of missing data. The goal of sensitivity analysis is to study the influence that violations of the missingness assumptions have on the obtained inference.
The leiden data set is a subset of 956 members of a very old (85+) cohort in Leiden. Multiple imputation of this data set has been described in Boshuizen et al. (1998), Van Buuren et al. (1999) and paragraph 9.2 of FIMD v2.
The main question is how blood pressure affects mortality risk in the oldest old. We have reasons to mistrust the MAR assumption in this case: in particular, we worry that the imputations of blood pressure created under MAR may not be low enough. The sensitivity analysis explores the effect of artificially lowering the imputed blood pressures by deducting an amount \(\delta\) from the values imputed under MAR. To preserve the relations between the variables, this adjustment needs to be applied during the iterations.
The leiden data set can be obtained by loading the following connection in R:
load(url("https://www.gerkovink.com/mimp/leiden.RData"))
1. Open R, install the packages survminer, purrr and patchwork from CRAN if you don't have them yet, load the packages mice and survival (plus the helper packages listed below), and fix the random seed.
install.packages("survminer")
install.packages("purrr")
install.packages("patchwork")
set.seed(123)
library(mice)
library(survival)
library(ggmice)
library(ggplot2)
library(purrr)
library(dplyr)
library(patchwork)
We choose seed value 123. This is an arbitrary value; any value would be an equally good seed value. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R instance to 123 and follow the order and code of the exercises precisely, you will get the exact same results and plots as we present in this document.
2. Inspect the leiden data set with summary(), str(), head() and tail().
summary(leiden)
## sexe lftanam rrsyst rrdiast
## Min. :0.0000 Min. : 85.48 Min. : 90.0 Min. : 50.00
## 1st Qu.:0.0000 1st Qu.: 87.50 1st Qu.:135.0 1st Qu.: 75.00
## Median :0.0000 Median : 89.07 Median :150.0 Median : 80.00
## Mean :0.2709 Mean : 89.78 Mean :152.9 Mean : 82.78
## 3rd Qu.:1.0000 3rd Qu.: 91.52 3rd Qu.:170.0 3rd Qu.: 90.00
## Max. :1.0000 Max. :103.54 Max. :260.0 Max. :140.00
## NA's :121 NA's :126
## dwa survda alb chol
## Min. :0.0000 Min. : 2.0 Min. :21.00 Min. : 2.700
## 1st Qu.:0.0000 1st Qu.: 534.8 1st Qu.:39.00 1st Qu.: 4.800
## Median :0.0000 Median :1196.5 Median :41.00 Median : 5.700
## Mean :0.2437 Mean :1195.4 Mean :40.77 Mean : 5.704
## 3rd Qu.:0.0000 3rd Qu.:1889.0 3rd Qu.:43.00 3rd Qu.: 6.400
## Max. :1.0000 Max. :2610.0 Max. :52.00 Max. :10.900
## NA's :229 NA's :232
## mmse woon
## Min. : 1.00 Min. :0.000
## 1st Qu.:21.00 1st Qu.:0.000
## Median :26.00 Median :3.000
## Mean :23.67 Mean :1.775
## 3rd Qu.:29.00 3rd Qu.:3.000
## Max. :30.00 Max. :4.000
## NA's :85
str(leiden)
## 'data.frame': 956 obs. of 10 variables:
## $ sexe : num 0 0 0 0 0 0 0 1 1 0 ...
## $ lftanam: num 87.8 87.8 89.1 90.3 87.8 ...
## $ rrsyst : num 160 140 155 155 110 120 180 135 130 160 ...
## $ rrdiast: num 100 70 85 90 60 80 75 80 60 90 ...
## $ dwa : num 0 0 0 0 0 0 0 0 0 0 ...
## $ survda : num 1082 27 1604 528 1100 ...
## $ alb : num 41 NA 41 44 37 NA 42 NA 45 46 ...
## $ chol : num 4.4 NA 4.6 3.9 5.3 NA 7.2 NA 5.1 6.5 ...
## $ mmse : num 12 9 25 27 14 NA 28 26 30 14 ...
## $ woon : num 4 3 0 1 0 3 3 0 4 4 ...
head(leiden)
## sexe lftanam rrsyst rrdiast dwa survda alb chol mmse woon
## 1 0 87.80 160 100 0 1082 41 4.4 12 4
## 2 0 87.75 140 70 0 27 NA NA 9 3
## 3 0 89.08 155 85 0 1604 41 4.6 25 0
## 4 0 90.29 155 90 0 528 44 3.9 27 1
## 5 0 87.76 110 60 0 1100 37 5.3 14 0
## 6 0 91.39 120 80 0 5 NA NA NA 3
tail(leiden)
## sexe lftanam rrsyst rrdiast dwa survda alb chol mmse woon
## 1229 1 93.85 130 70 0 523 40 5.3 28 0
## 1230 0 92.20 190 90 0 1182 44 5.8 26 3
## 1232 0 95.02 150 80 0 861 35 5.0 28 0
## 1233 0 88.30 120 60 0 129 42 8.6 21 0
## 1235 1 89.02 140 80 0 374 40 5.2 23 0
## 1236 0 85.70 130 65 0 1744 36 7.2 27 3
3. Perform a dry run (using maxit = 0) in mice. List the number of missing values per variable.
ini <- mice(leiden, maxit = 0)
ini$nmis
## sexe lftanam rrsyst rrdiast dwa survda alb chol mmse woon
## 0 0 121 126 0 0 229 232 85 0
There are 121 missing values (NA's) for rrsyst, 126 for rrdiast, 229 for alb, 232 for chol and 85 for mmse.
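If you also want the relative amount of missingness per variable, a one-liner does the trick (a small addition, not part of the original exercise):

# proportion of missing values per variable
round(colMeans(is.na(leiden)), 2)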
4. Study the missing data pattern in more detail using md.pattern() and fluxplot(), or their ggmice counterparts plot_pattern() and plot_flux(). The interest here focuses on imputing systolic blood pressure (rrsyst) and diastolic blood pressure (rrdiast).
plot_pattern(leiden)
plot_flux(leiden)
Variables with higher outflux are (potentially) the more powerful predictors. Variables with higher influx depend more strongly on the imputation model. Points relatively close to the diagonal indicate that influx and outflux are balanced.

The variables in the upper-left corner have the more complete information, so the number of missing data problems for this group is relatively small. The variables in the middle have an outflux between 0.5 and 0.8, which is modest; their missing data problems are more severe, but this group could still contain important predictors. The variables at the bottom have an outflux of 0.5 or lower, so their predictive power is limited. This group also has a higher influx and thus depends more heavily on the imputation model.
If you would like this information in tabular form, you can simply ask for it:
flux(leiden)
## pobs influx outflux ainb aout fico
## sexe 1.0000000 0.00000000 1.0000000 0.0000000 0.09216643 0.3504184
## lftanam 1.0000000 0.00000000 1.0000000 0.0000000 0.09216643 0.3504184
## rrsyst 0.8734310 0.09798107 0.5573770 0.7887971 0.05881570 0.2562874
## rrdiast 0.8682008 0.10231550 0.5422446 0.7910053 0.05756359 0.2518072
## dwa 1.0000000 0.00000000 1.0000000 0.0000000 0.09216643 0.3504184
## survda 1.0000000 0.00000000 1.0000000 0.0000000 0.09216643 0.3504184
## alb 0.7604603 0.19311053 0.2471627 0.8214459 0.02995568 0.1458047
## chol 0.7573222 0.19573400 0.2383354 0.8218391 0.02900552 0.1422652
## mmse 0.9110879 0.06798221 0.6796974 0.7790850 0.06875877 0.2870264
## woon 1.0000000 0.00000000 1.0000000 0.0000000 0.09216643 0.3504184
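To single out the variables with limited predictive power programmatically, we can subset this table on a low outflux, say below 0.5 (a small sketch, not part of the original exercise):

# variables whose outflux drops below 0.5, i.e. the ones with limited predictive power
fx <- flux(leiden)
fx[fx$outflux < 0.5, c("pobs", "influx", "outflux")]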
5. The cases with and without blood pressure observed have very different survival rates. Show this.
We can see this easily from the Kaplan-Meier plot.
km <- survfit(Surv(survda/365, 1-dwa) ~ is.na(rrsyst), data = leiden)
survminer::ggsurvplot(km,
censor.shape="",
palette = c(mdc(4), mdc(5)),
ggtheme = ggmice:::theme_mice(),
legend.labs = c("BP Observed", "BP Missing"),
xlab = "Years since intake",
ylab = "K-M survival probability")
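To back up the visual impression from the Kaplan-Meier curves with a formal comparison, a log-rank test is a natural companion (a sketch, not part of the original exercise):

# log-rank test comparing survival between cases with and without observed blood pressure
survdiff(Surv(survda / 365, 1 - dwa) ~ is.na(rrsyst), data = leiden)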
In the next steps we are going to impute rrsyst and rrdiast under two scenarios: MAR and MNAR. We will use the delta adjustment technique described in paragraph 7.2.3 of Van Buuren (2012).
6. Create a \(\delta\) vector that represents the following adjustment values in mmHg: 0 for MAR, and -5, -10, -15 and -20 for MNAR.
delta <- c(0, -5, -10, -15, -20)
The recipe for creating MNAR imputations for \(\delta \neq 0\) uses the post-processing facility of mice. This allows us to change the imputations on the fly by deducting a value \(\delta\) from the values just imputed.
7. Impute the leiden data using the delta adjustment technique. We only have to deduct from rrsyst, because rrdiast will adapt to the changed rrsyst when it is imputed using rrsyst as predictor. Store the five imputed scenarios (adjustments) in a list called imp.all.
imp.all <- list()
post <- ini$post
for (i in 1:length(delta)){
  d <- delta[i]
  # add delta[i] (0 or a negative number of mmHg) to each imputed rrsyst value, directly after imputation
  cmd <- paste("imp[[j]][,i] <- imp[[j]][,i] +", d)
  post["rrsyst"] <- cmd
  imp <- mice(leiden, post = post, maxit = 5, seed = i, print = FALSE)
  imp.all[[i]] <- imp
}
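Because the adjustment is applied during the iterations, it may be worth verifying that the algorithm still converges. A quick check (a sketch, not part of the original exercise) uses the trace-line plot for mids objects:

# trace lines of the chain means and variances for the adjusted variables
plot(imp.all[[5]], c("rrsyst", "rrdiast"))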
8. Inspect the imputations. Compare the imputations for blood pressure under the most extreme scenarios with a box-and-whiskers plot. Is this as expected?
For the scenario where \(\delta = 0\) we can plot the first object from the list. This object is the mids-object that considers imputations under no adjustment.
mice::bwplot(imp.all[[1]])
With ggmice, it is less straightforward to create this plot, because ggmice can only handle a single variable per aesthetic at a time. This means that to create multiple boxplots, we have to create multiple plots and stitch them together, using, for example, patchwork::wrap_plots(). Because the plots are created independently, the cleanest solution is to plot only those variables that suffer from missingness. If we included all variables (also those without missingness), the number of imputations on the x-axis would differ between \(0\) for completely observed variables and \(m\) for the imputed variables. The plot can be obtained like this:
leiden %>%
select(where(~ sum(is.na(.x)) > 0)) %>%
colnames %>%
map(~ ggmice(imp.all[[1]],
mapping = aes_string(x = '.imp', y = .x)) +
geom_boxplot() +
theme(legend.position = 'none')) %>%
wrap_plots()
For the scenario where \(\delta = -20\) we can plot the fifth object from the list. This object is the mids-object that considers imputations under the largest adjustment.
mice::bwplot(imp.all[[5]])
We can clearly see that the adjustment has an effect on the imputations for rrsyst and, thus, on those for rrdiast.
A similar figure can be obtained using ggmice, like this:
leiden %>%
select(where(~ sum(is.na(.x)) > 0)) %>%
colnames %>%
map(~ ggmice(imp.all[[5]],
mapping = aes_string(x = '.imp', y = .x)) +
geom_boxplot() +
theme(legend.position = 'none')) %>%
wrap_plots()
9. Use the density plot for another inspection.
For the scenario where \(\delta = 0\) we can plot the first element of the list imp.all. This object is the mids-object that considers imputations under no adjustment.
mice::densityplot(imp.all[[1]], lwd = 2)
A similar figure can be obtained via ggmice:
leiden %>%
select(where(~ sum(is.na(.x)) > 0)) %>%
colnames %>%
map(~ ggmice(imp.all[[1]],
mapping = aes_string(x = .x, group = '.imp')) +
geom_density() +
theme(legend.position = 'none')) %>%
wrap_plots()
For the scenario where \(\delta = -20\) we can plot the fifth element of the list imp.all. This object is the mids-object that considers imputations under the largest adjustment.
mice::densityplot(imp.all[[5]], lwd = 2)
We can once more clearly see that the adjustment has an effect on the imputations for rrsyst and, thus, on those for rrdiast.
Doing the same with ggmice yields:
leiden %>%
select(where(~ sum(is.na(.x)) > 0)) %>%
colnames %>%
map(~ ggmice(imp.all[[5]],
mapping = aes_string(x = .x, group = '.imp')) +
geom_density() +
theme(legend.position = 'none')) %>%
wrap_plots()
10. Also create a scatter plot of rrsyst and rrdiast by imputation number and missingness.
mice::xyplot(imp.all[[1]], rrsyst ~ rrdiast | .imp)
mice::xyplot(imp.all[[5]], rrsyst ~ rrdiast | .imp)
The scatter plot comparison between rrsyst and rrdiast shows us that the adjustment has an effect on the imputations and that the imputations are lower for the scenario where \(\delta = -20\).
Again, similar figures can be obtained using ggmice, although the imputations are shown in separate panels.
ggmice(imp.all[[1]],
mapping = aes(x = rrdiast, y = rrsyst)) +
geom_point() +
facet_wrap(~.imp)
ggmice(imp.all[[5]],
mapping = aes(x = rrdiast, y = rrsyst)) +
geom_point() +
facet_wrap(~.imp)
We are now going to perform a complete-data analysis. This involves several steps: categorizing systolic blood pressure (rrsyst) into groups, categorizing age (lftanam) into groups, computing the death indicator as 1 - dwa, and fitting a Cox proportional hazards model with blood pressure group as predictor, stratified by sex and age group. To automate these steps we create an expression object that performs them for us. The following object does so:
cda <- expression(
sbpgp <- cut(rrsyst, breaks = c(50, 124, 144, 164, 184, 200, 500)),
agegp <- cut(lftanam, breaks = c(85, 90, 95, 110)),
dead <- 1 - dwa,
coxph(Surv(survda, dead)
~ C(sbpgp, contr.treatment(6, base = 3))
+ strata(sexe, agegp)))
See Van Buuren (2012, p. 186) for more information.
11. Create five fit objects that run the expression cda on the five imputed adjustment scenarios. Use the function with().
fit1 <- with(imp.all[[1]], cda)
fit2 <- with(imp.all[[2]], cda)
fit3 <- with(imp.all[[3]], cda)
fit4 <- with(imp.all[[4]], cda)
fit5 <- with(imp.all[[5]], cda)
Each fit object contains the five imputed Cox proportional hazards models for the adjustment scenario at hand. For example, the \(\delta=-10\) scenario is contained in fit3.
fit3
## call :
## with.mids(data = imp.all[[3]], expr = cda)
##
## call1 :
## mice(data = leiden, post = post, maxit = 5, printFlag = FALSE,
## seed = i)
##
## nmis :
## sexe lftanam rrsyst rrdiast dwa survda alb chol mmse woon
## 0 0 121 126 0 0 229 232 85 0
##
## analyses :
## [[1]]
## Call:
## coxph(formula = Surv(survda, dead) ~ C(sbpgp, contr.treatment(6,
## base = 3)) + strata(sexe, agegp))
##
## coef exp(coef) se(coef) z
## C(sbpgp, contr.treatment(6, base = 3))1 0.5357 1.7086 0.1171 4.575
## C(sbpgp, contr.treatment(6, base = 3))2 0.3998 1.4915 0.1026 3.896
## C(sbpgp, contr.treatment(6, base = 3))4 0.1122 1.1188 0.1173 0.957
## C(sbpgp, contr.treatment(6, base = 3))5 0.1150 1.1218 0.1463 0.786
## C(sbpgp, contr.treatment(6, base = 3))6 -0.1396 0.8697 0.2876 -0.485
## p
## C(sbpgp, contr.treatment(6, base = 3))1 4.77e-06
## C(sbpgp, contr.treatment(6, base = 3))2 9.76e-05
## C(sbpgp, contr.treatment(6, base = 3))4 0.339
## C(sbpgp, contr.treatment(6, base = 3))5 0.432
## C(sbpgp, contr.treatment(6, base = 3))6 0.628
##
## Likelihood ratio test=30.25 on 5 df, p=1.315e-05
## n= 956, number of events= 723
##
## [[2]]
## Call:
## coxph(formula = Surv(survda, dead) ~ C(sbpgp, contr.treatment(6,
## base = 3)) + strata(sexe, agegp))
##
## coef exp(coef) se(coef) z
## C(sbpgp, contr.treatment(6, base = 3))1 0.47629 1.61008 0.11871 4.012
## C(sbpgp, contr.treatment(6, base = 3))2 0.35318 1.42358 0.10127 3.487
## C(sbpgp, contr.treatment(6, base = 3))4 0.07637 1.07936 0.11755 0.650
## C(sbpgp, contr.treatment(6, base = 3))5 0.07489 1.07777 0.14472 0.518
## C(sbpgp, contr.treatment(6, base = 3))6 -0.13686 0.87209 0.27758 -0.493
## p
## C(sbpgp, contr.treatment(6, base = 3))1 6.02e-05
## C(sbpgp, contr.treatment(6, base = 3))2 0.000488
## C(sbpgp, contr.treatment(6, base = 3))4 0.515888
## C(sbpgp, contr.treatment(6, base = 3))5 0.604799
## C(sbpgp, contr.treatment(6, base = 3))6 0.621966
##
## Likelihood ratio test=24.59 on 5 df, p=0.0001674
## n= 956, number of events= 723
##
## [[3]]
## Call:
## coxph(formula = Surv(survda, dead) ~ C(sbpgp, contr.treatment(6,
## base = 3)) + strata(sexe, agegp))
##
## coef exp(coef) se(coef) z
## C(sbpgp, contr.treatment(6, base = 3))1 0.6195 1.8579 0.1169 5.298
## C(sbpgp, contr.treatment(6, base = 3))2 0.3557 1.4272 0.1032 3.445
## C(sbpgp, contr.treatment(6, base = 3))4 0.0604 1.0623 0.1183 0.510
## C(sbpgp, contr.treatment(6, base = 3))5 0.1103 1.1166 0.1450 0.761
## C(sbpgp, contr.treatment(6, base = 3))6 -0.1507 0.8601 0.2877 -0.524
## p
## C(sbpgp, contr.treatment(6, base = 3))1 1.17e-07
## C(sbpgp, contr.treatment(6, base = 3))2 0.000571
## C(sbpgp, contr.treatment(6, base = 3))4 0.609778
## C(sbpgp, contr.treatment(6, base = 3))5 0.446955
## C(sbpgp, contr.treatment(6, base = 3))6 0.600324
##
## Likelihood ratio test=35.87 on 5 df, p=1.01e-06
## n= 956, number of events= 723
##
## [[4]]
## Call:
## coxph(formula = Surv(survda, dead) ~ C(sbpgp, contr.treatment(6,
## base = 3)) + strata(sexe, agegp))
##
## coef exp(coef) se(coef) z
## C(sbpgp, contr.treatment(6, base = 3))1 0.51999 1.68202 0.11625 4.473
## C(sbpgp, contr.treatment(6, base = 3))2 0.33831 1.40258 0.10369 3.263
## C(sbpgp, contr.treatment(6, base = 3))4 0.04763 1.04879 0.11704 0.407
## C(sbpgp, contr.treatment(6, base = 3))5 0.10874 1.11487 0.14579 0.746
## C(sbpgp, contr.treatment(6, base = 3))6 -0.16631 0.84679 0.28780 -0.578
## p
## C(sbpgp, contr.treatment(6, base = 3))1 7.72e-06
## C(sbpgp, contr.treatment(6, base = 3))2 0.0011
## C(sbpgp, contr.treatment(6, base = 3))4 0.6840
## C(sbpgp, contr.treatment(6, base = 3))5 0.4558
## C(sbpgp, contr.treatment(6, base = 3))6 0.5634
##
## Likelihood ratio test=28.19 on 5 df, p=3.338e-05
## n= 956, number of events= 723
##
## [[5]]
## Call:
## coxph(formula = Surv(survda, dead) ~ C(sbpgp, contr.treatment(6,
## base = 3)) + strata(sexe, agegp))
##
## coef exp(coef) se(coef) z
## C(sbpgp, contr.treatment(6, base = 3))1 0.62074 1.86031 0.11484 5.405
## C(sbpgp, contr.treatment(6, base = 3))2 0.38188 1.46504 0.10358 3.687
## C(sbpgp, contr.treatment(6, base = 3))4 0.03305 1.03360 0.11994 0.276
## C(sbpgp, contr.treatment(6, base = 3))5 0.13979 1.15003 0.14650 0.954
## C(sbpgp, contr.treatment(6, base = 3))6 -0.04512 0.95588 0.26928 -0.168
## p
## C(sbpgp, contr.treatment(6, base = 3))1 6.47e-08
## C(sbpgp, contr.treatment(6, base = 3))2 0.000227
## C(sbpgp, contr.treatment(6, base = 3))4 0.782896
## C(sbpgp, contr.treatment(6, base = 3))5 0.339992
## C(sbpgp, contr.treatment(6, base = 3))6 0.866916
##
## Likelihood ratio test=38.51 on 5 df, p=2.985e-07
## n= 956, number of events= 723
12. Pool the results for each of the five scenarios.
# column 2 of the pooled summary holds the estimates; exponentiate them to obtain hazard ratios
r1 <- as.vector(t(exp(summary(pool(fit1))[, c(2)])))
r2 <- as.vector(t(exp(summary(pool(fit2))[, c(2)])))
r3 <- as.vector(t(exp(summary(pool(fit3))[, c(2)])))
r4 <- as.vector(t(exp(summary(pool(fit4))[, c(2)])))
r5 <- as.vector(t(exp(summary(pool(fit5))[, c(2)])))
summary(pool(fit1))
## term estimate std.error statistic
## 1 C(sbpgp, contr.treatment(6, base = 3))1 0.51894998 0.1270346 4.0851076
## 2 C(sbpgp, contr.treatment(6, base = 3))2 0.33980528 0.1076749 3.1558438
## 3 C(sbpgp, contr.treatment(6, base = 3))4 0.07075033 0.1242191 0.5695608
## 4 C(sbpgp, contr.treatment(6, base = 3))5 0.06781262 0.1504278 0.4507986
## 5 C(sbpgp, contr.treatment(6, base = 3))6 -0.10245227 0.2836645 -0.3611741
## df p.value
## 1 340.6491 5.497273e-05
## 2 219.7215 1.824546e-03
## 3 165.4914 5.697479e-01
## 4 196.1717 6.526326e-01
## 5 302.7843 7.182212e-01
This code grabs the information from the tabulated pooled results produced by summary(). To make sense of these numbers, and to see what exactly is extracted in the code above, it is useful to lay them out in a proper table.
pars <- matrix(c(r1,r2,r3,r4,r5), nrow = 5) %>%
t() %>%
round(digits = 2)
pars
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.68 1.40 1.07 1.07 0.90
## [2,] 1.70 1.44 1.08 1.07 0.88
## [3,] 1.74 1.44 1.07 1.12 0.88
## [4,] 1.69 1.44 1.09 1.11 0.85
## [5,] 1.76 1.45 1.10 1.13 0.88
pars <- pars[, c(1, 2, 5)]
dimnames(pars) <- list(delta, c("<125", "125-144", ">200"))
pars
## <125 125-144 >200
## 0 1.68 1.40 0.90
## -5 1.70 1.44 0.88
## -10 1.74 1.44 0.88
## -15 1.69 1.44 0.85
## -20 1.76 1.45 0.88
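As an alternative to collecting the vectors r1 to r5 by hand, the same hazard-ratio table can be built in one go with purrr (a sketch, assuming the fit objects created in exercise 11; labels for the two middle categories that were omitted above are added for completeness):

fits <- list(fit1, fit2, fit3, fit4, fit5)
hr <- map(fits, ~ exp(summary(pool(.x))$estimate))   # pooled hazard ratios per scenario
hr <- round(do.call(rbind, hr), digits = 2)
dimnames(hr) <- list(delta, c("<125", "125-144", "165-184", "185-200", ">200"))
hr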
All in all, it seems that even big changes to the imputations (e.g. deducting 20 mmHg) have little influence on the results. This suggests that the results are relatively stable with respect to this type of MNAR mechanism.
13. Perform a sensitivity analysis on the mammalsleep data set by adding and subtracting some amount from the imputed values for sws. Use delta <- c(8, 6, 4, 2, 0, -2, -4, -6, -8) and investigate the influence on the following regression model:
lm(sws ~ log10(bw) + odi)
Sensitivity analysis is an important tool for investigating the plausibility of the MAR assumption. We again use the \(\delta\)-adjustment technique described in Van Buuren (2012, p. 185) as an informal, simple and direct method to create imputations under nonignorable models. We do so by simply adding or subtracting some amount from the imputations.
delta <- c(8, 6, 4, 2, 0, -2, -4, -6, -8)
meth <- make.method(mammalsleep)
meth["ts"] <- "~ I(sws + ps)"
pred <- make.predictorMatrix(mammalsleep)
pred[c("sws", "ps"), "ts"] <- 0
pred[, "species"] <- 0
post <- make.post(mammalsleep)
imputations <- list()
for (i in 1:length(delta)) {
  d <- delta[i]
  # define the delta adjustment for sws before assigning it to the post-processing slot
  cmd <- paste("imp[[j]][, i] <- imp[[j]][, i] +", d)
  post["sws"] <- cmd
  imputations[[i]] <- mice(mammalsleep,
                           meth = meth,
                           pred = pred,
                           post = post,
                           maxit = 10,
                           seed = i * 22,
                           print = FALSE)
}
output <- sapply(imputations, function(x) pool(with(x, lm(sws ~ log10(bw) + odi)))$pooled$estimate)
result <- cbind(delta, as.data.frame(t(output)))
colnames(result) <- c("delta", "Intercept", "log10(bw)", "odi")
result
## delta Intercept log10(bw) odi
## 1 8 7.507252 -3.66795293 -0.5354851
## 2 6 13.567651 0.01713469 -1.2146387
## 3 4 13.201827 -0.20734455 -1.1732456
## 4 2 12.779407 -0.58037756 -1.1364257
## 5 0 12.364483 -0.72542116 -1.0985924
## 6 -2 11.755797 -1.10028455 -0.9627950
## 7 -4 11.529455 -1.33036140 -0.9906999
## 8 -6 10.892475 -1.63272780 -0.8554995
## 9 -8 10.396622 -1.82703458 -0.8500622
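To see the trend described below at a glance, the pooled estimates can be plotted against \(\delta\) (a sketch, not part of the original exercise; it reshapes the result table with tidyr):

result %>%
  tidyr::pivot_longer(cols = c("Intercept", "log10(bw)", "odi"),
                      names_to = "term", values_to = "estimate") %>%
  ggplot(aes(x = delta, y = estimate)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ term, scales = "free_y")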
The estimates for the different \(\delta\) values are not close: a clear trend emerges in the estimates for the intercept and for bw. Thus, the results are not essentially the same under all specified mechanisms, and the outcomes can be deemed sensitive to the assumed mechanism.
However, in this scenario, the \(\delta\) adjustment is completely unrealistic. If we look at the descriptive information for the observed sws
summary(mammalsleep$sws)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2.100 6.250 8.350 8.673 11.000 17.900 14
we find that even our smallest nonzero adjustment, i.e. \(|\delta| = 2\), already makes up almost a quarter of the average sws. Choosing unreasonably large values will always influence your estimates. Therefore, it is vital to choose values that are reasonable given your suspicions about the assumed breach of the MAR assumption.
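To make this "almost a quarter" concrete, a quick back-of-the-envelope check (a small addition, using the delta vector defined above):

# smallest nonzero adjustment relative to the mean of the observed sws values
abs(delta[4]) / mean(mammalsleep$sws, na.rm = TRUE)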
We only used a shift parameter here. In other applications, scale or shape parameters could be more natural (see e.g. Van Buuren (2012), Ch. 3.9.1). The calculations are easily adapted to such cases.
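For instance, a multiplicative (scale) adjustment can be injected through the same post-processing hook. The sketch below is not part of the original exercises; it reuses the meth and pred objects defined above and an arbitrary 10% shrinkage, purely to illustrate the idea:

# shrink the imputed sws values by 10% instead of shifting them
post_scale <- make.post(mammalsleep)
post_scale["sws"] <- "imp[[j]][, i] <- imp[[j]][, i] * 0.9"
imp_scale <- mice(mammalsleep, meth = meth, pred = pred,
                  post = post_scale, maxit = 10, seed = 123, print = FALSE)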
We have seen that we can create multiple imputations in multivariate missing data problems that imitate deviations from MAR. The analysis used the post argument of the mice() function as a hook to alter the imputations just after they have been created by a univariate imputation function. The diagnostics show that the trick works. The relative mortality estimates are, however, robust to this type of alteration.
Van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL. ISBN 9781439868249.
Boshuizen, H. C., Izaks, G. J., van Buuren, S., & Ligthart, G. J. (1998). Blood pressure and mortality in elderly people aged 85 and older: community based study. BMJ (Clinical research ed.), 316(7147), 1780–1784. doi:10.1136/bmj.316.7147.1780
- End of practical
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] patchwork_1.1.1 dplyr_1.0.9 purrr_0.3.4 ggplot2_3.3.6
## [5] ggmice_0.0.1.9000 survival_3.3-1 mice_3.14.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.9 lattice_0.20-45 tidyr_1.2.0 zoo_1.8-10
## [5] assertthat_0.2.1 digest_0.6.29 utf8_1.2.2 R6_2.5.1
## [9] survminer_0.4.9 backports_1.4.1 evaluate_0.15 highr_0.9
## [13] pillar_1.7.0 rlang_1.0.3 rstudioapi_0.13 data.table_1.14.2
## [17] car_3.1-0 jquerylib_0.1.4 Matrix_1.4-1 rmarkdown_2.14
## [21] labeling_0.4.2 splines_4.2.0 stringr_1.4.0 munsell_0.5.0
## [25] broom_1.0.0 compiler_4.2.0 xfun_0.31 pkgconfig_2.0.3
## [29] htmltools_0.5.2 tidyselect_1.1.2 tibble_3.1.7 gridExtra_2.3
## [33] km.ci_0.5-6 fansi_1.0.3 crayon_1.5.1 withr_2.5.0
## [37] ggpubr_0.4.0 grid_4.2.0 jsonlite_1.8.0 xtable_1.8-4
## [41] gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.2 magrittr_2.0.3
## [45] KMsurv_0.1-5 scales_1.2.0 cli_3.3.0 stringi_1.7.6
## [49] carData_3.0-5 farver_2.1.1 ggsignif_0.6.3 bslib_0.3.1
## [53] ellipsis_0.3.2 survMisc_0.5.6 generics_0.1.3 vctrs_0.4.1
## [57] tools_4.2.0 glue_1.6.2 abind_1.4-5 fastmap_1.1.0
## [61] yaml_2.3.5 colorspace_2.0-3 rstatix_0.7.0 knitr_1.39
## [65] sass_0.4.1