mice: An approach to sensitivity analysis

This is the last practical in the series.
The focus of this document is on sensitivity analysis in the context of missing data. The goal of sensitivity analysis is to study the influence that violations of the missingness assumptions have on the obtained inference.
The leiden data set is a subset of 956 members of a very old (85+) cohort in Leiden. Multiple imputation of this data set has been described in Boshuizen et al. (1998), Van Buuren et al. (1999) and paragraph 9.2 of FIMD v2.
The main question is how blood pressure affects mortality risk in the oldest old. We have reasons to mistrust the MAR assumption in this case: in particular, we worry that the imputations of blood pressure created under MAR may not be low enough. The sensitivity analysis explores the effect of artificially lowering the imputed blood pressures by deducting an amount \(\delta\) from the values imputed under MAR. To preserve the relations between the variables, this adjustment needs to be applied during the iterations.
The leiden data set can be obtained by loading the following connection in R:
load(url("https://www.gerkovink.com/mimp/leiden.RData"))
1. Open R, install the packages survminer, purrr and patchwork from CRAN if you don't have them yet, load the packages mice and survival (plus the helper packages listed below), and fix the random seed.
install.packages("survminer")
install.packages("purrr")
install.packages("patchwork")
set.seed(123)
library(mice)
library(survival)
library(ggmice)
library(ggplot2)
library(purrr)
library(dplyr)
library(patchwork)
We choose seed value 123. This is an arbitrary value; any value would be an equally good seed value. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R instance to 123 and follow the order and code of the exercises precisely, you will get the exact same results and plots as we present in this document.
2. Inspect the leiden data set with summary(), str(), head() and tail().
summary(leiden)
## sexe lftanam rrsyst rrdiast
## Min. :0.0000 Min. : 85.48 Min. : 90.0 Min. : 50.00
## 1st Qu.:0.0000 1st Qu.: 87.50 1st Qu.:135.0 1st Qu.: 75.00
## Median :0.0000 Median : 89.07 Median :150.0 Median : 80.00
## Mean :0.2709 Mean : 89.78 Mean :152.9 Mean : 82.78
## 3rd Qu.:1.0000 3rd Qu.: 91.52 3rd Qu.:170.0 3rd Qu.: 90.00
## Max. :1.0000 Max. :103.54 Max. :260.0 Max. :140.00
## NA's :121 NA's :126
## dwa survda alb chol
## Min. :0.0000 Min. : 2.0 Min. :21.00 Min. : 2.700
## 1st Qu.:0.0000 1st Qu.: 534.8 1st Qu.:39.00 1st Qu.: 4.800
## Median :0.0000 Median :1196.5 Median :41.00 Median : 5.700
## Mean :0.2437 Mean :1195.4 Mean :40.77 Mean : 5.704
## 3rd Qu.:0.0000 3rd Qu.:1889.0 3rd Qu.:43.00 3rd Qu.: 6.400
## Max. :1.0000 Max. :2610.0 Max. :52.00 Max. :10.900
## NA's :229 NA's :232
## mmse woon
## Min. : 1.00 Min. :0.000
## 1st Qu.:21.00 1st Qu.:0.000
## Median :26.00 Median :3.000
## Mean :23.67 Mean :1.775
## 3rd Qu.:29.00 3rd Qu.:3.000
## Max. :30.00 Max. :4.000
## NA's :85
str(leiden)
## 'data.frame': 956 obs. of 10 variables:
## $ sexe : num 0 0 0 0 0 0 0 1 1 0 ...
## $ lftanam: num 87.8 87.8 89.1 90.3 87.8 ...
## $ rrsyst : num 160 140 155 155 110 120 180 135 130 160 ...
## $ rrdiast: num 100 70 85 90 60 80 75 80 60 90 ...
## $ dwa : num 0 0 0 0 0 0 0 0 0 0 ...
## $ survda : num 1082 27 1604 528 1100 ...
## $ alb : num 41 NA 41 44 37 NA 42 NA 45 46 ...
## $ chol : num 4.4 NA 4.6 3.9 5.3 NA 7.2 NA 5.1 6.5 ...
## $ mmse : num 12 9 25 27 14 NA 28 26 30 14 ...
## $ woon : num 4 3 0 1 0 3 3 0 4 4 ...
head(leiden)
## sexe lftanam rrsyst rrdiast dwa survda alb chol mmse woon
## 1 0 87.80 160 100 0 1082 41 4.4 12 4
## 2 0 87.75 140 70 0 27 NA NA 9 3
## 3 0 89.08 155 85 0 1604 41 4.6 25 0
## 4 0 90.29 155 90 0 528 44 3.9 27 1
## 5 0 87.76 110 60 0 1100 37 5.3 14 0
## 6 0 91.39 120 80 0 5 NA NA NA 3
tail(leiden)
## sexe lftanam rrsyst rrdiast dwa survda alb chol mmse woon
## 1229 1 93.85 130 70 0 523 40 5.3 28 0
## 1230 0 92.20 190 90 0 1182 44 5.8 26 3
## 1232 0 95.02 150 80 0 861 35 5.0 28 0
## 1233 0 88.30 120 60 0 129 42 8.6 21 0
## 1235 1 89.02 140 80 0 374 40 5.2 23 0
## 1236 0 85.70 130 65 0 1744 36 7.2 27 3
3. Perform a dry run (using maxit = 0) in mice. List the number of missing values per variable.
ini <- mice(leiden, maxit = 0)
ini$nmis
## sexe lftanam rrsyst rrdiast dwa survda alb chol mmse woon
## 0 0 121 126 0 0 229 232 85 0
There are 121 missing values (NA's) for rrsyst, 126 for rrdiast, 229 for alb, 232 for chol and 85 for mmse.
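If you also want the relative amount of missingness per variable, a one-liner does the trick (a small addition, not part of the original exercise):

# proportion of missing values per variable
round(colMeans(is.na(leiden)), 2)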
4. Study the missing data pattern in more detail using md.pattern() and fluxplot(), or their ggmice counterparts plot_pattern() and plot_flux(). The interest here focuses on imputing systolic blood pressure (rrsyst) and diastolic blood pressure (rrdiast).
plot_pattern(leiden)
plot_flux(leiden)
Variables with higher outflux are (potentially) the more powerful predictors. Variables with higher influx depend more strongly on the imputation model. Points relatively close to the diagonal indicate that influx and outflux are balanced.

The variables in the upper-left corner have the more complete information, so the number of missing data problems for this group is relatively small. The variables in the middle have an outflux between 0.5 and 0.8, which is modest; their missing data problems are more severe, but this group could still contain important predictors. The variables at the bottom have an outflux of 0.5 or lower, so their predictive power is limited. This group also has a higher influx and thus depends more heavily on the imputation model.
If you would like this information in tabular form, you can simply ask for it:
flux(leiden)
## pobs influx outflux ainb aout fico
## sexe 1.0000000 0.00000000 1.0000000 0.0000000 0.09216643 0.3504184
## lftanam 1.0000000 0.00000000 1.0000000 0.0000000 0.09216643 0.3504184
## rrsyst 0.8734310 0.09798107 0.5573770 0.7887971 0.05881570 0.2562874
## rrdiast 0.8682008 0.10231550 0.5422446 0.7910053 0.05756359 0.2518072
## dwa 1.0000000 0.00000000 1.0000000 0.0000000 0.09216643 0.3504184
## survda 1.0000000 0.00000000 1.0000000 0.0000000 0.09216643 0.3504184
## alb 0.7604603 0.19311053 0.2471627 0.8214459 0.02995568 0.1458047
## chol 0.7573222 0.19573400 0.2383354 0.8218391 0.02900552 0.1422652
## mmse 0.9110879 0.06798221 0.6796974 0.7790850 0.06875877 0.2870264
## woon 1.0000000 0.00000000 1.0000000 0.0000000 0.09216643 0.3504184
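To single out the variables with limited predictive power programmatically, we can subset this table on a low outflux, say below 0.5 (a small sketch, not part of the original exercise):

# variables whose outflux drops below 0.5, i.e. the ones with limited predictive power
fx <- flux(leiden)
fx[fx$outflux < 0.5, c("pobs", "influx", "outflux")]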
5. The cases with and without blood pressure observed have very different survival rates. Show this.
We can see this easily from the Kaplan-Meier plot.
km <- survfit(Surv(survda/365, 1-dwa) ~ is.na(rrsyst), data = leiden)
survminer::ggsurvplot(km,
censor.shape="",
palette = c(mdc(4), mdc(5)),
ggtheme = ggmice:::theme_mice(),
legend.labs = c("BP Observed", "BP Missing"),
xlab = "Years since intake",
ylab = "K-M survival probability")
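To back up the visual impression from the Kaplan-Meier curves with a formal comparison, a log-rank test is a natural companion (a sketch, not part of the original exercise):

# log-rank test comparing survival between cases with and without observed blood pressure
survdiff(Surv(survda / 365, 1 - dwa) ~ is.na(rrsyst), data = leiden)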
In the next steps we are going to impute rrsyst and rrdiast under two scenarios: MAR and MNAR. We will use the delta adjustment technique described in paragraph 7.2.3 of Van Buuren (2012).
6. Create a \(\delta\) vector that represents the following adjustment values in mmHg: 0 for MAR, and -5, -10, -15 and -20 for MNAR.
delta <- c(0, -5, -10, -15, -20)
The recipe for creating MNAR imputations for \(\delta \neq 0\) uses the post-processing facility of mice. This allows us to change the imputations on the fly by deducting a value \(\delta\) from the values just imputed.
7. Impute the leiden data using the delta adjustment technique. We only have to deduct from rrsyst, because rrdiast will adapt to the changed rrsyst when it is imputed using rrsyst as predictor. Store the five imputed scenarios (adjustments) in a list called imp.all.
imp.all <- list()
post <- ini$post
for (i in 1:length(delta)){
  d <- delta[i]
  # add delta[i] (0 or a negative number of mmHg) to each imputed rrsyst value, directly after imputation
  cmd <- paste("imp[[j]][,i] <- imp[[j]][,i] +", d)
  post["rrsyst"] <- cmd
  imp <- mice(leiden, post = post, maxit = 5, seed = i, print = FALSE)
  imp.all[[i]] <- imp
}
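Because the adjustment is applied during the iterations, it may be worth verifying that the algorithm still converges. A quick check (a sketch, not part of the original exercise) uses the trace-line plot for mids objects:

# trace lines of the chain means and variances for the adjusted variables
plot(imp.all[[5]], c("rrsyst", "rrdiast"))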
8. Inspect the imputations. Compare the imputations for blood pressure under the most extreme scenarios with a box-and-whiskers plot. Is this as expected?
For the scenario where \(\delta = 0\) we can plot the first object from the list. This object is the mids-object that considers imputations under no adjustment.
mice::bwplot(imp.all[[1]])
With ggmice, it is less straightforward to create this plot, because ggmice can only handle a single variable per aesthetic at a time. This means that to create multiple boxplots, we have to create multiple plots and stitch them together, using, for example, patchwork::wrap_plots(). Because the plots are created independently, the cleanest solution is to plot only those variables that suffer from missingness. If we included all variables (also those without missingness), the number of imputations on the x-axis would differ between \(0\) for completely observed variables and \(m\) for the imputed variables. The plot can be obtained like this:
leiden %>%
select(where(~ sum(is.na(.x)) > 0)) %>%
colnames %>%
map(~ ggmice(imp.all[[1]],
mapping = aes_string(x = '.imp', y = .x)) +
geom_boxplot() +
theme(legend.position = 'none')) %>%
wrap_plots()
For the scenario where \(\delta = -20\) we can plot the fifth object from the list. This object is the mids-object that considers imputations under the largest adjustment.
mice::bwplot(imp.all[[5]])
We can clearly see that the adjustment has an effect on the imputations for rrsyst and, thus, on those for rrdiast.
A similar figure can be obtained using ggmice, like this:
leiden %>%
select(where(~ sum(is.na(.x)) > 0)) %>%
colnames %>%
map(~ ggmice(imp.all[[5]],
mapping = aes_string(x = '.imp', y = .x)) +
geom_boxplot() +
theme(legend.position = 'none')) %>%
wrap_plots()
9. Use the density plot for another inspection.
For the scenario where \(\delta = 0\) we can plot the first element of the list imp.all. This object is the mids-object that considers imputations under no adjustment.
mice::densityplot(imp.all[[1]], lwd = 2)
A similar figure can be obtained via ggmice:
leiden %>%
select(where(~ sum(is.na(.x)) > 0)) %>%
colnames %>%
map(~ ggmice(imp.all[[1]],
mapping = aes_string(x = .x, group = '.imp')) +
geom_density() +
theme(legend.position = 'none')) %>%
wrap_plots()
For the scenario where \(\delta = -20\) we can plot the fifth element of the list imp.all. This object is the mids-object that considers imputations under the largest adjustment.
mice::densityplot(imp.all[[5]], lwd = 2)
We can once more clearly see that the adjustment has an effect on the imputations for rrsyst and, thus, on those for rrdiast.
Doing the same with ggmice yields:
leiden %>%
select(where(~ sum(is.na(.x)) > 0)) %>%
colnames %>%
map(~ ggmice(imp.all[[5]],
mapping = aes_string(x = .x, group = '.imp')) +
geom_density() +
theme(legend.position = 'none')) %>%
wrap_plots()
10. Also create a scatter plot of rrsyst and rrdiast by imputation number and missingness.
mice::xyplot(imp.all[[1]], rrsyst ~ rrdiast | .imp)
mice::xyplot(imp.all[[5]], rrsyst ~ rrdiast | .imp)
The scatter plot comparison between rrsyst and rrdiast shows us that the adjustment has an effect on the imputations and that the imputations are lower for the scenario where \(\delta = -20\).
Again, similar figures can be obtained using ggmice, although the imputations are shown in separate panels.
ggmice(imp.all[[1]],
mapping = aes(x = rrdiast, y = rrsyst)) +
geom_point() +
facet_wrap(~.imp)
ggmice(imp.all[[5]],
mapping = aes(x = rrdiast, y = rrsyst)) +
geom_point() +
facet_wrap(~.imp)
We are now going to perform a complete-data analysis. This involves several steps: categorizing systolic blood pressure (rrsyst) into groups, categorizing age (lftanam) into groups, computing the death indicator as 1 - dwa, and fitting a Cox proportional hazards model with blood pressure group as predictor, stratified by sex and age group. To automate these steps we create an expression object that performs them for us. The following object does so:
cda <- expression(
sbpgp <- cut(rrsyst, breaks = c(50, 124, 144, 164, 184, 200, 500)),
agegp <- cut(lftanam, breaks = c(85, 90, 95, 110)),
dead <- 1 - dwa,
coxph(Surv(survda, dead)
~ C(sbpgp, contr.treatment(6, base = 3))
+ strata(sexe, agegp)))
See Van Buuren (2012, p. 186) for more information.
11. Create five fit objects that run the expression cda on the five imputed adjustment scenarios. Use the function with().
fit1 <- with(imp.all[[1]], cda)
fit2 <- with(imp.all[[2]], cda)
fit3 <- with(imp.all[[3]], cda)
fit4 <- with(imp.all[[4]], cda)
fit5 <- with(imp.all[[5]], cda)
Each fit object contains the five imputed Cox proportional hazards models for the adjustment scenario at hand. For example, the \(\delta=-10\) scenario is contained in fit3.
fit3
## call :
## with.mids(data = imp.all[[3]], expr = cda)
##
## call1 :
## mice(data = leiden, post = post, maxit = 5, printFlag = FALSE,
## seed = i)
##
## nmis :
## sexe lftanam rrsyst rrdiast dwa survda alb chol mmse woon
## 0 0 121 126 0 0 229 232 85 0
##
## analyses :
## [[1]]
## Call:
## coxph(formula = Surv(survda, dead) ~ C(sbpgp, contr.treatment(6,
## base = 3)) + strata(sexe, agegp))
##
## coef exp(coef) se(coef) z
## C(sbpgp, contr.treatment(6, base = 3))1 0.5357 1.7086 0.1171 4.575
## C(sbpgp, contr.treatment(6, base = 3))2 0.3998 1.4915 0.1026 3.896
## C(sbpgp, contr.treatment(6, base = 3))4 0.1122 1.1188 0.1173 0.957
## C(sbpgp, contr.treatment(6, base = 3))5 0.1150 1.1218 0.1463 0.786
## C(sbpgp, contr.treatment(6, base = 3))6 -0.1396 0.8697 0.2876 -0.485
## p
## C(sbpgp, contr.treatment(6, base = 3))1 4.77e-06
## C(sbpgp, contr.treatment(6, base = 3))2 9.76e-05
## C(sbpgp, contr.treatment(6, base = 3))4 0.339
## C(sbpgp, contr.treatment(6, base = 3))5 0.432
## C(sbpgp, contr.treatment(6, base = 3))6 0.628
##
## Likelihood ratio test=30.25 on 5 df, p=1.315e-05
## n= 956, number of events= 723
##
## [[2]]
## Call:
## coxph(formula = Surv(survda, dead) ~ C(sbpgp, contr.treatment(6,
## base = 3)) + strata(sexe, agegp))
##
## coef exp(coef) se(coef) z
## C(sbpgp, contr.treatment(6, base = 3))1 0.47629 1.61008 0.11871 4.012
## C(sbpgp, contr.treatment(6, base = 3))2 0.35318 1.42358 0.10127 3.487
## C(sbpgp, contr.treatment(6, base = 3))4 0.07637 1.07936 0.11755 0.650
## C(sbpgp, contr.treatment(6, base = 3))5 0.07489 1.07777 0.14472 0.518
## C(sbpgp, contr.treatment(6, base = 3))6 -0.13686 0.87209 0.27758 -0.493
## p
## C(sbpgp, contr.treatment(6, base = 3))1 6.02e-05
## C(sbpgp, contr.treatment(6, base = 3))2 0.000488
## C(sbpgp, contr.treatment(6, base = 3))4 0.515888
## C(sbpgp, contr.treatment(6, base = 3))5 0.604799
## C(sbpgp, contr.treatment(6, base = 3))6 0.621966
##
## Likelihood ratio test=24.59 on 5 df, p=0.0001674
## n= 956, number of events= 723
##
## [[3]]
## Call:
## coxph(formula = Surv(survda, dead) ~ C(sbpgp, contr.treatment(6,
## base = 3)) + strata(sexe, agegp))
##
## coef exp(coef) se(coef) z
## C(sbpgp, contr.treatment(6, base = 3))1 0.6195 1.8579 0.1169 5.298
## C(sbpgp, contr.treatment(6, base = 3))2 0.3557 1.4272 0.1032 3.445
## C(sbpgp, contr.treatment(6, base = 3))4 0.0604 1.0623 0.1183 0.510
## C(sbpgp, contr.treatment(6, base = 3))5 0.1103 1.1166 0.1450 0.761
## C(sbpgp, contr.treatment(6, base = 3))6 -0.1507 0.8601 0.2877 -0.524
## p
## C(sbpgp, contr.treatment(6, base = 3))1 1.17e-07
## C(sbpgp, contr.treatment(6, base = 3))2 0.000571
## C(sbpgp, contr.treatment(6, base = 3))4 0.609778
## C(sbpgp, contr.treatment(6, base = 3))5 0.446955
## C(sbpgp, contr.treatment(6, base = 3))6 0.600324
##
## Likelihood ratio test=35.87 on 5 df, p=1.01e-06
## n= 956, number of events= 723
##
## [[4]]
## Call:
## coxph(formula = Surv(survda, dead) ~ C(sbpgp, contr.treatment(6,
## base = 3)) + strata(sexe, agegp))
##
## coef exp(coef) se(coef) z
## C(sbpgp, contr.treatment(6, base = 3))1 0.51999 1.68202 0.11625 4.473
## C(sbpgp, contr.treatment(6, base = 3))2 0.33831 1.40258 0.10369 3.263
## C(sbpgp, contr.treatment(6, base = 3))4 0.04763 1.04879 0.11704 0.407
## C(sbpgp, contr.treatment(6, base = 3))5 0.10874 1.11487 0.14579 0.746
## C(sbpgp, contr.treatment(6, base = 3))6 -0.16631 0.84679 0.28780 -0.578
## p
## C(sbpgp, contr.treatment(6, base = 3))1 7.72e-06
## C(sbpgp, contr.treatment(6, base = 3))2 0.0011
## C(sbpgp, contr.treatment(6, base = 3))4 0.6840
## C(sbpgp, contr.treatment(6, base = 3))5 0.4558
## C(sbpgp, contr.treatment(6, base = 3))6 0.5634
##
## Likelihood ratio test=28.19 on 5 df, p=3.338e-05
## n= 956, number of events= 723
##
## [[5]]
## Call:
## coxph(formula = Surv(survda, dead) ~ C(sbpgp, contr.treatment(6,
## base = 3)) + strata(sexe, agegp))
##
## coef exp(coef) se(coef) z
## C(sbpgp, contr.treatment(6, base = 3))1 0.62074 1.86031 0.11484 5.405
## C(sbpgp, contr.treatment(6, base = 3))2 0.38188 1.46504 0.10358 3.687
## C(sbpgp, contr.treatment(6, base = 3))4 0.03305 1.03360 0.11994 0.276
## C(sbpgp, contr.treatment(6, base = 3))5 0.13979 1.15003 0.14650 0.954
## C(sbpgp, contr.treatment(6, base = 3))6 -0.04512 0.95588 0.26928 -0.168
## p
## C(sbpgp, contr.treatment(6, base = 3))1 6.47e-08
## C(sbpgp, contr.treatment(6, base = 3))2 0.000227
## C(sbpgp, contr.treatment(6, base = 3))4 0.782896
## C(sbpgp, contr.treatment(6, base = 3))5 0.339992
## C(sbpgp, contr.treatment(6, base = 3))6 0.866916
##
## Likelihood ratio test=38.51 on 5 df, p=2.985e-07
## n= 956, number of events= 723
12. Pool the results for each of the five scenarios.
# column 2 of the pooled summary holds the estimates; exponentiate them to obtain hazard ratios
r1 <- as.vector(t(exp(summary(pool(fit1))[, c(2)])))
r2 <- as.vector(t(exp(summary(pool(fit2))[, c(2)])))
r3 <- as.vector(t(exp(summary(pool(fit3))[, c(2)])))
r4 <- as.vector(t(exp(summary(pool(fit4))[, c(2)])))
r5 <- as.vector(t(exp(summary(pool(fit5))[, c(2)])))
summary(pool(fit1))
## term estimate std.error statistic
## 1 C(sbpgp, contr.treatment(6, base = 3))1 0.51894998 0.1270346 4.0851076
## 2 C(sbpgp, contr.treatment(6, base = 3))2 0.33980528 0.1076749 3.1558438
## 3 C(sbpgp, contr.treatment(6, base = 3))4 0.07075033 0.1242191 0.5695608
## 4 C(sbpgp, contr.treatment(6, base = 3))5 0.06781262 0.1504278 0.4507986
## 5 C(sbpgp, contr.treatment(6, base = 3))6 -0.10245227 0.2836645 -0.3611741
## df p.value
## 1 340.6491 5.497273e-05
## 2 219.7215 1.824546e-03
## 3 165.4914 5.697479e-01
## 4 196.1717 6.526326e-01
## 5 302.7843 7.182212e-01
This code grabs the information from the tabulated pooled results produced by summary(). To make sense of these numbers, and to see what exactly is extracted in the code above, it is useful to lay them out in a proper table.
pars <- matrix(c(r1,r2,r3,r4,r5), nrow = 5) %>%
t() %>%
round(digits = 2)
pars
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.68 1.40 1.07 1.07 0.90
## [2,] 1.70 1.44 1.08 1.07 0.88
## [3,] 1.74 1.44 1.07 1.12 0.88
## [4,] 1.69 1.44 1.09 1.11 0.85
## [5,] 1.76 1.45 1.10 1.13 0.88
pars <- pars[, c(1, 2, 5)]
dimnames(pars) <- list(delta, c("<125", "125-144", ">200"))
pars
## <125 125-144 >200
## 0 1.68 1.40 0.90
## -5 1.70 1.44 0.88
## -10 1.74 1.44 0.88
## -15 1.69 1.44 0.85
## -20 1.76 1.45 0.88
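As an alternative to collecting the vectors r1 to r5 by hand, the same hazard-ratio table can be built in one go with purrr (a sketch, assuming the fit objects created in exercise 11; labels for the two middle categories that were omitted above are added for completeness):

fits <- list(fit1, fit2, fit3, fit4, fit5)
hr <- map(fits, ~ exp(summary(pool(.x))$estimate))   # pooled hazard ratios per scenario
hr <- round(do.call(rbind, hr), digits = 2)
dimnames(hr) <- list(delta, c("<125", "125-144", "165-184", "185-200", ">200"))
hr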
All in all, it seems that even big changes to the imputations (e.g. deducting 20 mmHg) have little influence on the results. This suggests that the results are relatively stable with respect to this type of MNAR mechanism.
13. Perform a sensitivity analysis on the mammalsleep data set by adding and subtracting some amount from the imputed values for sws. Use delta <- c(8, 6, 4, 2, 0, -2, -4, -6, -8) and investigate the influence on the following regression model:
lm(sws ~ log10(bw) + odi)
Sensitivity analysis is an important tool for investigating the plausibility of the MAR assumption. We again use the \(\delta\)-adjustment technique described in Van Buuren (2012, p. 185) as an informal, simple and direct method to create imputations under nonignorable models. We do so by simply adding or subtracting some amount from the imputations.
delta <- c(8, 6, 4, 2, 0, -2, -4, -6, -8)
meth <- make.method(mammalsleep)
meth["ts"] <- "~ I(sws + ps)"
pred <- make.predictorMatrix(mammalsleep)
pred[c("sws", "ps"), "ts"] <- 0
pred[, "species"] <- 0
post <- make.post(mammalsleep)
imputations <- list()
for (i in 1:length(delta)) {
  d <- delta[i]
  # define the delta adjustment for sws before assigning it to the post-processing slot
  cmd <- paste("imp[[j]][, i] <- imp[[j]][, i] +", d)
  post["sws"] <- cmd
  imputations[[i]] <- mice(mammalsleep,
                           meth = meth,
                           pred = pred,
                           post = post,
                           maxit = 10,
                           seed = i * 22,
                           print = FALSE)
}
output <- sapply(imputations, function(x) pool(with(x, lm(sws ~ log10(bw) + odi)))$pooled$estimate)
result <- cbind(delta, as.data.frame(t(output)))
colnames(result) <- c("delta", "Intercept", "log10(bw)", "odi")
result
## delta Intercept log10(bw) odi
## 1 8 7.507252 -3.66795293 -0.5354851
## 2 6 13.567651 0.01713469 -1.2146387
## 3 4 13.201827 -0.20734455 -1.1732456
## 4 2 12.779407 -0.58037756 -1.1364257
## 5 0 12.364483 -0.72542116 -1.0985924
## 6 -2 11.755797 -1.10028455 -0.9627950
## 7 -4 11.529455 -1.33036140 -0.9906999
## 8 -6 10.892475 -1.63272780 -0.8554995
## 9 -8 10.396622 -1.82703458 -0.8500622
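To see the trend described below at a glance, the pooled estimates can be plotted against \(\delta\) (a sketch, not part of the original exercise; it reshapes the result table with tidyr):

result %>%
  tidyr::pivot_longer(cols = c("Intercept", "log10(bw)", "odi"),
                      names_to = "term", values_to = "estimate") %>%
  ggplot(aes(x = delta, y = estimate)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ term, scales = "free_y")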
The estimates for the different \(\delta\) values are not close: a clear trend emerges in the estimates for the intercept and for bw. Thus, the results are not essentially the same under all specified mechanisms, and the outcomes can be deemed sensitive to the assumed mechanism.
However, in this scenario, the \(\delta\) adjustment is completely unrealistic. If we look at the descriptive information for the observed sws
summary(mammalsleep$sws)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2.100 6.250 8.350 8.673 11.000 17.900 14
we find that even our smallest nonzero adjustment, i.e. \(|\delta| = 2\), already makes up almost a quarter of the average sws. Choosing unreasonably large values will always influence your estimates. Therefore, it is vital to choose values that are reasonable given your suspicions about the assumed breach of the MAR assumption.
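To make this "almost a quarter" concrete, a quick back-of-the-envelope check (a small addition, using the delta vector defined above):

# smallest nonzero adjustment relative to the mean of the observed sws values
abs(delta[4]) / mean(mammalsleep$sws, na.rm = TRUE)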
We only used a shift parameter here. In other applications, scale or shape parameters could be more natural (see e.g. Van Buuren (2012), Ch. 3.9.1). The calculations are easily adapted to such cases.
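For instance, a multiplicative (scale) adjustment can be injected through the same post-processing hook. The sketch below is not part of the original exercises; it reuses the meth and pred objects defined above and an arbitrary 10% shrinkage, purely to illustrate the idea:

# shrink the imputed sws values by 10% instead of shifting them
post_scale <- make.post(mammalsleep)
post_scale["sws"] <- "imp[[j]][, i] <- imp[[j]][, i] * 0.9"
imp_scale <- mice(mammalsleep, meth = meth, pred = pred,
                  post = post_scale, maxit = 10, seed = 123, print = FALSE)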
We have seen that we can create multiple imputations in multivariate missing data problems that imitate deviations from MAR. The analysis used the post argument of the mice() function as a hook to alter the imputations just after they have been created by a univariate imputation function. The diagnostics show that the trick works. The relative mortality estimates are, however, robust to this type of alteration.
Van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL. ISBN 9781439868249.
Boshuizen, H. C., Izaks, G. J., van Buuren, S., & Ligthart, G. J. (1998). Blood pressure and mortality in elderly people aged 85 and older: community based study. BMJ (Clinical research ed.), 316(7147), 1780–1784. doi:10.1136/bmj.316.7147.1780
- End of practical
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] patchwork_1.1.1 dplyr_1.0.9 purrr_0.3.4 ggplot2_3.3.6
## [5] ggmice_0.0.1.9000 survival_3.3-1 mice_3.14.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.9 lattice_0.20-45 tidyr_1.2.0 zoo_1.8-10
## [5] assertthat_0.2.1 digest_0.6.29 utf8_1.2.2 R6_2.5.1
## [9] survminer_0.4.9 backports_1.4.1 evaluate_0.15 highr_0.9
## [13] pillar_1.7.0 rlang_1.0.3 rstudioapi_0.13 data.table_1.14.2
## [17] car_3.1-0 jquerylib_0.1.4 Matrix_1.4-1 rmarkdown_2.14
## [21] labeling_0.4.2 splines_4.2.0 stringr_1.4.0 munsell_0.5.0
## [25] broom_1.0.0 compiler_4.2.0 xfun_0.31 pkgconfig_2.0.3
## [29] htmltools_0.5.2 tidyselect_1.1.2 tibble_3.1.7 gridExtra_2.3
## [33] km.ci_0.5-6 fansi_1.0.3 crayon_1.5.1 withr_2.5.0
## [37] ggpubr_0.4.0 grid_4.2.0 jsonlite_1.8.0 xtable_1.8-4
## [41] gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.2 magrittr_2.0.3
## [45] KMsurv_0.1-5 scales_1.2.0 cli_3.3.0 stringi_1.7.6
## [49] carData_3.0-5 farver_2.1.1 ggsignif_0.6.3 bslib_0.3.1
## [53] ellipsis_0.3.2 survMisc_0.5.6 generics_0.1.3 vctrs_0.4.1
## [57] tools_4.2.0 glue_1.6.2 abind_1.4-5 fastmap_1.1.0
## [61] yaml_2.3.5 colorspace_2.0-3 rstatix_0.7.0 knitr_1.39
## [65] sass_0.4.1