This is the fifth practical.

In this practical we will walk you through different ways of combining inferences based on multiply imputed data sets.

All the best,

Gerko and Stef


1. Open R and load the following packages and fix the random seed.

library(mice) # Data imputation
library(dplyr) # Data manipulation
library(magrittr) # Flexible piping in R
library(purrr) # Flexible functional programming
set.seed(123)

We use seed value 123. This is an arbitrary choice; any other value would serve equally well. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R instance to 123 and follow the order and code of the exercises precisely, you will get the exact same results and plots as we present in this document.
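As a quick aside (a minimal sketch, not part of the exercises), resetting the seed before a random draw reproduces that draw exactly:

set.seed(123)
draw1 <- rnorm(3)         # three random numbers
set.seed(123)
draw2 <- rnorm(3)         # same seed, hence the same three numbers
identical(draw1, draw2)   # TRUE
set.seed(123)             # reset the seed so the exercises below start from the intended state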


2. Impute the boys data properly with passive imputation for bmi with the following parameters:

  • m = 10 for 10 imputed datasets.
  • maxit = 6 to give the algorithm 6 iterations to obtain a stable solution.
  • print = FALSE to omit printing of the iteration and imputation history.

We will use this data to go through the workflow of data analysis with mids (multiply imputed data set) objects.

We start by creating the method vector and specifying the passive imputation of bmi:

meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

Then we remove bmi as a predictor for hgt and wgt to avoid circularity (bmi feeding back into hgt and wgt):

pred <- make.predictorMatrix(boys)
pred[c("hgt", "wgt"), "bmi"] <- 0
pred
##     age hgt wgt bmi hc gen phb tv reg
## age   0   1   1   1  1   1   1  1   1
## hgt   1   0   1   0  1   1   1  1   1
## wgt   1   1   0   0  1   1   1  1   1
## bmi   1   1   1   0  1   1   1  1   1
## hc    1   1   1   1  0   1   1  1   1
## gen   1   1   1   1  1   0   1  1   1
## phb   1   1   1   1  1   1   0  1   1
## tv    1   1   1   1  1   1   1  0   1
## reg   1   1   1   1  1   1   1  1   0

and we run the mice algorithm with the new predictor matrix (we still use the imputation method vector meth specified above):

imp <- mice(boys, 
           meth = meth, 
           pred = pred, 
           print = FALSE, 
           m = 10, 
           maxit = 6)

We use the multiply imputed data set imp from now on.
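Although not required for this practical, it can be informative to inspect the convergence of the algorithm over the 6 iterations. The plot() method for mids objects draws the means and standard deviations of the imputed values per iteration and chain; roughly flat, well-mixed lines suggest a stable solution:

plot(imp) # trace plots of the imputation chains per incomplete variable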


Correlations


3. Calculate a correlation between all continuous variables for the imputed boys data

There are two ways in which we can calculate the correlation on the imputed data:

  • The wrong way: calculate an estimate over the average imputed dataset.

People often suggest that using the average imputed dataset - that is, averaging over the imputed data sets such that every cell holds the average of the corresponding cells in the imputed data - would be efficient and conform to Rubin’s rules. This is not true: doing so yields false inference.

To demonstrate this, let’s create the “wrong” averaged data set and exclude the non-numerical columns:

ave <- imp %>%
  mice::complete("long") %>%
  group_by(.id) %>%
  summarise_all(.funs = mean) %>%
  select(-.id, -.imp, -phb, -gen, -reg)

head(ave)
## # A tibble: 6 × 6
##     age   hgt   wgt   bmi    hc    tv
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.035  50.1  3.65  14.5  33.7   2.5
## 2 0.038  53.5  3.37  11.8  35     3.5
## 3 0.057  50    3.14  12.6  35.2   2.8
## 4 0.06   54.5  4.27  14.4  36.7   2.5
## 5 0.062  57.5  5.03  15.2  37.3   2  
## 6 0.068  55.5  4.66  15.1  37     2.1

If we now calculate Pearson’s correlation, rounded to two digits:

cor.wrong <- ave %>%
  cor() %>%
  round(digits = 2)

we obtain:

cor.wrong
##      age  hgt  wgt  bmi   hc   tv
## age 1.00 0.98 0.95 0.63 0.86 0.86
## hgt 0.98 1.00 0.94 0.60 0.91 0.81
## wgt 0.95 0.94 1.00 0.79 0.84 0.87
## bmi 0.63 0.60 0.79 1.00 0.60 0.64
## hc  0.86 0.91 0.84 0.60 1.00 0.67
## tv  0.86 0.81 0.87 0.64 0.67 1.00

  • The correct way: calculate an estimate for each imputed dataset and average over the estimates.

Correlations are bounded between -1 and 1 and their sampling distribution is skewed, so it is best to apply a Fisher transformation before pooling the correlation estimates - and a backtransformation afterwards - because Rubin’s rules work best on a scale where the estimates are approximately normally distributed. Therefore we define the following two functions that allow us to transform and backtransform any value:

fisher.trans <- function(x) 1/2 * log((1 + x) / (1 - x))
fisher.backtrans <- function(x) (exp(2 * x) - 1) / (exp(2 * x) + 1)
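As a small check (not part of the exercise), the backtransformation undoes the transformation:

fisher.trans(0.5)                   # 0.5493061
fisher.backtrans(fisher.trans(0.5)) # 0.5 again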

Now, to calculate the correlation on the imputed data:

cor <- imp %>%
  mice::complete("all") %>%
  map(select, -phb, -gen, -reg) %>%  
  map(stats::cor) %>%
  map(fisher.trans)
cor
## $`1`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1971683 1.837486 0.7306936 1.2875134 1.1764013
## hgt 2.1971683       Inf 1.771810 0.6796283 1.5304395 1.0150829
## wgt 1.8374860 1.7718099      Inf 1.0661101 1.2173486 1.1632394
## bmi 0.7306936 0.6796283 1.066110       Inf 0.6755864 0.6965303
## hc  1.2875134 1.5304395 1.217349 0.6755864       Inf 0.7662706
## tv  1.1764013 1.0150829 1.163239 0.6965303 0.7662706       Inf
## 
## $`2`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1928290 1.839154 0.7396715 1.2805717 1.1540150
## hgt 2.1928290       Inf 1.772569 0.6866849 1.5335605 1.0247420
## wgt 1.8391536 1.7725691      Inf 1.0777861 1.2168596 1.1924681
## bmi 0.7396715 0.6866849 1.077786       Inf 0.6757144 0.7175002
## hc  1.2805717 1.5335605 1.216860 0.6757144       Inf 0.7716682
## tv  1.1540150 1.0247420 1.192468 0.7175002 0.7716682       Inf
## 
## $`3`
##          age       hgt      wgt       bmi        hc        tv
## age      Inf 2.1980513 1.839317 0.7384530 1.2750962 1.1623767
## hgt 2.198051       Inf 1.772417 0.6877455 1.5248987 1.0039437
## wgt 1.839317 1.7724172      Inf 1.0774889 1.2124082 1.1611924
## bmi 0.738453 0.6877455 1.077489       Inf 0.6812475 0.7128481
## hc  1.275096 1.5248987 1.212408 0.6812475       Inf 0.7624198
## tv  1.162377 1.0039437 1.161192 0.7128481 0.7624198       Inf
## 
## $`4`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1961017 1.838918 0.7408736 1.2897920 1.1832366
## hgt 2.1961017       Inf 1.774843 0.6907368 1.5444453 1.0383982
## wgt 1.8389176 1.7748429      Inf 1.0800728 1.2256015 1.2125146
## bmi 0.7408736 0.6907368 1.080073       Inf 0.6827374 0.7359459
## hc  1.2897920 1.5444453 1.225601 0.6827374       Inf 0.7837830
## tv  1.1832366 1.0383982 1.212515 0.7359459 0.7837830       Inf
## 
## $`5`
##           age      hgt      wgt       bmi        hc        tv
## age       Inf 2.195998 1.839240 0.7359324 1.2753551 1.1843788
## hgt 2.1959978      Inf 1.772784 0.6840920 1.5270459 1.0261064
## wgt 1.8392403 1.772784      Inf 1.0709307 1.2095539 1.1910395
## bmi 0.7359324 0.684092 1.070931       Inf 0.6742064 0.7114433
## hc  1.2753551 1.527046 1.209554 0.6742064       Inf 0.7587066
## tv  1.1843788 1.026106 1.191039 0.7114433 0.7587066       Inf
## 
## $`6`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1960817 1.839802 0.7380297 1.2683845 1.1098175
## hgt 2.1960817       Inf 1.773711 0.6872310 1.5185628 0.9710611
## wgt 1.8398023 1.7737114      Inf 1.0753483 1.2056839 1.1312176
## bmi 0.7380297 0.6872310 1.075348       Inf 0.6778270 0.6868663
## hc  1.2683845 1.5185628 1.205684 0.6778270       Inf 0.7000833
## tv  1.1098175 0.9710611 1.131218 0.6868663 0.7000833       Inf
## 
## $`7`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1962046 1.835994 0.7315759 1.2772518 1.1987818
## hgt 2.1962046       Inf 1.772959 0.6825245 1.5256606 1.0441973
## wgt 1.8359941 1.7729588      Inf 1.0685706 1.2150211 1.2020363
## bmi 0.7315759 0.6825245 1.068571       Inf 0.6805006 0.7139035
## hc  1.2772518 1.5256606 1.215021 0.6805006       Inf 0.7789061
## tv  1.1987818 1.0441973 1.202036 0.7139035 0.7789061       Inf
## 
## $`8`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1929580 1.840578 0.7439899 1.2888151 1.2043697
## hgt 2.1929580       Inf 1.773407 0.6923113 1.5404146 1.0468795
## wgt 1.8405779 1.7734068      Inf 1.0811815 1.2216658 1.2005007
## bmi 0.7439899 0.6923113 1.081182       Inf 0.6844591 0.7123828
## hc  1.2888151 1.5404146 1.221666 0.6844591       Inf 0.7906234
## tv  1.2043697 1.0468795 1.200501 0.7123828 0.7906234       Inf
## 
## $`9`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1955320 1.838406 0.7384463 1.2842531 1.1620413
## hgt 2.1955320       Inf 1.773995 0.6878719 1.5270096 1.0147341
## wgt 1.8384062 1.7739947      Inf 1.0758542 1.2183190 1.1719916
## bmi 0.7384463 0.6878719 1.075854       Inf 0.6857832 0.7065790
## hc  1.2842531 1.5270096 1.218319 0.6857832       Inf 0.7578655
## tv  1.1620413 1.0147341 1.171992 0.7065790 0.7578655       Inf
## 
## $`10`
##           age      hgt      wgt       bmi        hc        tv
## age       Inf 2.194992 1.840380 0.7412752 1.2769159 1.1855404
## hgt 2.1949924      Inf 1.773964 0.6897810 1.5248482 1.0468461
## wgt 1.8403795 1.773964      Inf 1.0783437 1.2142874 1.2146691
## bmi 0.7412752 0.689781 1.078344       Inf 0.6849574 0.7124170
## hc  1.2769159 1.524848 1.214287 0.6849574       Inf 0.7733416
## tv  1.1855404 1.046846 1.214669 0.7124170 0.7733416       Inf

The object cor is a list over the \(m\) imputations, where each list element is a (Fisher-transformed) correlation matrix. To calculate the average over the correlation matrices, we can add the \(m\) elements and divide the sum by \(m\):

cor.rect <- Reduce("+", cor) / length(cor) # m is equal to the length of the list
cor.rect <- fisher.backtrans(cor.rect)
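Since purrr is loaded, an equivalent sketch for the same computation uses reduce() for the element-wise sum:

cor.rect2 <- cor %>%
  reduce(`+`) %>%                       # element-wise sum of the m matrices
  magrittr::divide_by(length(cor)) %>%  # divide by m
  fisher.backtrans()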

If we compare the wrong estimates in cor.wrong

cor.wrong
##      age  hgt  wgt  bmi   hc   tv
## age 1.00 0.98 0.95 0.63 0.86 0.86
## hgt 0.98 1.00 0.94 0.60 0.91 0.81
## wgt 0.95 0.94 1.00 0.79 0.84 0.87
## bmi 0.63 0.60 0.79 1.00 0.60 0.64
## hc  0.86 0.91 0.84 0.60 1.00 0.67
## tv  0.86 0.81 0.87 0.64 0.67 1.00

with the correct estimates in cor.rect

round(cor.rect, digits = 2)
##      age  hgt  wgt  bmi   hc   tv
## age  NaN 0.98 0.95 0.63 0.86 0.82
## hgt 0.98  NaN 0.94 0.60 0.91 0.77
## wgt 0.95 0.94  NaN 0.79 0.84 0.83
## bmi 0.63 0.60 0.79  NaN 0.59 0.61
## hc  0.86 0.91 0.84 0.59  NaN 0.64
## tv  0.82 0.77 0.83 0.61 0.64  NaN

We see that the wrong estimates in cor.wrong tend to overestimate the correlation coefficients relative to the estimates that are correctly combined following Rubin’s rules.

The correct estimates have a diagonal of NaNs, because the transformation of a correlation of 1 yields Inf and the backtransformation of Inf has no representation in real number space. We know the diagonal is supposed to be 1, so we can simply correct this:

diag(cor.rect) <- 1
cor.rect
##           age       hgt       wgt       bmi        hc        tv
## age 1.0000000 0.9755309 0.9506921 0.6278711 0.8565901 0.8249429
## hgt 0.9755309 1.0000000 0.9439641 0.5959615 0.9103713 0.7711664
## wgt 0.9506921 0.9439641 1.0000000 0.7914006 0.8383737 0.8287360
## bmi 0.6278711 0.5959615 0.7914006 1.0000000 0.5917157 0.6110790
## hc  0.8565901 0.9103713 0.8383737 0.5917157 1.0000000 0.6436419
## tv  0.8249429 0.7711664 0.8287360 0.6110790 0.6436419 1.0000000

Why does the average data set not serve as a good basis for analysis?

In FIMD v2, paragraph 5.1.2 Stef mentions the following:

The average workflow is faster and easier than the correct methods, since there is no need to replicate the analyses \(m\) times. In the words of Dempster and Rubin (1983), this workflow is

seductive because it can lull the user into the pleasurable state of believing that the data are complete after all.

The ensuing statistical analysis does not know which data are observed and which are missing, and treats all data values as real, which will underestimate the uncertainty of the parameters. The reported standard errors and p-values after data-averaging are generally too low. The correlations between the variables of the averaged data will be too high. For example, the correlation matrix in the average data are more extreme than the average of the \(m\) correlation matrices, which is an example of ecological fallacy. As researchers tend to like low p-values and high correlations, there is a cynical reward for the analysis of the average data. However, analysis of the average data cannot give a fair representation of the uncertainties associated with the underlying data, and hence is not recommended.


So, please stay away from averaging the imputed data sets. Instead, use the correct workflow of analyzing the imputed sets separately and combining the inference afterwards.


Linear models


4. Fit the following linear model on the imputed data:

  • lm(age ~ wgt + hgt)

fit1.lm <- imp %>%
  with(lm(age ~ wgt + hgt))

est1.lm <- pool(fit1.lm)
est1.lm
## Class: mipo    m = 10 
##          term  m    estimate         ubar            b            t dfcom
## 1 (Intercept) 10 -7.47606562 5.886021e-02 7.345696e-05 5.894102e-02   745
## 2         wgt 10  0.07250458 3.471186e-05 1.548528e-07 3.488220e-05   745
## 3         hgt 10  0.10631248 1.088103e-05 3.591635e-08 1.092054e-05   745
##         df         riv      lambda         fmi
## 1 741.8745 0.001372789 0.001370907 0.004052242
## 2 737.9341 0.004907203 0.004883240 0.007569354
## 3 739.5238 0.003630905 0.003617769 0.006301541
summary(est1.lm)
##          term    estimate   std.error statistic       df p.value
## 1 (Intercept) -7.47606562 0.242777710 -30.79387 741.8745       0
## 2         wgt  0.07250458 0.005906115  12.27619 737.9341       0
## 3         hgt  0.10631248 0.003304623  32.17083 739.5238       0
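To see what pool() does for the point estimates, we can verify (as an illustrative check, not a required step) that under Rubin’s rules the pooled estimate is simply the average of the \(m\) complete-data estimates:

rowMeans(sapply(fit1.lm$analyses, coef)) # should reproduce the 'estimate' column of est1.lm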

5. Now expand the linear model from (4) with a squared term for hgt:

  • lm(age ~ wgt + hgt + I(hgt^2))

fit2.lm <- imp %>%
  with(lm(age ~ wgt + hgt + I(hgt^2)))

est2.lm <- pool(fit2.lm)
est2.lm
## Class: mipo    m = 10 
##          term  m      estimate         ubar            b            t dfcom
## 1 (Intercept) 10 -4.0568412458 2.479221e-01 4.443015e-04 2.484108e-01   744
## 2         wgt 10  0.0317693024 5.959819e-05 1.056828e-07 5.971444e-05   744
## 3         hgt 10  0.0389756596 8.508940e-05 2.479060e-07 8.536210e-05   744
## 4    I(hgt^2) 10  0.0003577151 2.116907e-09 4.546843e-12 2.121909e-09   744
##         df         riv      lambda         fmi
## 1 740.3124 0.001971311 0.001967433 0.004652798
## 2 740.3326 0.001950580 0.001946783 0.004632131
## 3 739.0178 0.003204824 0.003194586 0.005881329
## 4 739.9209 0.002362658 0.002357089 0.005042820
summary(est2.lm)
##          term      estimate    std.error statistic       df      p.value
## 1 (Intercept) -4.0568412458 4.984083e-01 -8.139594 740.3124 1.776357e-15
## 2         wgt  0.0317693024 7.727512e-03  4.111194 740.3326 4.376700e-05
## 3         hgt  0.0389756596 9.239161e-03  4.218528 739.0178 2.764546e-05
## 4    I(hgt^2)  0.0003577151 4.606418e-05  7.765580 739.9209 2.708944e-14

Model comparisons


6. Compare the models from (4) and (5) to see which model would yield the best fit:

We have three choices for evaluation:

D1(fit2.lm, fit1.lm) # multivariate Wald test
##    test statistic df1      df2 dfcom     p.value         riv
##  1 ~~ 2  60.30423   1 740.9791   744 2.70652e-14 0.002362658

The \(D_1\) statistic requires a variance-covariance matrix for the estimates.

D2(fit2.lm, fit1.lm) # combining test statistics
##    test statistic df1      df2 dfcom      p.value         riv
##  1 ~~ 2  60.24409   1 800930.6    NA 8.389226e-15 0.003363428

The \(D_2\) statistic requires only the test statistics (e.g. p-values or \(\chi^2\) values) and hence is more flexible to apply than the \(D_1\) statistic. This flexibility comes at a cost, however: the resulting inference is less informed by the data than under the \(D_1\) statistic.

D3(fit2.lm, fit1.lm) # likelihood ratio test
##    test statistic df1      df2 dfcom      p.value         riv
##  1 ~~ 2   58.2858   1 494231.8   744 2.264855e-14 0.002481764

For large sample sizes, \(D_3\) is equivalent to \(D_1\); however, \(D_3\) does not require the covariances of the complete-data estimates. It is the preferred method for testing random effects, and it connects to global fit statistics in structural equation models. The likelihood ratio test does not require normality. For a large riv (i.e. values > 10), the \(D_3\) statistic must be taken with a grain of salt. In general, given what we know today, the \(D_1\) statistic may be slightly more efficient than \(D_3\) for small samples (i.e. \(< 200\) cases); for larger samples (i.e. \(\geq 200\) cases) \(D_1\) and \(D_3\) appear equally good and a choice between them is mostly a matter of convenience. See also paragraph 5.3.4 in FIMD v2 for a comparison of when to use \(D_1\), \(D_2\) and/or \(D_3\).


Stepwise modeling


7. Fit a stepwise linear model to predict hgt separately to each of the imputed data sets.

We can use the step() function in R to perform stepwise model selection. We start by specifying the scope of the evaluations:

scope <- list(upper = ~ age + wgt + hc + gen + phb + tv + reg,
              lower = ~ 1)

The scope specifies the upper bound of the model (all variables to run the selection on) and the lower bound of the model, where a 1 indicates an intercept-only model.

We can then specify the expressions to be evaluated:

expr <- expression(f1 <- lm(hgt ~ 1),
                   f2 <- step(f1, 
                              scope = scope, 
                              direction = "forward",
                              trace = 0
                              ))

where f1 is the intercept-only starting model and f2 holds the result of the step() function, which performs forward selection starting from f1. Finally, we apply the with() function to evaluate the expression expr on the mids object imp:

fit <- with(imp, expr)

The fit object contains the model evaluations. To count the number of times each variable was selected, we can run:

formulas <- lapply(fit$analyses, formula)
terms <- lapply(formulas, terms)
votes <- unlist(lapply(terms, labels))
table(votes)
## votes
## age gen  hc phb reg  tv wgt 
##  10   2  10  10   1   7  10

We see that reg is used in only 1 of the models based on the 10 imputed datasets, tv is used in 7 of the models and gen in 2 of the 10 completed-data models. wgt, hc, age and phb are used in all models.
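The same tally can be obtained with a purrr pipeline; this is merely an alternative sketch for the lapply() chain above:

fit$analyses %>%
  map(formula) %>%  # model formula per imputed data set
  map(terms) %>%    # terms object per formula
  map(labels) %>%   # predictor names per model
  unlist() %>%
  table()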

To determine if tv should be a part of the final model, we may run a multivariate Wald test:

fit.tv <- with(imp, lm(hgt ~ age + hc + phb + wgt + tv))
fit.notv <- with(imp, lm(hgt ~ age + hc + phb + wgt))
D1(fit.tv, fit.notv)
##    test statistic df1      df2 dfcom   p.value       riv
##  1 ~~ 2  1.894861   1 22.99498   738 0.1819114 0.7679935

With a p-value of .18 we could conclude that tv is not strictly needed in this model. We might also investigate the BIC on the separate imputed sets and compare those between the models (not between imputed data sets). We draw the same conclusion from this evaluation - the BIC is lower for the model without tv - although not by much. But then again, the p-value indicated the same trend.

BIC.tv <- fit.tv$analyses %>%
  sapply(BIC) 

BIC.notv <- fit.notv$analyses %>%
  sapply(BIC)

To count the model evaluations in favor of the model without tv (a better fit means a lower BIC):

BIC.tv
##  [1] 5067.673 5069.692 5055.736 5086.804 5078.182 5076.572 5066.499 5078.690
##  [9] 5117.463 5107.320
BIC.notv
##  [1] 5065.019 5063.648 5058.033 5082.689 5079.410 5070.772 5065.247 5080.096
##  [9] 5113.299 5101.303
sum(BIC.tv > BIC.notv)
## [1] 7

Please note that we can compare the BIC only between the models, not between the imputed data sets. The realized imputations differ for each set, but within each imputed set the model comparison is based on the same realization. The expression sum(BIC.tv > BIC.notv) therefore compares each BIC only against its counterpart for the same imputed data set. The resulting outcome can be considered a majority vote: in our case 7 out of 10 model comparisons favor the model without tv as a predictor.
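To make the pairing per imputed data set explicit, we could (as a small illustrative sketch) tabulate the comparison underlying the majority vote:

data.frame(imputation = seq_along(BIC.tv),
           BIC.tv = BIC.tv,
           BIC.notv = BIC.notv,
           notv.preferred = BIC.tv > BIC.notv) # TRUE where the model without tv fits better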


Conditional means


8. Calculate the average mean for bmi for every region reg over the imputed data.

We study the means for bmi conditional on reg to get a picture of the differences:

imp %>%
  mice::complete("long") %>%
  select(reg, bmi) %>%
  group_by(reg) %>%
  summarise_all(.funs = mean)
## # A tibble: 5 × 2
##   reg     bmi
##   <fct> <dbl>
## 1 north  19.3
## 2 east   17.8
## 3 west   17.9
## 4 south  17.7
## 5 city   18.2

To also obtain information about the standard error of the mean, we could extend the summarise_all() evaluation with a custom standard error function:

se <- function(x){
  sd(x) / sqrt(length(x))
}

and then calculate the summary again

imp %>%
  mice::complete("long") %>%
  select(reg, bmi) %>%
  group_by(reg) %>%
  summarise_all(list(~ mean(.), ~se(.)))
## # A tibble: 5 × 3
##   reg    mean     se
##   <fct> <dbl>  <dbl>
## 1 north  19.3 0.120 
## 2 east   17.8 0.0742
## 3 west   17.9 0.0601
## 4 south  17.7 0.0697
## 5 city   18.2 0.0965

Mean differences: ANOVA


9. Test whether the means differ.

This is best done with an ANOVA. To do this correctly, we can apply the following workflow. First, we fit an intercept-only model

fit.empty <- imp %>%
  mice::complete("all") %>%
  map(lm, formula = bmi ~ 1)

and then we fit the model with reg as a predictor

fit.reg <- imp %>%
  mice::complete("all") %>%
  map(lm, formula = bmi ~ 1 + reg)

We can calculate the separate ANOVAs from each fitted model:

aov.empty <- lapply(with(imp, lm(age ~ 1))$analyses, aov)
aov.reg <- lapply(with(imp, lm(age ~ 1 + reg))$analyses, aov)

And look at the summaries:

lapply(aov.empty, summary)
## [[1]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[2]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[3]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[4]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[5]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[6]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[7]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[8]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[9]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[10]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53

The summary for the aov.empty object has no p-values. This is expected, as we are fitting an empty model (without any predictors) and hence have no model mean squares (MS) with which to calculate and test the ratio

\[F = \frac{\text{Model MS}}{\text{Residual MS}}\]

We do have those components for the model with reg included as predictor:

lapply(aov.reg, summary)
## [[1]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    757  189.23   4.046 0.00298 **
## Residuals   743  34746   46.77                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[2]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    745  186.34   3.983 0.00332 **
## Residuals   743  34758   46.78                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[3]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    739  184.67   3.947 0.00354 **
## Residuals   743  34765   46.79                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[4]]
##              Df Sum Sq Mean Sq F value Pr(>F)   
## reg           4    756  188.97   4.041  0.003 **
## Residuals   743  34747   46.77                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[5]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    743  185.86   3.973 0.00338 **
## Residuals   743  34760   46.78                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[6]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    757  189.23   4.046 0.00298 **
## Residuals   743  34746   46.77                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[7]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    719  179.82   3.841 0.00425 **
## Residuals   743  34784   46.82                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[8]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    739  184.68   3.947 0.00354 **
## Residuals   743  34765   46.79                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[9]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    742  185.61   3.967 0.00341 **
## Residuals   743  34761   46.78                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[10]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    755  188.86   4.038 0.00302 **
## Residuals   743  34748   46.77                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

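As a quick sanity check of the F ratio above, we can recompute it by hand from the first table (Model MS divided by Residual MS):

189.23 / 46.77 # = 4.046 (rounded), matching the F value reported for the first imputed data set
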
We find that each of the separate ANOVAs indicates significance, meaning that there is an overall effect for reg on bmi in each imputed data set.

To obtain an overall estimate for the ANOVA we can simply compare the empty model to the model with reg included:

D1(fit.reg, fit.empty)
##    test statistic df1      df2 dfcom    p.value         riv
##  1 ~~ 2  4.604686   4 739.1867   743 0.00111894 0.009690917

We find that indeed the overall (i.e. pooled) effect over the imputations is also significant, which is not surprising as the F-values for the separate tests show little variation.


- End of practical


sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] purrr_0.3.4    magrittr_2.0.3 dplyr_1.0.9    mice_3.14.0   
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.2 xfun_0.31        bslib_0.3.1      splines_4.2.0   
##  [5] lattice_0.20-45  vctrs_0.4.1      generics_0.1.3   htmltools_0.5.2 
##  [9] yaml_2.3.5       pan_1.6          utf8_1.2.2       survival_3.3-1  
## [13] rlang_1.0.3      jomo_2.7-3       jquerylib_0.1.4  pillar_1.7.0    
## [17] nloptr_2.0.3     glue_1.6.2       withr_2.5.0      DBI_1.1.2       
## [21] lifecycle_1.0.1  stringr_1.4.0    evaluate_0.15    knitr_1.39      
## [25] fastmap_1.1.0    fansi_1.0.3      broom_1.0.0      Rcpp_1.0.9      
## [29] backports_1.4.1  jsonlite_1.8.0   lme4_1.1-29      digest_0.6.29   
## [33] stringi_1.7.6    grid_4.2.0       cli_3.3.0        tools_4.2.0     
## [37] sass_0.4.1       tibble_3.1.7     crayon_1.5.1     tidyr_1.2.0     
## [41] pkgconfig_2.0.3  ellipsis_0.3.2   MASS_7.3-56      Matrix_1.4-1    
## [45] assertthat_0.2.1 minqa_1.2.4      rmarkdown_2.14   rstudioapi_0.13 
## [49] mitml_0.4-3      R6_2.5.1         boot_1.3-28      nnet_7.3-17     
## [53] nlme_3.1-157     compiler_4.2.0