This is the fifth practical.

In this practical we will walk you through different ways of combining inferences based on multiply imputed data sets.

All the best,

Gerko and Stef


1. Open R and load the following packages and fix the random seed.

library(mice) # Data imputation
library(dplyr) # Data manipulation
library(magrittr) # Flexible piping in R
library(purrr) # Flexible functional programming
set.seed(123)

We choose seed value 123. This is an arbitrary value; any value would be an equally good seed value. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R instance to 123, you will get the exact same results and plots as we present in this document if you follow the order and code of the exercises precisely.
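As a quick aside (not part of the exercises), the following sketch shows what fixing the seed buys us: re-setting the seed reproduces exactly the same random draws. Note that any extra draws advance the random number generator, so we re-set the seed afterwards to keep the remainder of this practical reproducible.

set.seed(123)
draws1 <- rnorm(3)            # three random draws
set.seed(123)
draws2 <- rnorm(3)            # the same three draws again
identical(draws1, draws2)     # TRUE
set.seed(123)                 # re-set the seed before continuing with the exercises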


2. Impute the boys data properly with passive imputation for bmi with the following parameters:

  • m = 10 for 10 imputed datasets.
  • maxit = 6 to give the algorithm 6 iterations to obtain a stable solution.
  • print = FALSE to omit printing of the iteration and imputation history.

We will use this data to go through the workflow of data analysis with mids (multiply imputed data set) objects.

We start by creating the method vector and specifying the passive imputation of bmi:

meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

Then we remove bmi as a predictor for hgt and wgt to avoid circularity (bmi feeding back into hgt and wgt).

pred <- make.predictorMatrix(boys)
pred[c("hgt", "wgt"), "bmi"] <- 0
pred
##     age hgt wgt bmi hc gen phb tv reg
## age   0   1   1   1  1   1   1  1   1
## hgt   1   0   1   0  1   1   1  1   1
## wgt   1   1   0   0  1   1   1  1   1
## bmi   1   1   1   0  1   1   1  1   1
## hc    1   1   1   1  0   1   1  1   1
## gen   1   1   1   1  1   0   1  1   1
## phb   1   1   1   1  1   1   0  1   1
## tv    1   1   1   1  1   1   1  0   1
## reg   1   1   1   1  1   1   1  1   0

and we run the mice algorithm with the new predictor matrix (we still ‘borrow’ the method vector meth specified above):

imp <- mice(boys,
            meth = meth,
            pred = pred,
            print = FALSE,
            m = 10,
            maxit = 6)

We use the multiply imputed data set imp from now on.


Correlations


3. Calculate a correlation between all continuous variables for the imputed boys data

There are two ways in which we can calculate the correlation on the imputed data:

  • The wrong way: calculate an estimate over the average imputed dataset.

Quite often people suggest that using the average imputed dataset (that is, taking the average over the imputed data sets so that each cell holds the mean of the corresponding imputed values) would be efficient and conform to Rubin’s rules. This is not true: doing so yields false inference.

To demonstrate this, let’s create the averaged data set and exclude the non-numerical columns:

ave <- imp %>%
  mice::complete("long") %>%
  group_by(.id) %>%
  summarise_all(.funs = mean) %>%
  select(-.id, -.imp, -phb, -gen, -reg)

head(ave)
## # A tibble: 6 x 6
##     age   hgt   wgt   bmi    hc    tv
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.035  50.1  3.65  14.5  33.7   1.7
## 2 0.038  53.5  3.37  11.8  35     2.1
## 3 0.057  50    3.14  12.6  35.2   2.2
## 4 0.06   54.5  4.27  14.4  36.7   2.4
## 5 0.062  57.5  5.03  15.2  37.3   1.8
## 6 0.068  55.5  4.66  15.1  37     1.9

If we now calculate Pearson’s correlation, rounded to two digits:

cor.wrong <- ave %>%
  cor() %>%
  round(digits = 2)

we obtain:

cor.wrong
##      age  hgt  wgt  bmi   hc   tv
## age 1.00 0.98 0.95 0.63 0.86 0.86
## hgt 0.98 1.00 0.94 0.60 0.91 0.81
## wgt 0.95 0.94 1.00 0.79 0.84 0.87
## bmi 0.63 0.60 0.79 1.00 0.59 0.64
## hc  0.86 0.91 0.84 0.59 1.00 0.68
## tv  0.86 0.81 0.87 0.64 0.68 1.00

  • The correct way: calculate an estimate for each imputed dataset and average over the estimates.

It is best to do a Fisher transformation before pooling the correlation estimates - and a backtransformation afterwards. Therefore we define the following two functions that allow us to transform and backtransform any value:

fisher.trans <- function(x) 1/2 * log((1 + x) / (1 - x))
fisher.backtrans <- function(x) (exp(2 * x) - 1) / (exp(2 * x) + 1)
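As an aside, these helper functions are equivalent to base R’s atanh() and tanh(), so we can quickly check that they behave as intended (a small sanity check, not required for the exercises):

r.values <- c(-0.5, 0, 0.3, 0.9)                               # some example correlations
all.equal(fisher.trans(r.values), atanh(r.values))             # TRUE
all.equal(fisher.backtrans(fisher.trans(r.values)), r.values)  # TRUE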

Now, to calculate the correlation on the imputed data

cor <- imp %>%
  mice::complete("all") %>%
  map(select, -phb, -gen, -reg) %>%  
  map(stats::cor) %>%
  map(fisher.trans)
cor
## $`1`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1878757 1.835096 0.7359229 1.2791152 1.1722685
## hgt 2.1878757       Inf 1.771183 0.6834421 1.5272005 1.0382911
## wgt 1.8350965 1.7711835      Inf 1.0722950 1.2159534 1.2030936
## bmi 0.7359229 0.6834421 1.072295       Inf 0.6799468 0.7208965
## hc  1.2791152 1.5272005 1.215953 0.6799468       Inf 0.7778364
## tv  1.1722685 1.0382911 1.203094 0.7208965 0.7778364       Inf
## 
## $`2`
##           age      hgt      wgt       bmi        hc        tv
## age       Inf 2.197663 1.838480 0.7403992 1.2775789 1.2064233
## hgt 2.1976634      Inf 1.773160 0.6890960 1.5288591 1.0549998
## wgt 1.8384798 1.773160      Inf 1.0786107 1.2120965 1.2161039
## bmi 0.7403992 0.689096 1.078611       Inf 0.6778265 0.7144318
## hc  1.2775789 1.528859 1.212096 0.6778265       Inf 0.7771159
## tv  1.2064233 1.055000 1.216104 0.7144318 0.7771159       Inf
## 
## $`3`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1939477 1.838495 0.7376752 1.2744827 1.2016488
## hgt 2.1939477       Inf 1.771914 0.6855465 1.5250910 1.0458457
## wgt 1.8384945 1.7719145      Inf 1.0741434 1.2130511 1.2328913
## bmi 0.7376752 0.6855465 1.074143       Inf 0.6787182 0.7394494
## hc  1.2744827 1.5250910 1.213051 0.6787182       Inf 0.7747387
## tv  1.2016488 1.0458457 1.232891 0.7394494 0.7747387       Inf
## 
## $`4`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1927616 1.837084 0.7371136 1.2823956 1.1776434
## hgt 2.1927616       Inf 1.772141 0.6851893 1.5325135 1.0344195
## wgt 1.8370835 1.7721405      Inf 1.0744032 1.2141159 1.1880482
## bmi 0.7371136 0.6851893 1.074403       Inf 0.6740508 0.7061636
## hc  1.2823956 1.5325135 1.214116 0.6740508       Inf 0.7740895
## tv  1.1776434 1.0344195 1.188048 0.7061636 0.7740895       Inf
## 
## $`5`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1941050 1.838961 0.7279296 1.2886884 1.2095691
## hgt 2.1941050       Inf 1.772902 0.6761208 1.5382402 1.0592332
## wgt 1.8389612 1.7729018      Inf 1.0599459 1.2234810 1.2017023
## bmi 0.7279296 0.6761208 1.059946       Inf 0.6759023 0.6937526
## hc  1.2886884 1.5382402 1.223481 0.6759023       Inf 0.7968961
## tv  1.2095691 1.0592332 1.201702 0.6937526 0.7968961       Inf
## 
## $`6`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1996604 1.839587 0.7330387 1.2809123 1.1885674
## hgt 2.1996604       Inf 1.774029 0.6830806 1.5276535 1.0463623
## wgt 1.8395871 1.7740286      Inf 1.0700370 1.2154048 1.2307539
## bmi 0.7330387 0.6830806 1.070037       Inf 0.6814476 0.7383829
## hc  1.2809123 1.5276535 1.215405 0.6814476       Inf 0.7822894
## tv  1.1885674 1.0463623 1.230754 0.7383829 0.7822894       Inf
## 
## $`7`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1931581 1.839744 0.7393494 1.2881746 1.1955025
## hgt 2.1931581       Inf 1.773270 0.6877363 1.5410324 1.0565378
## wgt 1.8397444 1.7732701      Inf 1.0746518 1.2269337 1.2055658
## bmi 0.7393494 0.6877363 1.074652       Inf 0.6856301 0.7009441
## hc  1.2881746 1.5410324 1.226934 0.6856301       Inf 0.7921718
## tv  1.1955025 1.0565378 1.205566 0.7009441 0.7921718       Inf
## 
## $`8`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1877780 1.832967 0.7419922 1.2763555 1.1932445
## hgt 2.1877780       Inf 1.773265 0.6914963 1.5269345 1.0392917
## wgt 1.8329670 1.7732654      Inf 1.0820834 1.2164388 1.2163345
## bmi 0.7419922 0.6914963 1.082083       Inf 0.6847693 0.7313423
## hc  1.2763555 1.5269345 1.216439 0.6847693       Inf 0.7824216
## tv  1.1932445 1.0392917 1.216334 0.7313423 0.7824216       Inf
## 
## $`9`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1941850 1.838786 0.7439296 1.2744372 1.1636449
## hgt 2.1941850       Inf 1.772736 0.6913553 1.5223461 1.0378054
## wgt 1.8387856 1.7727355      Inf 1.0829731 1.2030606 1.2029579
## bmi 0.7439296 0.6913553 1.082973       Inf 0.6728846 0.7205754
## hc  1.2744372 1.5223461 1.203061 0.6728846       Inf 0.7750864
## tv  1.1636449 1.0378054 1.202958 0.7205754 0.7750864       Inf
## 
## $`10`
##           age       hgt      wgt       bmi        hc        tv
## age       Inf 2.1935138 1.834753 0.7428199 1.2827134 1.1648259
## hgt 2.1935138       Inf 1.774427 0.6930298 1.5376419 1.0273442
## wgt 1.8347533 1.7744269      Inf 1.0836424 1.2169079 1.1884189
## bmi 0.7428199 0.6930298 1.083642       Inf 0.6812396 0.7338424
## hc  1.2827134 1.5376419 1.216908 0.6812396       Inf 0.7823209
## tv  1.1648259 1.0273442 1.188419 0.7338424 0.7823209       Inf

The object cor is a list over the \(m\) imputations in which each list element is a correlation matrix. To calculate the average over the correlation matrices, we can add the \(m\) list elements and divide the sum by \(m\):

cor.rect <- Reduce("+", cor) / length(cor) # m is equal to the length of the list
cor.rect <- fisher.backtrans(cor.rect)
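As an alternative to Reduce(), the same element-wise sum can be written with purrr (which we already loaded); this is merely a stylistic choice:

cor.rect2 <- purrr::reduce(cor, `+`) / length(cor) # identical to cor.rect before backtransformation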

If we compare the wrong estimates in cor.wrong

cor.wrong
##      age  hgt  wgt  bmi   hc   tv
## age 1.00 0.98 0.95 0.63 0.86 0.86
## hgt 0.98 1.00 0.94 0.60 0.91 0.81
## wgt 0.95 0.94 1.00 0.79 0.84 0.87
## bmi 0.63 0.60 0.79 1.00 0.59 0.64
## hc  0.86 0.91 0.84 0.59 1.00 0.68
## tv  0.86 0.81 0.87 0.64 0.68 1.00

with the correct estimates in cor.rect

round(cor.rect, digits = 2)
##      age  hgt  wgt  bmi   hc   tv
## age  NaN 0.98 0.95 0.63 0.86 0.83
## hgt 0.98  NaN 0.94 0.60 0.91 0.78
## wgt 0.95 0.94  NaN 0.79 0.84 0.84
## bmi 0.63 0.60 0.79  NaN 0.59 0.62
## hc  0.86 0.91 0.84 0.59  NaN 0.65
## tv  0.83 0.78 0.84 0.62 0.65  NaN

We see that the wrong estimates in cor.wrong tend to overestimate the correlation coefficients relative to the estimates that are correctly combined following Rubin’s rules.

The correct estimates have a diagonal of NaN’s, because the transformation of a correlation of 1 yields Inf and the backtransformation of Inf has no representation in real number space. We know the diagonal is supposed to be 1, so we can simply correct this:

diag(cor.rect) <- 1
cor.rect
##           age       hgt       wgt       bmi        hc        tv
## age 1.0000000 0.9754279 0.9505445 0.6279456 0.8566142 0.8297502
## hgt 0.9754279 1.0000000 0.9439267 0.5957993 0.9105531 0.7794678
## wgt 0.9505445 0.9439267 1.0000000 0.7914417 0.8383944 0.8362551
## bmi 0.6279456 0.5957993 0.7914417 1.0000000 0.5910261 0.6168957
## hc  0.8566142 0.9105531 0.8383944 0.5910261 1.0000000 0.6535649
## tv  0.8297502 0.7794678 0.8362551 0.6168957 0.6535649 1.0000000

Why does the average data set not serve as a good basis for analysis?

In FIMD v2, paragraph 5.1.2 Stef mentions the following:

The average workflow is faster and easier than the correct methods, since there is no need to replicate the analyses \(m\) times. In the words of Dempster and Rubin (1983), this workflow is

seductive because it can lull the user into the pleasurable state of believing that the data are complete after all.

The ensuing statistical analysis does not know which data are observed and which are missing, and treats all data values as real, which will underestimate the uncertainty of the parameters. The reported standard errors and p-values after data-averaging are generally too low. The correlations between the variables of the averaged data will be too high. For example, the correlation matrix in the average data is more extreme than the average of the \(m\) correlation matrices, which is an example of the ecological fallacy. As researchers tend to like low p-values and high correlations, there is a cynical reward for the analysis of the average data. However, analysis of the average data cannot give a fair representation of the uncertainties associated with the underlying data, and hence is not recommended.


So, please stay away from averaging the imputed data sets. Instead, use the correct workflow of analyzing the imputed sets separately and combining the inference afterwards.
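As a preview of the sections that follow, the correct workflow with a mids object boils down to the following sketch (the same steps we will take in the next exercise):

imp %>%
  with(lm(age ~ wgt + hgt)) %>% # fit the analysis model in every imputed data set
  pool() %>%                    # combine the m sets of estimates with Rubin's rules
  summary()                     # inspect the pooled results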


Linear models


4. Fit the following linear model on the imputed data:

  • lm(age ~ wgt + hgt)

fit1.lm <- imp %>%
  with(lm(age ~ wgt + hgt))

est1.lm <- pool(fit1.lm)
est1.lm
## Class: mipo    m = 10 
##               estimate         ubar            b            t dfcom
## (Intercept) -7.4794621 5.918025e-02 2.313912e-04 5.943478e-02   745
## wgt          0.0724463 3.489648e-05 1.840774e-07 3.509897e-05   745
## hgt          0.1063647 1.093685e-05 5.570838e-08 1.099813e-05   745
##                   df         riv      lambda         fmi
## (Intercept) 738.7124 0.004300933 0.004282514 0.006967430
## wgt         736.7091 0.005802451 0.005768977 0.008457145
## hgt         736.9898 0.005603002 0.005571784 0.008259464
summary(est1.lm)
##               estimate   std.error statistic       df p.value
## (Intercept) -7.4794621 0.243792487 -30.67962 738.7124       0
## wgt          0.0724463 0.005924438  12.22838 736.7091       0
## hgt          0.1063647 0.003316343  32.07288 736.9898       0
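If you would also like confidence intervals for the pooled estimates, the summary() method for pooled (mipo) objects accepts a conf.int argument in recent versions of mice:

summary(est1.lm, conf.int = TRUE) # adds the 2.5 % and 97.5 % bounds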

5. Now expand the linear model from (4) with a squared term for hgt:

  • lm(age ~ wgt + hgt + I(hgt^2))

fit2.lm <- imp %>%
  with(lm(age ~ wgt + hgt + I(hgt^2)))

est2.lm <- pool(fit2.lm)
est2.lm
## Class: mipo    m = 10 
##                  estimate         ubar            b            t dfcom
## (Intercept) -4.0716870896 2.492788e-01 2.287614e-03 2.517952e-01   744
## wgt          0.0317800312 6.003202e-05 5.758868e-07 6.066550e-05   744
## hgt          0.0392466799 8.555732e-05 9.736979e-07 8.662838e-05   744
## I(hgt^2)     0.0003566917 2.130051e-09 2.666988e-11 2.159388e-09   744
##                   df        riv      lambda        fmi
## (Intercept) 728.6527 0.01009462 0.009993736 0.01269996
## wgt         727.7857 0.01055229 0.010442103 0.01315031
## hgt         723.8242 0.01251871 0.012363934 0.01508161
## I(hgt^2)    721.1033 0.01377285 0.013585731 0.01631024
summary(est2.lm)
##                  estimate    std.error statistic       df      p.value
## (Intercept) -4.0716870896 5.017920e-01 -8.114293 728.6527 2.220446e-15
## wgt          0.0317800312 7.788806e-03  4.080219 727.7857 4.996757e-05
## hgt          0.0392466799 9.307437e-03  4.216701 723.8242 2.793234e-05
## I(hgt^2)     0.0003566917 4.646922e-05  7.675871 721.1033 5.351275e-14

Model comparisons


6. Compare the models from (4) and (5) to see which model would yield the best fit:

We have three choices for evaluation:

D1(fit2.lm, fit1.lm) # multivariate Wald test
##    test statistic df1      df2 df.com      p.value        riv
##  1 ~~ 2  58.91899   1 710.2912    744 5.440093e-14 0.01377285

The \(D_1\) statistic requires the variance-covariance matrix of the estimates.

D2(fit2.lm, fit1.lm) # combining test statistics
##    test statistic df1      df2 df.com      p.value        riv
##  1 ~~ 2   58.6928   1 29888.44     NA 1.898481e-14 0.01765924

The \(D_2\) statistic requires only the test statistics (e.g. p-values or \(\chi^2\) values) and hence is more flexible to apply than the \(D_1\) statistic. But this comes at a cost: the resulting inference is less informed by the data than under the \(D_1\) statistic.

D3(fit2.lm, fit1.lm) # likelihood ratio test
##    test statistic df1      df2 df.com p.value        riv
##  1 ~~ 2  103.8451   1 4740.462    Inf       0 0.02611911

For large sample sizes, \(D_3\) is equivalent to \(D_1\); however, \(D_3\) does not require the covariances of the complete-data estimates. It is the preferred method for testing random effects, and connects to global fit statistics in structural equation models. The likelihood ratio test does not require normality. For large riv (i.e. values > 10), the \(D_3\) statistic must be taken with a grain of salt. In general, given what we know today, the \(D_1\) statistic may be slightly more efficient than \(D_3\) for small samples (i.e. \(< 200\) cases); for larger samples (i.e. \(\geq 200\) cases) \(D_1\) and \(D_3\) appear equally good and a choice between them is mostly a matter of convenience; see also paragraph 5.3.4 in FIMD v2 for a comparison of when to use \(D_1\), \(D_2\) and/or \(D_3\).


Stepwise modeling


7. Fit a stepwise linear model to predict hgt separately on each of the imputed data sets.

We can use the step() function in R to select formula-based models. We start by specifying the scope of the evaluations:

scope <- list(upper = ~ age + wgt + hc + gen + phb + tv + reg,
              lower = ~ 1)

The scope specifies the upper bound of the model (all variables to run the selection on) and the lower bound of the model, where a 1 indicates an intercept-only model.

We can then specify the expressions to be evaluated:

expr <- expression(f1 <- lm(hgt ~ 1),
                   f2 <- step(f1, 
                              scope = scope, 
                              direction = "forward",
                              trace = 0
                              ))

where f1 is the linear model to be evaluated and f2 applies the step() function to f1. Finally, we use the with() function to evaluate the expression expr on the object imp:

fit <- with(imp, expr)

The fit object contains the model evaluations. To count how many times each variable was selected, we can run:

formulas <- lapply(fit$analyses, formula)
terms <- lapply(formulas, terms)
votes <- unlist(lapply(terms, labels))
table(votes)
## votes
## age gen  hc phb reg  tv wgt 
##  10   8  10  10   1   7  10

We see that reg is used in only 1 of the models based on the 10 imputed datasets. tv is used in 7 of the models and gen is used in 8 of the 10 completed-data models. wgt, hc, age and phb are used in all models.

To determine if gen should be a part of the final model, we may run a multivariate Wald test:

fit.gen <- with(imp, lm(hgt ~ age + hc + phb + wgt + gen))
fit.nogen <- with(imp, lm(hgt ~ age + hc + phb + wgt))
D1(fit.gen, fit.nogen)
##    test statistic df1      df2 df.com   p.value       riv
##  1 ~~ 2  2.320624   4 163.3654    735 0.0590531 0.6166477

With a p-value of .059 we could conclude that gen is not strictly needed in this model. We might also investigate the BIC on the separate imputed sets and compare those across (not within) models. We draw the same conclusion from this evaluation - the BIC is lower for the model without gen - although not by much. But then again, the p-value indicated the same trend.

BIC.gen <- fit.gen$analyses %>%
  sapply(BIC) 

BIC.nogen <- fit.nogen$analyses %>%
  sapply(BIC)

To count the model evaluations in favor of BIC.nogen (a better fit means a lower BIC):

BIC.gen
##  [1] 5101.833 5082.743 5090.647 5072.039 5125.748 5094.248 5067.708
##  [8] 5079.809 5089.375 5077.005
BIC.nogen
##  [1] 5091.554 5064.707 5081.270 5070.294 5105.094 5083.308 5059.632
##  [8] 5085.753 5075.260 5075.283
sum(BIC.gen > BIC.nogen)
## [1] 9

Please note that we can compare the BIC only across the models, not across the imputed data sets. The realized imputations differ for each set, but for each imputed set the model comparison is based on the same realization. The expression sum(BIC.gen > BIC.nogen) compares each BIC against its counterpart for the same imputed data set. The resulting outcome can be considered a majority vote: in our case 9 out of 10 model comparisons are in favor of the model without gen as a predictor.
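To see how this majority vote comes about, we could also inspect the per-data-set differences directly (positive values favor the model without gen):

round(BIC.gen - BIC.nogen, 1) # one difference per imputed data set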


Conditional means


8. Calculate the average mean for bmi for every region reg over the imputed data.

To study the means for bmi conditional on reg and get a picture of the differences:

imp %>%
  mice::complete("long") %>%
  select(reg, bmi) %>%
  group_by(reg) %>%
  summarise_all(.funs = mean)
## # A tibble: 5 x 2
##   reg     bmi
##   <fct> <dbl>
## 1 north  19.3
## 2 east   17.9
## 3 west   17.9
## 4 south  17.7
## 5 city   18.2

To also obtain information about the standard error of the mean, we could extend the summarise_all() evaluation with a custom standard error function:

se <- function(x){
  sd(x) / sqrt(length(x))
}

and then calculate the summary again

imp %>%
  mice::complete("long") %>%
  select(reg, bmi) %>%
  group_by(reg) %>%
  summarise_all(list(~ mean(.), ~se(.)))
## # A tibble: 5 x 3
##   reg    mean     se
##   <fct> <dbl>  <dbl>
## 1 north  19.3 0.120 
## 2 east   17.9 0.0742
## 3 west   17.9 0.0599
## 4 south  17.7 0.0699
## 5 city   18.2 0.0962

Mean differences: ANOVA


9. Test whether the means differ.

This is best done with an ANOVA. To do this correctly, we can apply the following workflow. First, we fit an intercept-only model

fit.empty <- imp %>%
  mice::complete("all") %>%
  map(lm, formula = bmi ~ 1)

and then we fit the model with reg as a predictor

fit.reg <- imp %>%
  mice::complete("all") %>%
  map(lm, formula = bmi ~ 1 + reg)

We can calculate the separate ANOVAs for each fitted model:

aov.empty <- lapply(with(imp, lm(age ~ 1))$analyses, aov)
aov.reg <- lapply(with(imp, lm(age ~ 1 + reg))$analyses, aov)

And look at the summaries:

lapply(aov.empty, summary)
## [[1]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[2]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[3]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[4]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[5]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[6]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[7]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[8]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[9]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53               
## 
## [[10]]
##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   747  35503   47.53

The summary for the aov.empty object has no p-values. This is expected as we are fitting an empty model (without any predictors) and hence have no model Mean Squares (MS) to calculate and test the ratio

\[F = \frac{\text{Model MS}}{\text{Residual MS}}\]

We do have those components for the model with reg included as predictor:

lapply(aov.reg, summary)
## [[1]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    757  189.36   4.049 0.00296 **
## Residuals   743  34746   46.76                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[2]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    757  189.36   4.049 0.00296 **
## Residuals   743  34746   46.76                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[3]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    757  189.21   4.046 0.00298 **
## Residuals   743  34747   46.77                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[4]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    750  187.61   4.011 0.00316 **
## Residuals   743  34753   46.77                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[5]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    756  188.94    4.04 0.00301 **
## Residuals   743  34748   46.77                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[6]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    739  184.68   3.947 0.00354 **
## Residuals   743  34765   46.79                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[7]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    762  190.60   4.076 0.00282 **
## Residuals   743  34741   46.76                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[8]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    695  173.84   3.711 0.00533 **
## Residuals   743  34808   46.85                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[9]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    757  189.36   4.049 0.00296 **
## Residuals   743  34746   46.76                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[10]]
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## reg           4    744  185.99   3.976 0.00337 **
## Residuals   743  34759   46.78                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We find that each of the separate ANOVAs indicates significance, meaning that there is an overall effect for reg on bmi in each imputed data set.
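To connect this to the F ratio defined above, we can reproduce the printed F value by hand, for example from the mean squares of the first completed-data ANOVA (values as rounded in the output):

MS.model    <- 189.36       # Mean Sq for reg in the first imputed data set
MS.residual <- 46.76        # Mean Sq for Residuals in the first imputed data set
MS.model / MS.residual      # approximately 4.05, the printed F value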

To obtain an overall estimate for the ANOVA we can simply compare the empty model to the model with reg included:

D1(fit.reg, fit.empty)
##    test statistic df1      df2 df.com     p.value        riv
##  1 ~~ 2  4.604099   4 738.4286    743 0.001120199 0.01167765

And we find that indeed the overall (i.e. pooled) effect over the imputations is also significant, which is not surprising as the F-values for the separate tests show little variation.


- End of practical