`futuremice`

For big datasets or a high number of imputations, performing multiple imputation with function `mice` from package `mice` (Van Buuren & Groothuis-Oudshoorn, 2011) might take a long time. As a solution, the wrapper function `futuremice` was created to enable the imputation procedure to run in parallel. This is done by dividing the imputations over multiple cores (or CPUs), thus potentially speeding up the process. The function `futuremice` is a sequel to `parlMICE` (Schouten & Vink, 2017), developed to improve user-friendliness.

This vignette demonstrates two applications of the `futuremice` function. The first application shows the tradeoff between computing time and an increasing number of imputations (\(m\)) for a small dataset; the second application does the same, but for a relatively large dataset. We also discuss `futuremice`'s arguments.

The function `futuremice` depends on packages `future`, `furrr` and `mice`. For more information about running functions in futures, see e.g. the `future` manual or the `furrr` manual. Function `futuremice` found its inspiration in Max's useful suggestions on the parallelization of `mice`'s chains on `stackoverflow`.
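As a minimal sketch of what a parallel call looks like, the snippet below imputes the built-in `nhanes` data over two cores. The arguments `n.core` and `parallelseed` are taken from the `futuremice` documentation; consult `?futuremice` to confirm the signature in your installed version of `mice`.

```
library(mice)

# Run m = 5 imputations, divided over 2 cores; parallelseed makes the
# parallel random-number streams reproducible.
imp <- futuremice(nhanes, m = 5, n.core = 2, parallelseed = 123)

# The result is a regular mids object, so the usual pooling workflow applies
fit <- with(imp, lm(bmi ~ age + chl))
summary(pool(fit))
```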

We demonstrate the potential gain in computing efficiency on simulated data. To this end we sample 1,000 cases from a multivariate normal distribution with mean vector

\[ \mu = \left[\begin{array} {r} 0 \\ 0 \\ 0 \\ 0 \end{array}\right] \]

and covariance matrix

\[ \Sigma = \left[\begin{array} {rrrr} 1&0.5&0.5&0.5 \\ 0.5&1&0.5&0.5 \\ 0.5&0.5&1&0.5 \\ 0.5&0.5&0.5&1 \end{array}\right]. \]

An MCAR missingness mechanism is imposed on the data, where 80 percent of the cases (i.e. rows) have missingness on one variable. All variables have missing values. The missingness is randomly generated with the following arguments of function `mice::ampute`:

```
library(mice)

set.seed(123)

# Covariance matrix with unit variances and correlations of 0.5
small_covmat <- diag(4)
small_covmat[small_covmat == 0] <- 0.5

# Sample 1,000 cases from the multivariate normal distribution
small_data <- MASS::mvrnorm(1000,
                            mu = c(0, 0, 0, 0),
                            Sigma = small_covmat)

# Impose MCAR missingness: 80 percent of the cases get a missing value
small_data_with_missings <- ampute(small_data, prop = 0.8, mech = "MCAR")$amp
head(small_data_with_missings)
```

V1 | V2 | V3 | V4 |
---|---|---|---|
-0.1667048 | 0.9165856 | 0.6389869 | NA |
-0.4548685 | 0.4313280 | NA | 0.5753627 |
-1.2432777 | -0.4162831 | -1.9552769 | NA |
-0.1366822 | NA | -0.5998099 | 0.7553689 |
-1.6633582 | -0.7137484 | 1.8412701 | 0.1269927 |
NA | -1.3018272 | -1.4972105 | -1.9058145 |

We compare the default 'sequential' function `mice` with function `futuremice`. In both functions we use the default arguments for the `mice` algorithm, although these could easily be changed if desired by the user. To demonstrate the increased efficiency when putting more than one computing core to work, we repeat the procedure with `futuremice` for 1, 2, 3 and 4 cores. Figure 1 shows a graphical representation of the results.
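A minimal sketch of such a timing comparison is given below, assuming the `small_data_with_missings` object from the code above; the value of \(m\) and the core count are illustrative, not the exact grid behind Figure 1.

```
# Hedged sketch of the timing comparison; m and n.core are illustrative
m <- 50

# Conventional, sequential mice
t_seq <- system.time(
  mice(small_data_with_missings, m = m, print = FALSE)
)

# Parallel wrapper, here with 4 cores
t_par <- system.time(
  futuremice(small_data_with_missings, m = m, n.core = 4)
)

# Elapsed wall-clock time for both runs
rbind(sequential = t_seq, parallel = t_par)[, "elapsed", drop = FALSE]
```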

*Figure 1. Processing time for a small dataset. Multiple imputations are performed with mice (conventional) and wrapper function futuremice (1, 2, 3 and 4 cores, respectively). The dataset has 1,000 cases and 4 variables with correlations of 0.5. 80 percent of the cases have one missing value based on MCAR missingness.*

It becomes apparent that for a small to moderate number of imputations, the conventional `mice` function is faster than the wrapper function `futuremice`. This holds until roughly \(m = 120\) imputations; for higher \(m\), wrapper function `futuremice` returns the imputations somewhat faster.

We replicated the simulation setup detailed above with a larger dataset of 10,000 cases and 8 variables. The mean and covariance structure follow the sampling scheme of the smaller dataset. We show the results of this simulation in Figure 2.
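The code for generating this larger dataset is not shown here; a sketch under the same scheme could look as follows. The object names are ours, and the values in the table below stem from the original simulation, so this sketch will not reproduce them verbatim.

```
# Sketch: same sampling scheme, scaled up to 10,000 cases and 8 variables.
# Object names are illustrative.
large_covmat <- diag(8)
large_covmat[large_covmat == 0] <- 0.5
large_data <- MASS::mvrnorm(10000,
                            mu = rep(0, 8),
                            Sigma = large_covmat)
large_data_with_missings <- ampute(large_data, prop = 0.8, mech = "MCAR")$amp
head(large_data_with_missings)
```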

V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 |
---|---|---|---|---|---|---|---|
0.3177437 | NA | 0.3578290 | -0.7861403 | 0.0857024 | 0.2905915 | -0.1159348 | 0.3464402 |
-0.4895928 | 0.7905551 | 0.9676060 | NA | 0.3915238 | 1.3301799 | 0.5672698 | 0.1748194 |
NA | -1.2294188 | 0.2485337 | -0.2706589 | -1.5055993 | -0.7062091 | 0.8060020 | -0.4176853 |
-0.1711396 | NA | -1.2937757 | 0.0984493 | -0.1351536 | 0.3613034 | -0.5861565 | -0.6498191 |
0.4208610 | -0.0102911 | 0.3268812 | NA | 0.9371669 | 0.0886542 | 1.4311793 | -0.1800665 |
1.8674356 | 1.9724127 | 0.3847853 | 0.3058566 | 0.3201818 | NA | -1.2755379 | 1.3359326 |