Hybrid imputation

Recent advancements in iterative imputation

This presentation

has a website:

www.gerkovink.com/London2019/

You can find all related materials, links and references there.

A short overview

Imputation

For those of you who are unfamiliar with imputation:

With imputation, some estimation procedure is used to impute (fill in) each missing datum, resulting in a completed dataset that can be analyzed as if the data were completely observed.

We can do this once (single imputation) or multiple times (multiple imputation).

With MI, each missing datum is imputed \(m \geq 2\) times, resulting in \(m\) completed datasets.

Multiple imputation (Rubin, 1987) has some benefits over single imputation:

it accounts for missing data uncertainty
it accounts for parameter uncertainty
it can yield valid inference without additional adjustments
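To make the pooling step concrete, here is a minimal sketch of Rubin's rules for a single scalar parameter. The helper name `pool_scalar` and the five estimates and standard errors are made up for illustration; they are not part of `mice` or of this presentation's data.

```r
# Pool m estimates of one scalar parameter with Rubin's rules.
# `pool_scalar` is a hypothetical helper, not a mice function.
pool_scalar <- function(est, se) {
  m     <- length(est)
  qbar  <- mean(est)                 # pooled point estimate
  ubar  <- mean(se^2)                # within-imputation variance
  b     <- var(est)                  # between-imputation variance
  total <- ubar + (1 + 1 / m) * b    # total variance
  c(estimate = qbar, se = sqrt(total), within = ubar, between = b)
}

# Made-up estimates and standard errors from m = 5 completed datasets
est <- c(1.9, 2.1, 2.0, 2.2, 1.8)
se  <- c(0.30, 0.31, 0.29, 0.30, 0.30)
res <- pool_scalar(est, se)
res
```

The between-imputation term is what single imputation leaves out: with one completed dataset there is no spread across imputations to add to the variance.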

How to go about this?

Once we start the process of multiple imputation, we need a scheme to solve for multivariate missingness

Some notation:

Let \(Y\) be an incomplete column in the data
- \(Y^\mathrm{mis}\) denotes the unobserved part
- \(Y^\mathrm{obs}\) denotes the observed part
Let \(X\) be a set of completely observed covariates

In general, there are two flavours of multiple imputation:

We can either model the joint distribution of the data by means of joint modeling
Or, we can model each variable separately by means of fully conditional specification

Joint modeling

With JM, imputations are drawn from an assumed joint multivariate distribution.

Often a multivariate normal model is used for both continuous and categorical data.
Other joint models have been proposed (see e.g. Olkin and Tate, 1961; Van Buuren and van Rijckevorsel, 1992; Schafer, 1997; Van Ginkel et al., 2007; Goldstein et al., 2009; Chen et al., 2011).

Joint modeling imputations generated under the normal model are usually robust to misspecification of the imputation model (Schafer, 1997; Demirtas et al., 2008), although transformation towards normality is generally beneficial.

Procedure

Specify the joint model \(P(Y,X)\)
Derive \(P(Y^\mathrm{mis}|Y^\mathrm{obs},X)\)
Draw imputations \(\dot Y^\mathrm{mis}\) with a Gibbs sampler
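As a worked instance of the second step, assume the joint model is multivariate normal. The symbols \(Z\), \(\mu\) and \(\Sigma\) are introduced here for illustration only: \(Z = (Y^\mathrm{obs}, X)\) collects the observed values in a row, and the subscripts m and o index the missing and observed parts of the mean vector and covariance matrix. The conditional distribution is then

\[
Y^\mathrm{mis} \mid Z \sim \mathcal{N}\left(\mu_\mathrm{m} + \Sigma_\mathrm{mo}\Sigma_\mathrm{oo}^{-1}(Z - \mu_\mathrm{o}),\; \Sigma_\mathrm{mm} - \Sigma_\mathrm{mo}\Sigma_\mathrm{oo}^{-1}\Sigma_\mathrm{om}\right)
\]

In a sampler, \(\mu\) and \(\Sigma\) would themselves be redrawn given the current completed data before each draw of \(\dot Y^\mathrm{mis}\), so that the imputations also reflect parameter uncertainty.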

Joint modeling

PRO

The conditionals are compatible
The statistical inference is correct under the assumed joint model
Efficient parametrization is possible
The theoretical properties are known

CON

Having to specify a joint model impacts flexibility
The JM can assume more than the complete-data problem requires
It can lead to unrealistically large models
The assumed model may not be very close to the data

FCS

Multiple imputation by means of FCS does not start from an explicit multivariate model.

With FCS, multivariate missing data is imputed by univariately specifying an imputation model for each incomplete variable, conditional on a set of other (possibly incomplete) variables.

the multivariate distribution for the data is thereby implicitly specified through the univariate conditional densities
imputations are obtained by iterating over the conditionally specified imputation models.

Procedure

Specify \(P(Y^\mathrm{mis} | Y^\mathrm{obs}, X)\)
Draw imputations \(\dot Y^\mathrm{mis}\) with a Gibbs sampler

FCS

The general idea of using conditionally specified models to deal with missing data has been discussed and applied by many authors

see e.g. Kennickell, 1991; Raghunathan and Siscovick, 1996; Oudshoorn et al., 1999; Brand, 1999; Van Buuren et al., 1999; Van Buuren and Oudshoorn, 2000; Raghunathan et al., 2001; Faris et al., 2002; Van Buuren et al., 2006.

Comparisons between JM and FCS have been made that indicate that FCS is a useful and flexible alternative to JM when the joint distribution of the data is not easily specified (Van Buuren, 2007) and that similar results may be expected from both imputation approaches (Lee and Carlin, 2010).

FCS in `mice`

Specify the imputation models \(P(Y_j^\mathrm{mis} | Y_j^\mathrm{obs}, Y_{-j}, X)\)
- where \(Y_{-j}\) is the set of incomplete variables except \(Y_j\)
Fill in starting values for the missing data
And iterate
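The three steps above can be sketched in base R. This is a deliberately simplified illustration, not the `mice` implementation: `fcs_sketch` is a hypothetical helper, every incomplete column is assumed to be numeric and imputed with a linear model, and the draws ignore parameter uncertainty, so the result is not a proper multiple imputation. The toy data are simulated.

```r
# Minimal FCS sketch (hypothetical helper, not the mice algorithm itself)
fcs_sketch <- function(data, maxit = 5) {
  miss <- is.na(data)                               # where values are missing
  incomplete <- names(data)[colSums(miss) > 0]

  # Fill in starting values: random draws from the observed values
  for (j in incomplete) {
    obs <- data[[j]][!miss[, j]]
    data[[j]][miss[, j]] <- sample(obs, sum(miss[, j]), replace = TRUE)
  }

  # Iterate over the conditionally specified imputation models
  for (it in seq_len(maxit)) {
    for (j in incomplete) {
      # Model Y_j given all other columns, fitted on rows where Y_j is observed
      fit <- lm(reformulate(setdiff(names(data), j), response = j),
                data = data[!miss[, j], , drop = FALSE])
      mu <- predict(fit, newdata = data[miss[, j], , drop = FALSE])
      # Replace the missing part by prediction plus residual noise
      data[[j]][miss[, j]] <- mu + rnorm(length(mu), sd = sigma(fit))
    }
  }
  data
}

# Simulated toy data with two incomplete columns
set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$y1 <- toy$x + rnorm(50)
toy$y2 <- toy$y1 - toy$x + rnorm(50)
toy$y1[1:10] <- NA
toy$y2[6:15] <- NA

completed <- fcs_sketch(toy)
anyNA(completed)
```

Running the function \(m\) times with different seeds would give \(m\) completed datasets; only the cells that were missing are ever changed.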

Why I prefer FCS

PRO

FCS is very flexible
modeling remains close to the data
one may use a subset of predictors for each column
works very well in practice
straightforward to explain to applied researchers

CON

its theoretical properties are only known in special cases
potential incompatibility of the collection of conditionals with the joint
no computational shortcuts

Conclusion:

\[\text{Merging JM and FCS would be better}\]

Merge JM and FCS

Hybrids of JM and FCS

We can combine the flexibility of FCS with the appealing theoretical properties of JM

In order to do so, we need to partition the variables into blocks

For example, we might partition four incomplete variables into \(b\) blocks \(h = 1,\dots,b\) as follows
- a single block with \(b=1\) would hold a joint model: \[\{Y_1, Y_2, Y_3, Y_4\}, X\]
- four blocks with \(b=4\) would be the mice algorithm: \[\{Y_1\},\{Y_2\},\{Y_3\},\{Y_4\}, X\]
- anything in between would be a hybrid between the joint model and the mice model. For example, \[\{Y_1, Y_2, Y_3\},\{Y_4\}, X\]
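The three partitions can be written down as named lists of column names. The sketch below uses plain base R; the block names (`all`, `B1`, `B2`) and the helper `is_partition` are made up for illustration.

```r
vars <- c("Y1", "Y2", "Y3", "Y4")

# b = 1: one block holding all incomplete variables (joint model)
jm_blocks <- list(all = vars)
# b = 4: one block per variable (conventional FCS)
fcs_blocks <- setNames(as.list(vars), vars)
# b = 2: a hybrid, with Y1-Y3 imputed jointly and Y4 univariately
hybrid_blocks <- list(B1 = c("Y1", "Y2", "Y3"), B2 = "Y4")

# In this sketch, a partition assigns every variable to exactly one block
is_partition <- function(blocks, vars) {
  members <- unlist(blocks, use.names = FALSE)
  !anyDuplicated(members) && setequal(members, vars)
}

schemes <- list(jm = jm_blocks, fcs = fcs_blocks, hybrid = hybrid_blocks)
sapply(schemes, is_partition, vars = vars)   # is each scheme a partition?
lengths(schemes)                             # number of blocks b per scheme
```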

Why is this useful

Just some examples where a hybrid imputation procedure would be useful:

Imputing squares/nonlinear effects: In the model \(y=\alpha + \beta_1X+\beta_2X^2 + \epsilon\), \(X\) and \(X^2\) should be imputed jointly (Von Hippel, 2009; Seaman, Bartlett & White, 2012; Vink & Van Buuren, 2013; Bartlett et al., 2015)
Compositional data: Predictive ratio matching (Vink, 2015, Ch5)

\[ \begin{array}{lllllllllllll} x_0 &= &x_1 &+ &x_2 &+ &x_3 &+& x_4 & & & &\\ & &= & & & & && = & & & &\\ & &x_9 & & & & && x_5 & & & &\\ & &+ & & & & && + & & & &\\ & &x_{10} & & & & &&x_6 &= &x_7 &+&x_8 \end{array} \]

Multivariate PMM: Imputing a combination of outcomes optimally based on a linear combination of covariates (Cai, Vink & Van Buuren - working paper).

JM embedded within FCS

b	h	target	predictors	type
2	1	\(\{Y_1, Y_2, Y_3\}\)	\(Y_4, X\)	mult
2	2	\(Y_4\)	\(Y_1, Y_2, Y_3, X\)	univ

\[\quad\] The above table details \(b=2\) blocks.

The first block considers the multivariate imputation of the set \((Y_1, Y_2, Y_3)\). The second block considers the univariate imputation of the remaining column \(Y_4\).

FCS embedded within FCS

With FCS, the scheme on the previous slide would take the following embedded structure:

b	h	j	target	predictors	type
2	1	1	\(Y_1\)	\(Y_2, Y_3, Y_4, X\)	univ
2	1	2	\(Y_2\)	\(Y_1, Y_3, Y_4, X\)	univ
2	1	3	\(Y_3\)	\(Y_1, Y_2, Y_4, X\)	univ
2	2	1	\(Y_4\)	\(Y_1, Y_2, Y_3, X\)	univ

\[\quad\]

The first block is an FCS loop within an FCS imputation procedure.

Benefits of blocks in `mice()`

Looping over \(b\) blocks instead of looping over \(p\) columns.
Only specify \(b \times p\) predictor relations and not \(p^2\).
Only specify \(b\) univariate imputation methods instead of \(p\) methods.
Ability for imputing more than one column at once
Simplified overall model specification

e.g. sets of items in scales, matching items in longitudinal data, joining data sets, etc.

`predictorMatrix` simplification:

Under the conventional FCS predictor specification, we could hypothesize the following predictorMatrix.

##           age item1 item2 sum_items time1 time2 time3 mean_time
## age         0     0     0         1     0     0     0         1
## item1       1     0     1         0     0     0     0         1
## item2       1     1     0         0     0     0     0         1
## sum_items   0     1     1         0     0     0     0         0
## time1       1     0     0         1     0     1     1         0
## time2       1     0     0         1     1     0     1         0
## time3       1     0     0         1     1     1     0         0
## mean_time   0     0     0         0     1     1     1         0

`predictorMatrix` simplification:

Under the new blocked approach, however, we could simplify these specifications into the following blocks and predictor relations.

blocks <- list(age = "age", 
               Items = c("item1", "item2", "sum_items"), 
               Time = c("time1", "time2", "time3", "mean_time"))

##       age item1 item2 sum_items time1 time2 time3 mean_time
## age     0     0     0         1     0     0     0         1
## Items   1     0     0         0     0     0     0         1
## Time    1     0     0         1     0     0     0         0
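For illustration, the blocked matrix above can be rebuilt by hand in base R, which also makes the saving in predictor relations explicit. This is only a sketch: the block names follow the row labels of the printed matrix, the object name `pred_blocked` is made up, and nothing here calls `mice`.

```r
vars <- c("age", "item1", "item2", "sum_items",
          "time1", "time2", "time3", "mean_time")
blocks <- list(age   = "age",
               Items = c("item1", "item2", "sum_items"),
               Time  = c("time1", "time2", "time3", "mean_time"))

# One row per block, one column per variable; 1 marks a predictor
pred_blocked <- matrix(0, nrow = length(blocks), ncol = length(vars),
                       dimnames = list(names(blocks), vars))
pred_blocked["age",   c("sum_items", "mean_time")] <- 1
pred_blocked["Items", c("age", "mean_time")]       <- 1
pred_blocked["Time",  c("age", "sum_items")]       <- 1
pred_blocked

# Entries to specify: b * p with blocks versus p^2 column by column
c(blocked = length(pred_blocked), conventional = length(vars)^2)
```

With \(b = 3\) blocks and \(p = 8\) columns, the blocked specification has 24 entries instead of 64.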

An example: `brandsma`

The brandsma dataset (Snijders and Bosker, 2012) contains data from 4106 pupils in 216 schools.

library(mice)   # provides the brandsma data
library(dplyr)  # provides %>% and select()
d <- brandsma %>% select(sch, lpo, iqv, den)
head(d)

##   sch lpo        iqv den
## 1   1  NA -1.3535094   1
## 2   1  50  2.1464906   1
## 3   1  46  3.1464906   1
## 4   1  45  2.6464906   1
## 5   1  33 -2.3535094   1
## 6   1  46 -0.8535094   1

The scientific interest is to create a model for predicting the outcome lpo from the level-1 predictor iqv and the measured level-2 predictor den (which takes values 1-4). For pupil \(i\) in school \(c\), in composite notation:

\[\mathrm{lpo}_{ic} = \beta_0 + \beta_1\mathrm{iqv}_{ic} + \beta_2\mathrm{den}_c + \upsilon_{0c}+ \epsilon_{ic}\] where \(\epsilon_{ic} \sim \mathcal{N}(0, \sigma_\epsilon^2)\) and \(\upsilon_{0c} \sim \mathcal{N}(0, \sigma_\upsilon^2)\)
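To see what this model implies for the data structure, the sketch below simulates from it in base R. All numbers (sample sizes, coefficients, standard deviations) are made up for illustration and are not estimates from the brandsma data.

```r
set.seed(2019)
n_school <- 20                # made-up number of schools
n_pupil  <- 30                # made-up number of pupils per school
beta     <- c(40, 2.5, 0.5)   # made-up beta_0, beta_1, beta_2
sd_u     <- 3                 # made-up sigma_upsilon (between schools)
sd_e     <- 6                 # made-up sigma_epsilon (within schools)

sch <- rep(seq_len(n_school), each = n_pupil)        # cluster identifier
den <- sample(1:4, n_school, replace = TRUE)[sch]    # level-2: constant within school
u   <- rnorm(n_school, sd = sd_u)[sch]               # random intercept per school
iqv <- rnorm(n_school * n_pupil)                     # level-1 predictor
lpo <- beta[1] + beta[2] * iqv + beta[3] * den + u +
  rnorm(length(sch), sd = sd_e)                      # pupil-level residual

sim <- data.frame(sch, lpo, iqv, den)
head(sim)
```

The level-2 variable takes a single value per school, which is why its imputation needs a method that works on the cluster level rather than on individual rows.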

Normally in `mice`

meth <- make.method(d)
meth[c("lpo", "iqv", "den")] <- c("2l.pmm", "2l.pmm", "2lonly.pmm")
meth

##          sch          lpo          iqv          den 
##           ""     "2l.pmm"     "2l.pmm" "2lonly.pmm"

pred <- make.predictorMatrix(d)
pred["lpo", ] <- c(-2, 0, 3, 1) # -2 denotes cluster identifier
pred["iqv", ] <- c(-2, 3, 0, 1) # 3 denotes including the covariate's cluster mean
pred["den", ] <- c(-2, 1, 1, 0) # 1 denotes fixed effects predictor
pred

##     sch lpo iqv den
## sch   0   1   1   1
## lpo  -2   0   3   1
## iqv  -2   3   0   1
## den  -2   1   1   0

imp <- mice(d, pred = pred, meth = meth, seed = 123, m = 10, print = FALSE)

With `blocks`

We use the function mitml::jomoImpute and call it from within mice

d$den <- as.factor(d$den)
block <- make.blocks(d, "collect") # assign all vars to a single block
formula <- list(collect = list(lpo + iqv ~ 1 + (1 | sch),
                               den ~ 1))

We pass the block and formula objects to their respective arguments in the mice function

imp <- mice(d, meth = "jomoImpute", blocks = block,
            form = formula, print = FALSE, seed = 1,
            maxit = 2, m = 10, n.burn = 100)

Conclusion

Imputing columns in blocks is a straightforward extension to the FCS algorithm

Fully implemented in mice in R
Adding blocks allows for more flexibility in cases where multivariate imputation would lead to better inference
Using blocks allows the modeler to remain closer to the observed data
It is simple to specify hybrid models

To do

We’re still working on the documentation that details a variety of use cases

Potentially interesting

Integrate the hybrid (JM/FCS) approach with the blocked sequential regression multivariate imputation approach by Zhu (2016), which in turn generalizes the monotone block approach of Li et al. (2014).

References

The multilevel example is detailed in Van Buuren (2018), Chapter 7.10.4

See www.gerkovink.com/London2019/ for a detailed overview of all references.
