Recent advancements in iterative imputation
You can find all related materials, links and references at www.gerkovink.com/London2019/.
For those of you who are unfamiliar with imputation:
With imputation, some estimation procedure is used to impute (fill in) each missing datum, resulting in a completed dataset that can be analyzed as if the data were completely observed.
We can do this once (single imputation) or multiple times (multiple imputation).
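As a minimal sketch in mice (using its built-in nhanes example data), the distinction is just the `m` argument, and the multiply-imputed datasets are analyzed and pooled afterwards:

```r
library(mice)

# Single imputation: one completed dataset (m = 1)
imp1 <- mice(nhanes, m = 1, print = FALSE, seed = 123)

# Multiple imputation: five completed datasets (m = 5)
imp5 <- mice(nhanes, m = 5, print = FALSE, seed = 123)

# Analyze each completed dataset, then pool with Rubin's rules;
# the pooled standard errors reflect the between-imputation variability
fit <- with(imp5, lm(chl ~ bmi + age))
summary(pool(fit))
```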
Multiple imputation (Rubin, 1987) has several benefits over single imputation: by generating multiple completed datasets, it propagates the uncertainty about the missing values into the analysis, yielding valid standard errors and confidence intervals.
Once we start the process of multiple imputation, we need a scheme to handle multivariate missingness.
Some notation:
Let \(Y\) be an incomplete column in the data
Let \(X\) be a set of completely observed covariates
In general, there are two flavours of multiple imputation: joint modeling (JM) and fully conditional specification (FCS).
With JM, imputations are drawn from an assumed joint multivariate distribution.
Joint modeling imputations generated under the normal model are usually robust to misspecification of the imputation model (Schafer, 1997; Demirtas et al., 2008), although transformation towards normality is generally beneficial.
PRO
CON
Multiple imputation by means of FCS does not start from an explicit multivariate model.
With FCS, multivariate missing data are imputed by specifying a univariate imputation model for each incomplete variable, conditional on a set of other (possibly incomplete) variables.
The general idea of using conditionally specified models to deal with missing data has been discussed and applied by many authors
Comparisons between JM and FCS have been made that indicate that FCS is a useful and flexible alternative to JM when the joint distribution of the data is not easily specified (Van Buuren, 2007) and that similar results may be expected from both imputation approaches (Lee and Carlin, 2010).
mice

Specify the imputation models \(P(Y_j^\mathrm{mis} | Y_j^\mathrm{obs}, Y_{-j}, X)\)
And iterate
PRO
CON
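The specify-and-iterate scheme above is what a `mice()` call does: each incomplete variable gets a univariate method, and `maxit` controls the number of FCS iterations. A minimal sketch on mice's built-in nhanes data:

```r
library(mice)

# FCS in mice: one univariate imputation model (here pmm) per incomplete
# column, iterated maxit times over the variables
imp <- mice(nhanes, method = "pmm", m = 5, maxit = 10,
            print = FALSE, seed = 123)

# Trace plots of the chain means and variances, to judge convergence
plot(imp)
```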
Conclusion:
\[\text{Merging JM and FCS would be better}\]
We can combine the flexibility of FCS with the appealing theoretical properties of JM
In order to do so, we need to partition the variables into blocks
For example, we might partition the variables into \(b\) blocks \(h = 1,\dots,b\) as follows
a single block with \(b=1\) would be the joint model \[\{Y_1, Y_2, Y_3, Y_4\}, X\]
a quadruple block specification with \(b=4\) would be the mice algorithm \[\{Y_1\},\{Y_2\},\{Y_3\},\{Y_4\}, X\]
anything in between would be a hybrid between the joint model and the mice model. For example, \[\{Y_1, Y_2, Y_3\},\{Y_4\}, X\]
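In mice, such a partition is expressed through the `blocks` argument. A sketch of the hybrid partition \(\{Y_1, Y_2, Y_3\},\{Y_4\}\), borrowing columns of the built-in nhanes data in place of the hypothetical \(Y_1,\dots,Y_4\):

```r
library(mice)

# The hybrid partition {bmi, chl}, {hyp}: one multivariate block and
# one univariate block, in mice's blocks notation
blocks <- make.blocks(list(c("bmi", "chl"), "hyp"))
blocks
```

A multivariate method (such as the jomoImpute method used later in these slides) can then be assigned to the multivariate block, while the univariate block keeps a univariate method such as pmm.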
Just some examples where a hybrid imputation procedure would be useful:
\[\begin{aligned} x_0 &= x_1 + x_2 + x_3 + x_4\\ x_1 &= x_9 + x_{10}\\ x_4 &= x_5 + x_6\\ x_6 &= x_7 + x_8 \end{aligned}\]
| b | h | target | predictors | type |
|---|---|---|---|---|
| 2 | 1 | \(\{Y_1, Y_2, Y_3\}\) | \(Y_4, X\) | mult |
| 2 | 2 | \(Y_4\) | \(Y_1, Y_2, Y_3, X\) | univ |
The above table details \(b=2\) blocks.
The first block considers the multivariate imputation of the set \((Y_1, Y_2, Y_3)\). The second block considers the univariate imputation of the remaining column \(Y_4\).
With FCS, the scheme on the previous slide would take the following embedded structure:
| b | h | j | target | predictors | type |
|---|---|---|---|---|---|
| 2 | 1 | 1 | \(Y_1\) | \(Y_2, Y_3, Y_4, X\) | univ |
| 2 | 1 | 2 | \(Y_2\) | \(Y_1, Y_3, Y_4, X\) | univ |
| 2 | 1 | 3 | \(Y_3\) | \(Y_1, Y_2, Y_4, X\) | univ |
| 2 | 2 | 1 | \(Y_4\) | \(Y_1, Y_2, Y_3, X\) | univ |
The first block is an FCS loop within an FCS imputation procedure.
mice()

predictorMatrix simplification: Under the conventional FCS predictor specification, we could hypothesize the following predictorMatrix.
##           age item1 item2 sum_items time1 time2 time3 mean_time
## age         0     0     0         1     0     0     0         1
## item1       1     0     1         0     0     0     0         1
## item2       1     1     0         0     0     0     0         1
## sum_items   0     1     1         0     0     0     0         0
## time1       1     0     0         1     0     1     1         0
## time2       1     0     0         1     1     0     1         0
## time3       1     0     0         1     1     1     0         0
## mean_time   0     0     0         0     1     1     1         0
predictorMatrix simplification: Under the new blocked approach, however, we could simplify these specifications into the following blocks and predictor relations.
blocks <- list(age = "age",
               Items = c("item1", "item2", "sum_items"),
               Time = c("time1", "time2", "time3", "mean_time"))
##       age item1 item2 sum_items time1 time2 time3 mean_time
## age     0     0     0         1     0     0     0         1
## Items   1     0     0         0     0     0     0         1
## Time    1     0     0         1     0     0     0         0
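A sketch of how such a blocked predictor matrix could be constructed with `make.predictorMatrix()`; the data frame `d` below is a hypothetical stand-in with the eight columns from the example, and the default matrix would still be edited down to the relations shown above:

```r
library(mice)

# Hypothetical data frame with the eight columns from the example
d <- data.frame(age = 1, item1 = 1, item2 = 1, sum_items = 1,
                time1 = 1, time2 = 1, time3 = 1, mean_time = 1)

blocks <- list(age = "age",
               Items = c("item1", "item2", "sum_items"),
               Time = c("time1", "time2", "time3", "mean_time"))

# With blocks, the predictorMatrix has one row per block
# instead of one row per variable
pred <- make.predictorMatrix(d, blocks = blocks)
dim(pred)   # 3 block rows, 8 variable columns
```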
brandsma

The brandsma dataset (Snijders and Bosker, 2012) contains data from 4106 pupils in 216 schools.
library(mice)   # provides the brandsma data
library(dplyr)  # for %>% and select()

d <- brandsma %>% select(sch, lpo, iqv, den)
head(d)
##   sch lpo        iqv den
## 1   1  NA -1.3535094   1
## 2   1  50  2.1464906   1
## 3   1  46  3.1464906   1
## 4   1  45  2.6464906   1
## 5   1  33 -2.3535094   1
## 6   1  46 -0.8535094   1
The scientific interest is to create a model for predicting the outcome lpo from the level-1 predictor iqv and the measured level-2 predictor den (which takes values 1-4). For pupil \(i\) in school \(c\) in composition notation:
\[lpo_{ic} = \beta_0 + \beta_1\mathrm{iqv}_{ic} + \beta_2\mathrm{den}_c + \upsilon_{0c}+ \epsilon_{ic}\] where \(\epsilon_{ic} \sim \mathcal{N}(0, \sigma_\epsilon^2)\) and \(\upsilon_{0c} \sim \mathcal{N}(0, \sigma_\upsilon^2)\)
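In lme4 syntax this model reads as follows (lme4 is an assumption here, not part of the original; a complete-case fit is shown only to illustrate the formula, since the point of these slides is to impute first and then fit and pool):

```r
library(mice)   # for the brandsma data
library(lme4)   # assumed available; not used in the original slides

d <- brandsma[, c("sch", "lpo", "iqv", "den")]

# Random-intercept model: fixed effects for iqv and den,
# random intercept upsilon_{0c} per school (sch).
# lmer() drops incomplete rows here (complete-case analysis).
fit <- lmer(lpo ~ 1 + iqv + den + (1 | sch), data = d)
fixef(fit)
```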
mice

meth <- make.method(d)
meth[c("lpo", "iqv", "den")] <- c("2l.pmm", "2l.pmm", "2lonly.pmm")
meth
## sch lpo iqv den ## "" "2l.pmm" "2l.pmm" "2lonly.pmm"
pred <- make.predictorMatrix(d)
pred["lpo", ] <- c(-2, 0, 3, 1) # -2 denotes cluster identifier
pred["iqv", ] <- c(-2, 3, 0, 1) # 3 denotes including the covariate's cluster mean
pred["den", ] <- c(-2, 1, 1, 0) # 1 denotes fixed effects predictor
pred
##      sch lpo iqv den
## sch    0   1   1   1
## lpo   -2   0   3   1
## iqv   -2   3   0   1
## den   -2   1   1   0
imp <- mice(d, pred = pred, meth = meth, seed = 123, m = 10, print = FALSE)
blocks

We use the function mitml::jomoImpute and call it from within mice.
d$den <- as.factor(d$den)
block <- make.blocks(d, "collect") # assign all vars to a single block
formula <- list(collect = list(lpo + iqv ~ 1 + (1 | sch),
den ~ 1))
We pass the block and formula objects to their respective arguments in the mice function
imp <- mice(d, meth = "jomoImpute", blocks = block,
form = formula, print = FALSE, seed = 1,
maxit = 2, m = 10, n.burn = 100)
Imputing columns in blocks is a straightforward extension of the FCS algorithm
mice in R

To do
Potentially interesting
Integrate the hybrid (JM/FCS) approach with the blocked sequential regression multivariate imputation approach by Zhu (2016), which in turn generalizes the monotone block approach of Li et al. (2014).
The multilevel example is detailed in Van Buuren (2018), Chapter 7.10.4
See www.gerkovink.com/London2019/ for a detailed overview of all references.