mice: Passive imputation and Post-processing
This is the fourth vignette in a series of six.
In this vignette we will walk you through the more advanced features of mice, such as post-processing of imputations and passive imputation.
1. Open R and load the packages mice and lattice.
library(mice)
library(lattice)
set.seed(123)
We choose seed value 123. This value is arbitrary; any other value would be an equally good seed. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R instance to 123, you will get exactly the same results and plots as we present in this document.
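As a minimal sketch of what fixing the seed buys you, the base R snippet below draws random numbers twice after the same call to set.seed() and compares the draws. The object names first.draw and second.draw are illustrative and do not appear elsewhere in this vignette.

```r
# Reset the generator to the same state before each draw.
set.seed(123)
first.draw <- rnorm(5)

set.seed(123)
second.draw <- rnorm(5)

# The two vectors are element-for-element the same.
identical(first.draw, second.draw)
```

Without the second call to set.seed(), the generator would continue from where the first draw left off and the two vectors would differ.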
Passive Imputation
There is often a need for transformed, combined or recoded versions of the data. In the case of incomplete data, one could impute the original and transform the completed original afterwards, or transform the incomplete original and impute the transformed version. If, however, both the original and the transformed version are needed within the imputation algorithm, neither of these approaches works: one cannot be sure that the transformation holds between the imputed values of the original and transformed versions. mice has a built-in approach, called passive imputation, to deal with such situations. The goal of passive imputation is to maintain consistency among different transformations of the same data. As an example, consider the following deterministic function in the boys data \[\text{BMI} = \frac{\text{Weight (kg)}}{\text{Height (m)}^2}\] or the compositional relation in the mammalsleep data: \[\text{ts} = \text{ps}+\text{sws}\]
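The BMI relation can be sketched as a passive imputation along the following lines. This is an illustration under stated assumptions rather than part of the exercises: it relies on the boys data storing height (hgt) in centimetres, hence the division by 100, and the object names bmi.imp, com and miss as well as the short settings m = 2 and maxit = 2 are chosen here only to keep the run brief.

```r
library(mice)

# Dry run (maxit = 0) to obtain the default method vector and predictor matrix.
ini  <- mice(boys, maxit = 0, printFlag = FALSE)
meth <- ini$method
pred <- ini$predictorMatrix

# bmi is computed from the current wgt and hgt rather than drawn from a model.
# Assumption: hgt is in cm, so it is converted to metres first.
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

# Do not use bmi to predict its own components.
pred[c("hgt", "wgt"), "bmi"] <- 0

bmi.imp <- mice(boys, method = meth, predictorMatrix = pred,
                m = 2, maxit = 2, seed = 123, printFlag = FALSE)

# Check the relation in the rows where bmi was originally missing.
com  <- complete(bmi.imp, 1)
miss <- is.na(boys$bmi)
bmi.consistent <- isTRUE(all.equal(com$bmi[miss],
                                   (com$wgt / (com$hgt / 100)^2)[miss]))
bmi.consistent
```

The check is restricted to rows where bmi was missing because only those cells are filled in by the passive formula; observed bmi values are left untouched.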
2. Use passive imputation to impute the deterministic sleep relation in the mammalsleep data. Name the new multiply imputed dataset pas.imp.
ini <- mice(mammalsleep[, -1], maxit = 0, printFlag = FALSE)
meth <- ini$method
meth
## bw brw sws ps ts mls gt pi sei odi
## "" "" "pmm" "pmm" "pmm" "pmm" "pmm" "" "" ""
pred <- ini$predictorMatrix
pred
## bw brw sws ps ts mls gt pi sei odi
## bw 0 1 1 1 1 1 1 1 1 1
## brw 1 0 1 1 1 1 1 1 1 1
## sws 1 1 0 1 1 1 1 1 1 1
## ps 1 1 1 0 1 1 1 1 1 1
## ts 1 1 1 1 0 1 1 1 1 1
## mls 1 1 1 1 1 0 1 1 1 1
## gt 1 1 1 1 1 1 0 1 1 1
## pi 1 1 1 1 1 1 1 0 1 1
## sei 1 1 1 1 1 1 1 1 0 1
## odi 1 1 1 1 1 1 1 1 1 0
pred[c("sws", "ps"), "ts"] <- 0
pred
## bw brw sws ps ts mls gt pi sei odi
## bw 0 1 1 1 1 1 1 1 1 1
## brw 1 0 1 1 1 1 1 1 1 1
## sws 1 1 0 1 0 1 1 1 1 1
## ps 1 1 1 0 0 1 1 1 1 1
## ts 1 1 1 1 0 1 1 1 1 1
## mls 1 1 1 1 1 0 1 1 1 1
## gt 1 1 1 1 1 1 0 1 1 1
## pi 1 1 1 1 1 1 1 0 1 1
## sei 1 1 1 1 1 1 1 1 0 1
## odi 1 1 1 1 1 1 1 1 1 0
meth["ts"] <- "~ I(sws + ps)"
pas.imp <- mice(mammalsleep[, -1], method = meth, predictorMatrix = pred, maxit = 10, seed = 123, printFlag = FALSE)
We used a custom predictor matrix and method vector to tailor our imputation approach to the passive imputation problem. We made sure to exclude ts as a predictor for the imputation of sws and ps to avoid circularity.
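One way to see what the passive setup achieves is to check the sleep relation in a completed dataset. The sketch below repeats the steps from this exercise so that it runs on its own; the names com, ts.missing and relation.holds are illustrative additions. It only checks rows where ts itself was missing, since those are the cells the passive formula fills in.

```r
library(mice)

# Same setup as in the exercise: passive method for ts, no feedback
# from ts into the imputation models for sws and ps.
ini  <- mice(mammalsleep[, -1], maxit = 0, printFlag = FALSE)
meth <- ini$method
pred <- ini$predictorMatrix
pred[c("sws", "ps"), "ts"] <- 0
meth["ts"] <- "~ I(sws + ps)"
pas.imp <- mice(mammalsleep[, -1], method = meth, predictorMatrix = pred,
                maxit = 10, seed = 123, printFlag = FALSE)

# First completed dataset and the rows in which ts was imputed.
com        <- complete(pas.imp, 1)
ts.missing <- is.na(mammalsleep$ts)

# Imputed ts should equal the sum of the (possibly imputed) sws and ps.
relation.holds <- isTRUE(all.equal(com$ts[ts.missing],
                                   com$sws[ts.missing] + com$ps[ts.missing]))
relation.holds
```

Rows in which ts was observed but sws or ps was imputed are not covered by this check, because the passive formula does not constrain the imputations of sws and ps.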
We also gave the imputation algorithm 10 iterations to converge and fixed the seed to 123 for this mice instance. This means that even when people do not fix the overall R seed for a session, results can be replicated exactly by fixing the seed for the random number generator within mice. Naturally, the same input (data) is required each time to yield the same output (mids object).
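The effect of the seed argument can be illustrated by running mice twice with the same seed and comparing the imputed values. This is a small sketch: the object names imp.a, imp.b and same.imputations are illustrative, and maxit = 2 with default methods is chosen only to keep the run short.

```r
library(mice)

# Two independent runs on the same data with the same seed.
imp.a <- mice(mammalsleep[, -1], maxit = 2, seed = 123, printFlag = FALSE)
imp.b <- mice(mammalsleep[, -1], maxit = 2, seed = 123, printFlag = FALSE)

# Compare the stored imputations rather than the whole mids objects.
same.imputations <- identical(imp.a$imp, imp.b$imp)
same.imputations
```

The comparison uses the imp component, which holds the imputed values per variable, rather than the full mids objects.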
3. Inspect the trace lines for pas.imp.
plot(pas.imp)