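Multivariate predictive mean matching is a generalisation of univariate predictive mean matching that imputes several incomplete variables simultaneously, so that relations between them (here, xx = x^2) can be preserved in the imputed values.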

Packages used

The following packages are used.

# Install the development version of mice from GitHub, then load it
library(devtools)
install_github("amices/mice")
library(mice)

Data generation

set.seed(123)
B1 <- .5
B2 <- .5
X <- rnorm(1000)
XX <- X^2
e <- rnorm(1000, 0, 1)
Y <- B1 * X + B2 * XX + e
dat <- data.frame(x = X, xx = XX, y = Y)
# Impose 25 percent MCAR missingness on x and xx jointly (y stays complete)
dat[0 == rbinom(1000, 1, 1 - .25), 1:2] <- NA
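
As a quick sanity check on the missingness pattern, the following sketch regenerates data along the same lines in base R and inspects it. It is illustrative only; the variable `n` is introduced here for convenience and is not part of the original code.

```r
# Regenerate data along the same lines as above (base R only)
set.seed(123)
n <- 1000
x <- rnorm(n)
dat <- data.frame(x = x, xx = x^2, y = 0.5 * x + 0.5 * x^2 + rnorm(n))
dat[0 == rbinom(n, 1, 1 - .25), 1:2] <- NA

# Proportion of missing values per column:
# roughly 0.25 for x and xx, exactly 0 for y
colMeans(is.na(dat))

# x and xx are missing in the same rows, so they form a natural block
all(is.na(dat$x) == is.na(dat$xx))
```

Because x and xx are always missing together, it makes sense to impute them as a single block rather than one variable at a time.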

Imputation

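# Prepare data for imputation: x and xx form one block, y is its own block
blk <- list(c("x", "xx"), "y")
# Use multivariate PMM for the (x, xx) block; y is complete and needs no method
meth <- c("mpmm", "")
# Impute data (m = 5 imputations by default)
imp <- mice(dat, blocks = blk, method = meth, print = FALSE)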
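
To give a feel for why block-wise matching keeps x and xx consistent, here is a simplified base R sketch of the idea. It is not the mice implementation of `mpmm`: it assumes a single complete predictor, skips the draw of model parameters that a proper imputation method would include, and the names `k`, `donor`, `z_obs`, `pred_obs` and `pred_mis` are hypothetical. It summarises the block with its first canonical variate (via `stats::cancor`), matches on the predicted value of that variate, and copies the whole donor row.

```r
set.seed(123)
n <- 1000
x <- rnorm(n)
dat <- data.frame(x = x, xx = x^2, y = 0.5 * x + 0.5 * x^2 + rnorm(n))
miss <- rbinom(n, 1, 0.25) == 1
dat[miss, c("x", "xx")] <- NA

obs <- dat[!miss, ]

# Linear combination of (x, xx) most correlated with y among complete rows
cc <- cancor(obs[, c("x", "xx")], obs[, "y", drop = FALSE])
block_centred <- scale(as.matrix(obs[, c("x", "xx")]),
                       center = cc$xcenter, scale = FALSE)
z_obs <- as.vector(block_centred %*% cc$xcoef[, 1])

# Predict the canonical variate from y for donors and for incomplete rows
fit <- lm(z ~ y, data = data.frame(z = z_obs, y = obs$y))
pred_obs <- fitted(fit)
pred_mis <- predict(fit, newdata = dat[miss, "y", drop = FALSE])

# For each incomplete row, draw one donor at random from the k closest
k <- 5
donor <- vapply(pred_mis, function(p) {
  nearest <- order(abs(pred_obs - p))[seq_len(k)]
  nearest[sample.int(k, 1)]
}, integer(1))

# Copy x and xx together from the donor, so xx = x^2 still holds
imputed <- dat
imputed[miss, c("x", "xx")] <- obs[donor, c("x", "xx")]
all.equal(imputed$xx, imputed$x^2)
```

Since both values come from the same observed row, the imputed pairs lie exactly on the parabola, which separate univariate imputation of x and xx would not guarantee.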

Plot results

# Fit the analysis model on each imputed dataset and pool the estimates
pool(with(imp, lm(y ~ x + xx)))

## Class: mipo    m = 5 
##          term m   estimate         ubar            b            t dfcom
## 1 (Intercept) 5 0.04511865 0.0015916063 0.0001560688 0.0017788888   997
## 2           x 5 0.56871385 0.0010725006 0.0005666508 0.0017524816   997
## 3          xx 5 0.50123834 0.0005834871 0.0002448044 0.0008772524   997
##          df       riv    lambda       fmi
## 1 256.78680 0.1176689 0.1052806 0.1121687
## 2  25.45811 0.6340145 0.3880103 0.4310202
## 3  33.84616 0.5034649 0.3348697 0.3709728
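
The columns of the pooled output are linked by Rubin's rules. As a small worked check, the sketch below recomputes the total variance `t`, the relative increase in variance `riv`, and `lambda` from the printed `ubar` and `b` values. The object names `t_total`, `riv` and `lambda` are chosen here for illustration, and small differences from the printed output come from rounding.

```r
m <- 5  # number of imputations

# Within-imputation (ubar) and between-imputation (b) variances,
# copied from the pooled output for (Intercept), x and xx
ubar <- c(0.0015916063, 0.0010725006, 0.0005834871)
b    <- c(0.0001560688, 0.0005666508, 0.0002448044)

# Total variance: t = ubar + (1 + 1/m) * b
t_total <- ubar + (1 + 1 / m) * b

# Relative increase in variance due to missingness
riv <- (1 + 1 / m) * b / ubar

# Proportion of total variance attributable to missingness
lambda <- (1 + 1 / m) * b / t_total

round(cbind(t = t_total, riv = riv, lambda = lambda), 7)
```

The larger `riv` and `lambda` for x and xx than for the intercept reflect that these are the coefficients of the variables that were made incomplete.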

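# Observed (x, xx) pairs in the mice colour for observed data
plot(dat$x, dat$xx, col = mdc(1), xlab = "x", ylab = "xx")
# First completed dataset
cmp <- complete(imp)
# Overlay the imputed (x, xx) pairs in the mice colour for imputed data
points(cmp$x[is.na(dat$x)], cmp$xx[is.na(dat$x)], col = mdc(2))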

Multivariate predictive mean matching

Mingyang Cai
