mice
: Imputing multi-level dataThis is the fifth vignette in a series of six.
In this vignette we will focus on multi-level imputation. You need to have package pan
installed. You can install it by running: install.packages("pan")
.
1. Open R
and load the packages mice
, lattice
and pan
.
require(mice)
require(lattice)
require(pan)
set.seed(123)
We choose seed value 123
. This is an arbitrary value; any value would be an equally good seed value. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R
instance to 123
, you will get the exact same results and plots as we present in this document.
We are going to work with the popularity data from Joop Hox (2010). The variables in this data set are described as follows:
pupil | Pupil number within class |
class | Class number |
extrav | Pupil extraversion |
sex | Pupil gender |
texp | Teacher experience (years) |
popular | Pupil popularity |
popteach | Teacher popularity |
1. Open the popular.RData
workspace. A workspace with complete and incomplete versions of the popularity data can be obtained here or can be loaded into the Global Environment by running:
con <- url("https://www.gerkovink.com/mimp/popular.RData")
load(con)
This workspace contains several datasets and functions that, when loaded, are available to you in R. If you’d like to see what is inside: run the following code
ls()
## [1] "con" "icc"
## [3] "mice.impute.2lonly.mean2" "popMCAR"
## [5] "popMCAR2" "popNCR"
## [7] "popNCR2" "popNCR3"
## [9] "popular"
The dataset popNCR
is a variation on the Hox (2010) data, where the missingness in the variables is either missing at random (MAR) or missing not at random (MNAR).
2. Check with the functions head()
, dim()
- alternatively one could use nrow()
and ncol()
instead of dim()
- and summary()
how large the dataset is, of which variables the data frame consists and if there are missing values in a variable.
head(popNCR)
## pupil class extrav sex texp popular popteach
## 1 1 1 5 1 NA 6.3 NA
## 2 2 1 NA 0 24 4.9 NA
## 3 3 1 4 1 NA 5.3 6
## 4 4 1 3 <NA> NA 4.7 5
## 5 5 1 5 1 24 NA 6
## 6 6 1 NA 0 NA 4.7 5
dim(popNCR)
## [1] 2000 7
nrow(popNCR)
## [1] 2000
ncol(popNCR)
## [1] 7
summary(popNCR)
## pupil class extrav sex texp
## Min. : 1.00 17 : 26 Min. : 1.000 0 :661 Min. : 2.0
## 1st Qu.: 6.00 63 : 25 1st Qu.: 4.000 1 :843 1st Qu.: 7.0
## Median :11.00 10 : 24 Median : 5.000 NA's:496 Median :12.0
## Mean :10.65 15 : 24 Mean : 5.313 Mean :11.8
## 3rd Qu.:16.00 4 : 23 3rd Qu.: 6.000 3rd Qu.:16.0
## Max. :26.00 21 : 23 Max. :10.000 Max. :25.0
## (Other):1855 NA's :516 NA's :976
## popular popteach
## Min. :0.000 Min. : 1.000
## 1st Qu.:3.900 1st Qu.: 4.000
## Median :4.800 Median : 5.000
## Mean :4.829 Mean : 4.834
## 3rd Qu.:5.800 3rd Qu.: 6.000
## Max. :9.100 Max. :10.000
## NA's :510 NA's :528
The data set has 2000 rows and 7 columns (variables). The variables extrav
, sex
, texp
, popular
and popteach
contain missings. About a quarter of these variables is missing, except for texp
where 50 % is missing.
3. As we have seen before, the function md.pattern()
is used to display all different missing data patterns. How many different missing data patterns are present in the popNCR
dataframe and which patterns occur most frequently in the data? Also find out how many patterns we would observe when variable texp
(teacher experience) is not considered.
md.pattern(popNCR)
## pupil class sex popular extrav popteach texp
## 308 1 1 1 1 1 1 1 0
## 279 1 1 1 1 1 1 0 1
## 110 1 1 1 1 1 0 1 1
## 115 1 1 1 1 1 0 0 2
## 114 1 1 1 1 0 1 1 1
## 98 1 1 1 1 0 1 0 2
## 33 1 1 1 1 0 0 1 2
## 24 1 1 1 1 0 0 0 3
## 119 1 1 1 0 1 1 1 1
## 113 1 1 1 0 1 1 0 2
## 50 1 1 1 0 1 0 1 2
## 75 1 1 1 0 1 0 0 3
## 29 1 1 1 0 0 1 1 2
## 21 1 1 1 0 0 1 0 3
## 2 1 1 1 0 0 0 1 3
## 14 1 1 1 0 0 0 0 4
## 102 1 1 0 1 1 1 1 1
## 89 1 1 0 1 1 1 0 2
## 25 1 1 0 1 1 0 1 2
## 29 1 1 0 1 1 0 0 3
## 85 1 1 0 1 0 1 1 2
## 56 1 1 0 1 0 1 0 3
## 9 1 1 0 1 0 0 1 3
## 14 1 1 0 1 0 0 0 4
## 19 1 1 0 0 1 1 1 2
## 27 1 1 0 0 1 1 0 3
## 13 1 1 0 0 1 0 1 3
## 11 1 1 0 0 1 0 0 4
## 4 1 1 0 0 0 1 1 3
## 9 1 1 0 0 0 1 0 4
## 2 1 1 0 0 0 0 1 4
## 2 1 1 0 0 0 0 0 5
## 0 0 496 510 516 528 976 3026
There are 32 unique patterns. The pattern where everything is observed and the pattern where only texp is missing occur most frequently.
If we omit texp, then the following pattern matrix is realized:
md.pattern(popNCR[ , -5])
## pupil class sex popular extrav popteach
## 587 1 1 1 1 1 1 0
## 225 1 1 1 1 1 0 1
## 212 1 1 1 1 0 1 1
## 57 1 1 1 1 0 0 2
## 232 1 1 1 0 1 1 1
## 125 1 1 1 0 1 0 2
## 50 1 1 1 0 0 1 2
## 16 1 1 1 0 0 0 3
## 191 1 1 0 1 1 1 1
## 54 1 1 0 1 1 0 2
## 141 1 1 0 1 0 1 2
## 23 1 1 0 1 0 0 3
## 46 1 1 0 0 1 1 2
## 24 1 1 0 0 1 0 3
## 13 1 1 0 0 0 1 3
## 4 1 1 0 0 0 0 4
## 0 0 496 510 516 528 2050
Without texp, there are only 16 patterns.
4. Let’s focus more precisely on the missing data patterns. Does the missing data of popular
depend on popteach
? One could for example check this by making a histogram of popteach
separately for the pupils with known popularity and missing popularity.
In R the missingness indicator
is.na(popNCR$popular)
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
## [13] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [25] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## [73] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
## [85] FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
## [121] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [181] FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
## [205] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
## [217] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [229] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
## [253] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
## [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [277] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [289] FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
## [301] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [313] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE
## [337] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [349] FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE
## [361] FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
## [373] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [385] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [397] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [409] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [421] FALSE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE
## [433] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
## [445] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
## [457] TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
## [469] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [481] FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [493] FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
## [505] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE
## [517] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
## [529] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [541] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
## [553] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [565] TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE TRUE
## [577] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
## [589] TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [601] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
## [613] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [625] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [637] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
## [649] TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE
## [661] TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE
## [673] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [685] TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
## [697] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
## [709] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [721] FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
## [733] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE
## [745] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [757] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
## [769] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
## [781] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE
## [793] TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
## [805] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [817] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [829] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [841] TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [853] FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE
## [865] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
## [877] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [889] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [901] TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE
## [913] FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [925] FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE
## [937] FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [949] FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [961] FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [973] FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE
## [985] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [997] TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
## [1009] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1021] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [1033] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## [1045] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE
## [1057] FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [1069] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [1081] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1093] FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## [1105] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE
## [1117] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [1129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1141] FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE
## [1153] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [1165] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
## [1177] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1189] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1201] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE
## [1213] FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [1225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
## [1237] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [1249] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [1261] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [1273] TRUE FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
## [1285] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
## [1297] FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE
## [1309] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
## [1321] FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE
## [1333] TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
## [1345] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [1357] FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1369] FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE
## [1381] FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE
## [1393] FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## [1405] TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE
## [1417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
## [1429] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [1441] TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1453] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [1465] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
## [1477] TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [1489] FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1501] TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1513] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
## [1525] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [1537] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1549] TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
## [1561] FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
## [1573] FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [1585] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [1597] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [1609] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1621] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [1633] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1645] FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
## [1657] FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
## [1669] TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE FALSE
## [1681] FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE FALSE
## [1693] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
## [1705] FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
## [1717] FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
## [1729] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1741] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [1753] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [1765] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [1777] FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE
## [1789] TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
## [1801] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [1813] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [1825] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1837] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
## [1849] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [1861] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1873] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1885] FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE
## [1897] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
## [1909] FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
## [1921] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1933] TRUE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [1945] FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE
## [1957] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
## [1969] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [1981] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [1993] FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
is a dummy variable of the same length as popular
with value 0 (FALSE
) for observed pupil popularity and 1 (TRUE
) for missing pupil popularity. A histogram can be made with the function histogram()
. The code for a conditional histogram of popteach
given the missingness indicator for popular is
histogram(~ popteach | is.na(popular), data=popNCR)
We do see that the histogram for the missing popular
(TRUE
) is further to the right than the histogram for observed popular
(FALSE
). This would indicate a right-tailed MAR missingness. In fact this is exactly what happens, because we created the missingness in these data ourselves. But we can make it observable by examining the relations between the missingness in popular
and the observed data in popteach
.
5. Does the missingness of the other incomplete variables depend on popteach
? If yes, what is the direction of the relation?
histogram(~ popteach | is.na(sex), data = popNCR)
There seems to be a left-tailed relation between popteach
and the missingness in sex
.
histogram(~ popteach | is.na(extrav), data = popNCR)
There also seems to be a left-tailed relation between popteach
and the missingness in extrav
.
histogram(~ popteach | is.na(texp), data = popNCR)
There seems to be no observable relation between popteach
and the missingness in texp
. It might be MCAR or even MNAR.
6. Find out if the missingness in teacher popularity depends on pupil popularity.
histogram(~ popular | is.na(popteach), data = popNCR)
Yes: there is a dependency. The relation seems to be right-tailed.
7. Have a look at the intraclass correlation (ICC) for the incomplete variables popular
, popteach
and texp
.
icc(aov(popular ~ as.factor(class), data = popNCR))
## [1] 0.328007
icc(aov(popteach ~ class, data = popNCR))
## [1] 0.3138658
icc(aov(texp ~ class, data = popNCR))
## [1] 1
Please note that the function icc()
comes from the package multilevel
(function ICC1()
), but is included in the workspace popular.RData
. Write down the ICCs, you’ll need them later.
7b. Do you think it is necessary to take the multilevel structure into account?
YES! There is a strong cluster structure going on. If we ignore the clustering in our imputation model, we may run into invalid inference. To stay as close to the true data model, we must take the cluster structure into account during imputation.
8. Impute the popNCR
dataset with mice
using imputation method norm
for popular
, popteach
, texp
and extrav
. Exclude class
as a predictor for all variables. Call the mids
-object imp1
.
ini <- mice(popNCR, maxit = 0)
meth <- ini$meth
meth
## pupil class extrav sex texp popular popteach
## "" "" "pmm" "logreg" "pmm" "pmm" "pmm"
meth[c(3, 5, 6, 7)] <- "norm"
meth
## pupil class extrav sex texp popular popteach
## "" "" "norm" "logreg" "norm" "norm" "norm"
pred <- ini$pred
pred
## pupil class extrav sex texp popular popteach
## pupil 0 1 1 1 1 1 1
## class 1 0 1 1 1 1 1
## extrav 1 1 0 1 1 1 1
## sex 1 1 1 0 1 1 1
## texp 1 1 1 1 0 1 1
## popular 1 1 1 1 1 0 1
## popteach 1 1 1 1 1 1 0
pred[, "class"] <- 0
pred[, "pupil"] <- 0
pred
## pupil class extrav sex texp popular popteach
## pupil 0 0 1 1 1 1 1
## class 0 0 1 1 1 1 1
## extrav 0 0 0 1 1 1 1
## sex 0 0 1 0 1 1 1
## texp 0 0 1 1 0 1 1
## popular 0 0 1 1 1 0 1
## popteach 0 0 1 1 1 1 0
imp1 <- mice(popNCR, meth = meth, pred = pred, print = FALSE)
9. Compare the means of the variables in the first imputed dataset and in the incomplete dataset.
summary(complete(imp1))
## pupil class extrav sex texp
## Min. : 1.00 17 : 26 Min. : 1.000 0: 985 Min. :-6.465
## 1st Qu.: 6.00 63 : 25 1st Qu.: 4.139 1:1015 1st Qu.: 8.000
## Median :11.00 10 : 24 Median : 5.000 Median :12.253
## Mean :10.65 15 : 24 Mean : 5.269 Mean :12.509
## 3rd Qu.:16.00 4 : 23 3rd Qu.: 6.000 3rd Qu.:16.698
## Max. :26.00 21 : 23 Max. :10.000 Max. :35.745
## (Other):1855
## popular popteach
## Min. : 0.000 Min. : 1.000
## 1st Qu.: 4.100 1st Qu.: 4.000
## Median : 5.000 Median : 5.000
## Mean : 5.006 Mean : 5.021
## 3rd Qu.: 5.971 3rd Qu.: 6.000
## Max. :10.547 Max. :10.000
##
summary(popNCR)
## pupil class extrav sex texp
## Min. : 1.00 17 : 26 Min. : 1.000 0 :661 Min. : 2.0
## 1st Qu.: 6.00 63 : 25 1st Qu.: 4.000 1 :843 1st Qu.: 7.0
## Median :11.00 10 : 24 Median : 5.000 NA's:496 Median :12.0
## Mean :10.65 15 : 24 Mean : 5.313 Mean :11.8
## 3rd Qu.:16.00 4 : 23 3rd Qu.: 6.000 3rd Qu.:16.0
## Max. :26.00 21 : 23 Max. :10.000 Max. :25.0
## (Other):1855 NA's :516 NA's :976
## popular popteach
## Min. :0.000 Min. : 1.000
## 1st Qu.:3.900 1st Qu.: 4.000
## Median :4.800 Median : 5.000
## Mean :4.829 Mean : 4.834
## 3rd Qu.:5.800 3rd Qu.: 6.000
## Max. :9.100 Max. :10.000
## NA's :510 NA's :528
9b. The missingness in texp
is MNAR: higher values for texp
have a larger probability to be missing. Can you see this in the imputed data? Do you think this is a problem?
Yes, we can see this in the imputed data: teacher experience increases slightly after imputation. However, texp
is the same for all pupils in a class. But not all pupils have this information recorded (as if some pupils did not remember, or were not present during data collection). This is not a problem, because as long as at least one pupil in each class has teacher experience recorded, we can deductively impute the correct (i.e. true) value for every pupil in the class.
10. Compare the ICCs of the variables in the first imputed dataset with those in the incomplete dataset (use popular
, popteach
and texp
). Make a notation of the ICCs after imputation.
data.frame(vars = names(popNCR[c(6, 7, 5)]),
observed = c(icc(aov(popular ~ class, popNCR)),
icc(aov(popteach ~ class, popNCR)),
icc(aov(texp ~ class, popNCR))),
norm = c(icc(aov(popular ~ class, complete(imp1))),
icc(aov(popteach ~ class, complete(imp1))),
icc(aov(texp ~ class, complete(imp1)))))
## vars observed norm
## 1 popular 0.3280070 0.2798518
## 2 popteach 0.3138658 0.2639095
## 3 texp 1.0000000 0.4595004
11. Now impute the popNCR
dataset again with mice
using imputation method norm
for popular
, popteach
, texp
and extrav
, but now include class
as a predictor for all variables. Call the mids
-object imp2
.
pred <- ini$pred
pred[, "pupil"] <- 0
imp2 <- mice(popNCR, meth = meth, pred = pred, print = FALSE)
## Warning: Number of logged events: 90
12. Compare the ICCs of the variables in the first imputed dataset from imp2
with those of imp1
and the incomplete dataset (use popular
, popteach
and texp
). Make a notation of the ICCs after imputation.
data.frame(vars = names(popNCR[c(6, 7, 5)]),
observed = c(icc(aov(popular ~ class, popNCR)),
icc(aov(popteach ~ class, popNCR)),
icc(aov(texp ~ class, popNCR))),
norm = c(icc(aov(popular ~ class, complete(imp1))),
icc(aov(popteach ~ class, complete(imp1))),
icc(aov(texp ~ class, complete(imp1)))),
normclass = c(icc(aov(popular ~ class, complete(imp2))),
icc(aov(popteach ~ class, complete(imp2))),
icc(aov(texp ~ class, complete(imp2)))))
## vars observed norm normclass
## 1 popular 0.3280070 0.2798518 0.3629046
## 2 popteach 0.3138658 0.2639095 0.3326133
## 3 texp 1.0000000 0.4595004 1.0000000
By simply forcing the algorithm to use the class variable during estimation we adopt a fixed effects approach. This conforms to formulating seperate regression models for each class
and imputing within classes from these models.
13. Inspect the trace lines for the variables popular
, texp
and extrav
.
plot(imp2, c("popular", "texp", "popteach"))
14. Add another 10 iterations and inspect the trace lines again. What do you observe with respect to the convergence of the sampler?
imp3 <- mice.mids(imp2, maxit = 10)
##
## iter imp variable
## 6 1 extrav sex texp popular popteach
## 6 2 extrav sex texp popular popteach
## 6 3 extrav sex texp popular popteach
## 6 4 extrav sex texp popular popteach
## 6 5 extrav sex texp popular popteach
## 7 1 extrav sex texp popular popteach
## 7 2 extrav sex texp popular popteach
## 7 3 extrav sex texp popular popteach
## 7 4 extrav sex texp popular popteach
## 7 5 extrav sex texp popular popteach
## 8 1 extrav sex texp popular popteach
## 8 2 extrav sex texp popular popteach
## 8 3 extrav sex texp popular popteach
## 8 4 extrav sex texp popular popteach
## 8 5 extrav sex texp popular popteach
## 9 1 extrav sex texp popular popteach
## 9 2 extrav sex texp popular popteach
## 9 3 extrav sex texp popular popteach
## 9 4 extrav sex texp popular popteach
## 9 5 extrav sex texp popular popteach
## 10 1 extrav sex texp popular popteach
## 10 2 extrav sex texp popular popteach
## 10 3 extrav sex texp popular popteach
## 10 4 extrav sex texp popular popteach
## 10 5 extrav sex texp popular popteach
## 11 1 extrav sex texp popular popteach
## 11 2 extrav sex texp popular popteach
## 11 3 extrav sex texp popular popteach
## 11 4 extrav sex texp popular popteach
## 11 5 extrav sex texp popular popteach
## 12 1 extrav sex texp popular popteach
## 12 2 extrav sex texp popular popteach
## 12 3 extrav sex texp popular popteach
## 12 4 extrav sex texp popular popteach
## 12 5 extrav sex texp popular popteach
## 13 1 extrav sex texp popular popteach
## 13 2 extrav sex texp popular popteach
## 13 3 extrav sex texp popular popteach
## 13 4 extrav sex texp popular popteach
## 13 5 extrav sex texp popular popteach
## 14 1 extrav sex texp popular popteach
## 14 2 extrav sex texp popular popteach
## 14 3 extrav sex texp popular popteach
## 14 4 extrav sex texp popular popteach
## 14 5 extrav sex texp popular popteach
## 15 1 extrav sex texp popular popteach
## 15 2 extrav sex texp popular popteach
## 15 3 extrav sex texp popular popteach
## 15 4 extrav sex texp popular popteach
## 15 5 extrav sex texp popular popteach
plot(imp3, c("popular", "texp", "popteach"))