This is the fifth vignette in a series of six.

In this vignette we will focus on multi-level imputation. You need to have package pan installed. You can install it by running: install.packages("pan").


1. Open R and load the packages mice, lattice and pan.

require(mice)
require(lattice)
require(pan)
set.seed(123)

We choose seed value 123. This is an arbitrary value; any value would be an equally good seed value. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R instance to 123, you will get the exact same results and plots as we present in this document.


We are going to work with the popularity data from Joop Hox (2010). The variables in this data set are described as follows:

pupil Pupil number within class
class Class number
extrav Pupil extraversion
sex Pupil gender
texp Teacher experience (years)
popular Pupil popularity
popteach Teacher popularity

Inspection of the incomplete data

1. Open the popular.RData workspace. From your working directory, run

load("popular.RData")

or specify the path. For example,

load("C:/Users/youraccount/Documents/popular.RData")

Alternatively, just click the popular.RData file to open the data.

This workspace contains several datasets and functions that, when loaded, are available to you in R. If you’d like to see what is inside: run the following code

ls()
## [1] "icc"                      "mice.impute.2lonly.mean2"
## [3] "popMCAR"                  "popMCAR2"                
## [5] "popNCR"                   "popNCR2"                 
## [7] "popNCR3"                  "popular"

The dataset popNCR is a variation on the Hox (2010) data, where the missingness in the variables is either missing at random (MAR) or missing not at random (MNAR).


2. Check with the functions head(), dim() - alternatively one could use nrow() and ncol() instead of dim() - and summary() how large the dataset is, of which variables the data frame consists and if there are missing values in a variable.

head(popNCR)
##   pupil class extrav  sex texp popular popteach
## 1     1     1      5    1   NA     6.3       NA
## 2     2     1     NA    0   24     4.9       NA
## 3     3     1      4    1   NA     5.3        6
## 4     4     1      3 <NA>   NA     4.7        5
## 5     5     1      5    1   24      NA        6
## 6     6     1     NA    0   NA     4.7        5
dim(popNCR)
## [1] 2000    7
nrow(popNCR)
## [1] 2000
ncol(popNCR)
## [1] 7
summary(popNCR)
##      pupil           class          extrav         sex           texp     
##  Min.   : 1.00   17     :  26   Min.   : 1.000   0   :661   Min.   : 2.0  
##  1st Qu.: 6.00   63     :  25   1st Qu.: 4.000   1   :843   1st Qu.: 7.0  
##  Median :11.00   10     :  24   Median : 5.000   NA's:496   Median :12.0  
##  Mean   :10.65   15     :  24   Mean   : 5.313              Mean   :11.8  
##  3rd Qu.:16.00   4      :  23   3rd Qu.: 6.000              3rd Qu.:16.0  
##  Max.   :26.00   21     :  23   Max.   :10.000              Max.   :25.0  
##                  (Other):1855   NA's   :516                 NA's   :976   
##     popular         popteach     
##  Min.   :0.000   Min.   : 1.000  
##  1st Qu.:3.900   1st Qu.: 4.000  
##  Median :4.800   Median : 5.000  
##  Mean   :4.829   Mean   : 4.834  
##  3rd Qu.:5.800   3rd Qu.: 6.000  
##  Max.   :9.100   Max.   :10.000  
##  NA's   :510     NA's   :528

The data set has 2000 rows and 7 columns (variables). The variables extrav, sex, texp, popular and popteach contain missings. About a quarter of these variables is missing, except for texp where 50 % is missing.


3. As we have seen before, the function md.pattern() is used to display all different missing data patterns. How many different missing data patterns are present in the popNCR dataframe and which patterns occur most frequently in the data? Also find out how many patterns we would observe when variable texp (teacher experience) is not considered.

md.pattern(popNCR)
##     pupil class sex popular extrav popteach texp     
## 308     1     1   1       1      1        1    1    0
## 114     1     1   1       1      0        1    1    1
## 102     1     1   0       1      1        1    1    1
## 279     1     1   1       1      1        1    0    1
## 119     1     1   1       0      1        1    1    1
## 110     1     1   1       1      1        0    1    1
##  85     1     1   0       1      0        1    1    2
##  98     1     1   1       1      0        1    0    2
##  89     1     1   0       1      1        1    0    2
##  29     1     1   1       0      0        1    1    2
##  19     1     1   0       0      1        1    1    2
## 113     1     1   1       0      1        1    0    2
##  33     1     1   1       1      0        0    1    2
##  25     1     1   0       1      1        0    1    2
## 115     1     1   1       1      1        0    0    2
##  50     1     1   1       0      1        0    1    2
##  56     1     1   0       1      0        1    0    3
##   4     1     1   0       0      0        1    1    3
##  21     1     1   1       0      0        1    0    3
##  27     1     1   0       0      1        1    0    3
##   9     1     1   0       1      0        0    1    3
##  24     1     1   1       1      0        0    0    3
##  29     1     1   0       1      1        0    0    3
##   2     1     1   1       0      0        0    1    3
##  13     1     1   0       0      1        0    1    3
##  75     1     1   1       0      1        0    0    3
##   9     1     1   0       0      0        1    0    4
##  14     1     1   0       1      0        0    0    4
##   2     1     1   0       0      0        0    1    4
##  14     1     1   1       0      0        0    0    4
##  11     1     1   0       0      1        0    0    4
##   2     1     1   0       0      0        0    0    5
##         0     0 496     510    516      528  976 3026

There are 32 unique patterns. The pattern where everything is observed and the pattern where only texp is missing occur most frequently.

If we omit texp, then the following pattern matrix is realized:

md.pattern(popNCR[ , -5])
##     pupil class sex popular extrav popteach     
## 587     1     1   1       1      1        1    0
## 212     1     1   1       1      0        1    1
## 191     1     1   0       1      1        1    1
## 232     1     1   1       0      1        1    1
## 225     1     1   1       1      1        0    1
## 141     1     1   0       1      0        1    2
##  50     1     1   1       0      0        1    2
##  46     1     1   0       0      1        1    2
##  57     1     1   1       1      0        0    2
##  54     1     1   0       1      1        0    2
## 125     1     1   1       0      1        0    2
##  13     1     1   0       0      0        1    3
##  23     1     1   0       1      0        0    3
##  16     1     1   1       0      0        0    3
##  24     1     1   0       0      1        0    3
##   4     1     1   0       0      0        0    4
##         0     0 496     510    516      528 2050

Without texp, there are only 16 patterns.


4. Let’s focus more precisely on the missing data patterns. Does the missing data of popular depend on popteach? One could for example check this by making a histogram of popteach separately for the pupils with known popularity and missing popularity.

In R the missingness indicator

is.na(popNCR$popular)
##    [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE
##   [12] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
##   [23] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##   [34] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
##   [45] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##   [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
##   [67] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
##   [78] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
##   [89] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [100] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [111] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE
##  [122]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [133]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [155] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [166] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [177] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
##  [188] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE
##  [199] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
##  [210] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
##  [221] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [232] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [243] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
##  [254] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
##  [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [276]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [287] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE
##  [298]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [309]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
##  [320] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
##  [331]  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
##  [342] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
##  [353] FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
##  [364]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
##  [375]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
##  [386] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
##  [397] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [408] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [419] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE
##  [430] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
##  [441] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [452]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE
##  [463]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [474] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
##  [485] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [496] FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [507] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
##  [518] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE
##  [529] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [540] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [551] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
##  [562] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [573]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE
##  [584] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
##  [595] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
##  [606] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
##  [617] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [628]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [639]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE
##  [650] FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
##  [661]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
##  [672] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [683]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
##  [694] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
##  [705]  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
##  [716] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
##  [727]  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [738]  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
##  [749] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [760] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
##  [771] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
##  [782] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE
##  [793]  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
##  [804] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
##  [815] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
##  [826] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [837] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
##  [848]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE
##  [859]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [870] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
##  [881] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
##  [892] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
##  [903] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE
##  [914] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [925] FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE
##  [936]  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [947] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
##  [958] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [969] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
##  [980]  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE
##  [991] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE
## [1002]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1013] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1024] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
## [1035] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
## [1046]  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE
## [1057] FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [1068] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1079]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1090] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
## [1101] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [1112] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1123] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [1134] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [1145] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
## [1156] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1167] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE
## [1178] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1189] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1200] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [1211]  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
## [1222] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1233]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1244] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
## [1255] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE
## [1266]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE
## [1277]  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE
## [1288] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
## [1299] FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE
## [1310]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE
## [1321] FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
## [1332]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
## [1343]  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1354]  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE
## [1365] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [1376]  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
## [1387] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE
## [1398]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE
## [1409] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
## [1420] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE
## [1431] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE
## [1442] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1453]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [1464] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
## [1475]  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE
## [1486] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE
## [1497] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE
## [1508] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
## [1519] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1530] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
## [1541] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
## [1552] FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE
## [1563] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
## [1574]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [1585] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
## [1596] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## [1607] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1618] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1629] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1640] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE
## [1651]  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE
## [1662]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE
## [1673] FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE
## [1684]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [1695] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
## [1706]  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
## [1717] FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE
## [1728] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1739] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [1750] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [1761] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [1772]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
## [1783] FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE
## [1794]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE
## [1805] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [1816] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [1827] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1838] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
## [1849] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [1860] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1871] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1882] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE
## [1893]  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1904]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [1915]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [1926] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
## [1937]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## [1948]  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE
## [1959] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE
## [1970] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## [1981] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1992]  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE

is a dummy variable of the same length as popular with value 0 (FALSE) for observed pupil popularity and 1 (TRUE) for missing pupil popularity. A histogram can be made with the function histogram(). The code for a conditional histogram of popteach given the missingness indicator for popular is

histogram(~ popteach | is.na(popular), data=popNCR)

We do see that the histogram for the missing popular (TRUE) is further to the right than the histogram for observed popular (FALSE). This would indicate a right-tailed MAR missingness. In fact this is exactly what happens, because we created the missingness in these data ourselves. But we can make it observable by examining the relations between the missingness in popular and the observed data in popteach.


5. Does the missingness of the other incomplete variables depend on popteach? If yes, what is the direction of the relation?

histogram(~ popteach | is.na(sex), data = popNCR)  

There seems to be a left-tailed relation between popteach and the missingness in sex.

histogram(~ popteach | is.na(extrav), data = popNCR)

There also seems to be a left-tailed relation between popteach and the missingness in extrav.

histogram(~ popteach | is.na(texp), data = popNCR)

There seems to be no observable relation between popteach and the missingness in texp. It might be MCAR or even MNAR.


6. Find out if the missingness in teacher popularity depends on pupil popularity.

histogram(~ popular | is.na(popteach), data = popNCR)

Yes: there is a dependency. The relation seems to be right-tailed.


7. Have a look at the intraclass correlation (ICC) for the incomplete variables popular, popteach and texp.

icc(aov(popular ~ as.factor(class), data = popNCR))
## [1] 0.328007
icc(aov(popteach ~ class, data = popNCR))
## [1] 0.3138658
icc(aov(texp ~ class, data = popNCR))
## [1] 1

Please note that the function icc() comes from the package multilevel (function ICC1()), but is included in the workspace popular.RData. Write down the ICC’s, you’ll need them later.


7b. Do you think it is necessary to take the multilevel structure into account?

YES! There is a strong cluster structure going on. If we ignore the clustering in our imputation model, we may run into invalid inference. To stay as close to the true data model, we must take the cluster structure into account during imputation.


8. Impute the popNCR dataset with mice using imputation method norm for popular, popteach, texp and extrav. Exclude class as a predictor for all variables. Call the mids-object imp1.

ini <- mice(popNCR, maxit = 0)
meth <- ini$meth
meth
##    pupil    class   extrav      sex     texp  popular popteach 
##       ""       ""    "pmm" "logreg"    "pmm"    "pmm"    "pmm"
meth[c(3, 5, 6, 7)] <- "norm"
meth
##    pupil    class   extrav      sex     texp  popular popteach 
##       ""       ""   "norm" "logreg"   "norm"   "norm"   "norm"
pred <- ini$pred
pred
##          pupil class extrav sex texp popular popteach
## pupil        0     0      0   0    0       0        0
## class        0     0      0   0    0       0        0
## extrav       1     1      0   1    1       1        1
## sex          1     1      1   0    1       1        1
## texp         1     1      1   1    0       1        1
## popular      1     1      1   1    1       0        1
## popteach     1     1      1   1    1       1        0
pred[, "class"] <- 0
pred[, "pupil"] <- 0
pred
##          pupil class extrav sex texp popular popteach
## pupil        0     0      0   0    0       0        0
## class        0     0      0   0    0       0        0
## extrav       0     0      0   1    1       1        1
## sex          0     0      1   0    1       1        1
## texp         0     0      1   1    0       1        1
## popular      0     0      1   1    1       0        1
## popteach     0     0      1   1    1       1        0
imp1 <- mice(popNCR, meth = meth, pred = pred, print = FALSE)

9. Compare the means of the variables in the first imputed dataset and in the incomplete dataset.

summary(complete(imp1))
##      pupil           class          extrav       sex           texp       
##  Min.   : 1.00   17     :  26   Min.   : 1.000   0: 984   Min.   :-6.897  
##  1st Qu.: 6.00   63     :  25   1st Qu.: 4.159   1:1016   1st Qu.: 8.000  
##  Median :11.00   10     :  24   Median : 5.000            Median :12.434  
##  Mean   :10.65   15     :  24   Mean   : 5.270            Mean   :12.595  
##  3rd Qu.:16.00   4      :  23   3rd Qu.: 6.000            3rd Qu.:16.956  
##  Max.   :26.00   21     :  23   Max.   :10.000            Max.   :36.675  
##                  (Other):1855                                             
##     popular          popteach     
##  Min.   : 0.000   Min.   : 1.000  
##  1st Qu.: 4.100   1st Qu.: 4.000  
##  Median : 5.000   Median : 5.000  
##  Mean   : 5.006   Mean   : 5.022  
##  3rd Qu.: 5.969   3rd Qu.: 6.000  
##  Max.   :10.577   Max.   :10.000  
## 
summary(popNCR)
##      pupil           class          extrav         sex           texp     
##  Min.   : 1.00   17     :  26   Min.   : 1.000   0   :661   Min.   : 2.0  
##  1st Qu.: 6.00   63     :  25   1st Qu.: 4.000   1   :843   1st Qu.: 7.0  
##  Median :11.00   10     :  24   Median : 5.000   NA's:496   Median :12.0  
##  Mean   :10.65   15     :  24   Mean   : 5.313              Mean   :11.8  
##  3rd Qu.:16.00   4      :  23   3rd Qu.: 6.000              3rd Qu.:16.0  
##  Max.   :26.00   21     :  23   Max.   :10.000              Max.   :25.0  
##                  (Other):1855   NA's   :516                 NA's   :976   
##     popular         popteach     
##  Min.   :0.000   Min.   : 1.000  
##  1st Qu.:3.900   1st Qu.: 4.000  
##  Median :4.800   Median : 5.000  
##  Mean   :4.829   Mean   : 4.834  
##  3rd Qu.:5.800   3rd Qu.: 6.000  
##  Max.   :9.100   Max.   :10.000  
##  NA's   :510     NA's   :528

9b. The missingness in texp is MNAR: higher values for texp have a larger probability to be missing. Can you see this in the imputed data? Do you think this is a problem?

Yes, we can see this in the imputed data: teacher experience increases slightly after imputation. However, texp is the same for all pupils in a class. But not all pupils have this information recorded (as if some pupils did not remember, or were not present during data collection). This is not a problem, because as long as at least one pupil in each class has teacher experience recorded, we can deductively impute the correct (i.e. true) value for every pupil in the class.


10. Compare the ICC’s of the variables in the first imputed dataset with those in the incomplete dataset (use popular, popteach and texp). Make a notation of the ICC’s after imputation.

data.frame(vars = names(popNCR[c(6, 7, 5)]), 
           observed = c(icc(aov(popular ~ class, popNCR)), 
                        icc(aov(popteach ~ class, popNCR)), 
                        icc(aov(texp ~ class, popNCR))), 
           norm     = c(icc(aov(popular ~ class, complete(imp1))), 
                        icc(aov(popteach ~ class, complete(imp1))), 
                        icc(aov(texp ~ class, complete(imp1)))))
##       vars  observed      norm
## 1  popular 0.3280070 0.2798184
## 2 popteach 0.3138658 0.2645641
## 3     texp 1.0000000 0.4639840

11. Now impute the popNCR dataset again with mice using imputation method norm for popular, popteach, texp and extrav, but now include class as a predictor for all variables. Call the mids-object imp2.

pred <- ini$pred
pred[, "pupil"] <- 0
imp2 <- mice(popNCR, meth = meth, pred = pred, print = FALSE)

12. Compare the ICC’s of the variables in the first imputed dataset from imp2 with those of imp1 and the incomplete dataset (use popular, popteach and texp). Make a notation of the ICC’s after imputation.

data.frame(vars = names(popNCR[c(6, 7, 5)]), 
           observed  = c(icc(aov(popular ~ class, popNCR)), 
                         icc(aov(popteach ~ class, popNCR)), 
                         icc(aov(texp ~ class, popNCR))), 
           norm      = c(icc(aov(popular ~ class, complete(imp1))), 
                         icc(aov(popteach ~ class, complete(imp1))), 
                         icc(aov(texp ~ class, complete(imp1)))), 
           normclass = c(icc(aov(popular ~ class, complete(imp2))), 
                         icc(aov(popteach ~ class, complete(imp2))), 
                         icc(aov(texp ~ class, complete(imp2)))))
##       vars  observed      norm normclass
## 1  popular 0.3280070 0.2798184 0.3585751
## 2 popteach 0.3138658 0.2645641 0.3362400
## 3     texp 1.0000000 0.4639840 0.9999996

By simply forcing the algorithm to use the class variable during estimation we adopt a fixed effects approach. This conforms to formulating seperate regression models for each class and imputing within classes from these models.


Checking Convergence of the imputations

13. Inspect the trace lines for the variables popular, texp and extrav.

plot(imp2, c("popular", "texp", "popteach"))