JSM 2016

Disclaimer

This presentation is about work in progress and is made solely by myself.

Images are bluntly borrowed from other sources; mostly without specific consent but with the conjugate prior that the authors would not care at all.

Some of the information and ideas come from joint contemplation and/or work with Stef van Buuren, Andrew Gelman, Jeroen Pannekoek, Shahab Jolani and Laura Boeschoten.

The following institutions allow me to cultivate my creativity

Popularity of multiple imputation

The number of publications that have ‘multiple imputation’ as a topic (Web of Science™ Citation Reports, retrieved on August 1, 2016)

How good is new methodology

The quality of a solution obtained by multiple imputation depends on

  1. the statistical properties of the incomplete data
  2. the inference problem with respect to the assumptions
  3. the degree to which an imputation procedure is able to capture these properties when modeling missing values on the given data.

How is new methodology evaluated

When evaluating the statistical properties (and thereby the practical applicability) of new imputation methodology, researchers most often make use of simulation studies.

  1. a complete data set is usually generated from a statistical model
  2. another model is used to induce missingness
  3. a set of evaluation criteria is postulated to evaluate the performance of the imputation method.
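The three steps above can be sketched as a minimal pipeline; everything in this snippet (the bivariate-normal model, the MCAR mechanism, mean imputation as the method under study) is an illustrative placeholder, not a recommendation:

```python
import numpy as np

rng = np.random.default_rng(2016)

# 1. Generate a complete data set from a statistical model
#    (bivariate normal with correlation 0.5 -- an illustrative choice)
n = 1000
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)

# 2. Use another model to induce missingness
#    (MCAR: each value in the second column goes missing with probability 0.25)
incomplete = data.copy()
incomplete[rng.random(n) < 0.25, 1] = np.nan

# 3. Evaluate the imputation method against postulated criteria
#    (single mean imputation -- a deliberately naive stand-in -- judged by
#    bias in the mean of the imputed variable)
imputed = incomplete.copy()
miss = np.isnan(imputed[:, 1])
imputed[miss, 1] = np.nanmean(incomplete[:, 1])

bias = imputed[:, 1].mean() - data[:, 1].mean()
print(f"bias in the mean after imputation: {bias:.4f}")
```

Each of the three steps is a modeling decision in its own right, which is exactly where evaluation practice diverges between developers.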

However, no gold standard has been established

  • as a result, validity of the simulation and evaluation procedures may differ tremendously from one developer to another.

Potential problems

This lack of consensus brings forth a chain of potential problems in the objective assessment of the performance of imputation routines that may lead to suboptimal use of multiple imputation in practice.

  1. problems with data generation
  2. problems with missingness generation
  3. problems with performance evaluation.

Data generation

  1. Data are often generated following the model (or distribution) that is also used for imputing the data.
    • The evaluated conditions are thereby stacked in favor of the method under study.
    • Although such procedures may be statistically relevant, the resulting evaluation need not hold for even the simplest departure from the assumed model.
  2. Data are often generated such that the problem that is being studied is most pronounced.
    • This often results in simulated data that contain unusually strong information structures
    • i.e. the correlations between groups of variables may be unrealistically pronounced.
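A minimal sketch of pitfall 1, assuming a hypothetical linear data-generating model and regression imputation that coincide by construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Generate data from a simple linear model...
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# ...make about 30% of y missing completely at random...
miss = rng.random(n) < 0.3
y_obs = np.where(miss, np.nan, y)

# ...and impute y with a regression on x. The imputation model coincides
# with the data-generating model, so the evaluation is stacked in the
# method's favor: the estimated slope is almost guaranteed to look good.
beta = np.polyfit(x[~miss], y[~miss], 1)
y_imp = np.where(miss, np.polyval(beta, x), y_obs)

slope = np.polyfit(x, y_imp, 1)[0]
print(f"true slope 3.0, slope after imputation: {slope:.3f}")
```

The favorable result says little about how the method behaves when the imputation model is misspecified, which is the situation applied researchers actually face.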

Missingness generation

  1. The missing data generation procedure is insufficiently described
  2. The induced MAR mechanism is spurious
  3. MCAR missingness is often not considered
  4. Different percentages of missingness are used across studies
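One way to make the MAR mechanism explicit and reproducible is to state it as code; in this sketch the exp-weighting and the 0.6 slope are arbitrary illustrative choices, not a proposed standard:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = rng.normal(size=n)               # fully observed covariate
y = 0.6 * x + rng.normal(size=n)     # variable that receives missingness

def induce_mar(y, x, prop, rng):
    """Make round(prop * n) values of y missing, with the probability of
    being selected increasing in the observed x (a MAR mechanism)."""
    w = np.exp(x)                    # higher x -> higher missingness chance
    w /= w.sum()
    idx = rng.choice(len(y), size=round(prop * len(y)), replace=False, p=w)
    out = y.copy()
    out[idx] = np.nan
    return out

for prop in (0.10, 0.25, 0.50):      # different percentages of missingness
    y_mis = induce_mar(y, x, prop, rng)
    gone = np.isnan(y_mis)
    print(f"{prop:.0%} missing; mean x among missing cases: {x[gone].mean():+.2f}")
```

Writing the mechanism down like this prevents both problems 1 and 2: the procedure is fully described, and the dependence on observed data (rather than a spurious one) can be verified directly.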

Performance evaluation

  1. The focus often lies on accuracy (how well the method reproduces the original data)
  2. The performance of imputation procedures on distributional properties is often ignored in simulation studies
    • even though the estimates on the analysis level may be justified, some methods can yield imputations that may seem completely invalid to applied researchers.
  3. Taking sampling variation into account is not always necessary (e.g. Vink and Van Buuren, 2014).
  4. Qualitative evaluation of the performance is often ignored.
    • A method may perform badly, but if it still outperforms every other approach, it may yet be of great practical relevance.

Some more thoughts

Imputation techniques are developed to solve distinct problems.

  • They are evaluated on their performance on these problems, but are potentially of great scientific use outside of their target application.
  • Such innovative applications often remain unexplored.

Also, the target audience for multiple imputation consists of applied researchers from all scientific domains.

  • These researchers often lack the statistical knowledge to understand the methodology behind these imputation techniques.
  • How can these researchers decide what imputation technique would be suitable for their problem?

Towards standardization

Benchmarking and assessment

Integrate benchmarking into multiple imputation methodology

  • a standardized set of benchmark routines for assessing the quality of multiple imputation methods.

If we evaluate imputation techniques on a standardized set we solve two distinct problems:

  1. A standardized set allows for a fair and thorough evaluation of the statistical properties of every imputation technique.
  2. When each routine is put through a standardized evaluation procedure, we can make fair and justified comparisons between imputation techniques.

This helps two camps

Applied researchers

When it is possible to compare benchmarked techniques to one another, we can determine what would be the optimal imputation method for specific data problems.

  • applied researchers can get the optimal imputation technology for their data, without the advanced statistical knowledge that is normally required when implementing multiple imputation techniques.

Developers

Benchmarking gives an indication of the state of the art


Things left to do

  • Further explore the common ground between experts on what to call standard in evaluating multiple imputation procedures
  • Make BAMI online accessible
  • Let online BAMI produce code to run offline
  • Collect more data sets
  • Implement into Stan

If you have thoughts on this: please e-mail me

Proposed standardized route

  1. Obtain truth
    • model-based or design-based
  2. Create valid missingness
    • e.g. the method described by Brand (1999)
    • 10, 25 and 50 percent missingness
  3. Simulate
    • leave out sampling variation
  4. Evaluate imputations
    • bias, coverage, confidence interval width (CIW)
    • distributional characteristics vs observed plausibility
    • algorithmic convergence
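Step 4 amounts to bookkeeping over simulation repetitions. The sketch below computes bias, coverage, and CIW for a complete-data estimator of a mean, standing in for a pooled multiple-imputation estimate; the normal model and the simulation sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
true_mean, n, nsim = 0.0, 200, 500

biases, covered, widths = [], [], []
for _ in range(nsim):
    # stand-in for a pooled multiple-imputation estimate of the mean;
    # here a complete-data estimate, purely to show the bookkeeping
    sample = rng.normal(true_mean, 1.0, size=n)
    est = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = est - 1.96 * se, est + 1.96 * se

    biases.append(est - true_mean)
    covered.append(lo <= true_mean <= hi)
    widths.append(hi - lo)

print(f"bias:     {np.mean(biases):+.4f}")
print(f"coverage: {np.mean(covered):.3f}")   # nominal level is 0.95
print(f"ciw:      {np.mean(widths):.4f}")
```

For an actual imputation method, `est` and `se` would come from Rubin's pooling rules; the evaluation loop itself stays the same, which is what makes these criteria suitable for a standardized benchmark.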

References

  • Brand, J. (1999). Development, implementation and evaluation of multiple imputation strategies for the statistical analysis of incomplete data sets. PhD thesis, Erasmus University Rotterdam.

  • Vink, G. and van Buuren, S. (2014). Pooling multiple imputations when the sample happens to be the population. arXiv preprint arXiv:1409.8542.

Cooking vs. multiple imputation

  • study your data set
  • find methodology
  • ignore standard procedures
  • use experience and skills
  • eliminate dumb luck
  • adjust the inference when needed
  • use plausible imputations