Missing data

The Centre for Multilevel Modelling has a long-standing interest in developing methods and software to aid researchers in handling missing data.

As discussed below, we have developed functionality in two chief areas of our software: REALCOM-IMPUTE and Stat-JR. For a general introduction to the potential implications of incomplete datasets, and to approaches for analysing them, see http://missingdata.lshtm.ac.uk/ and Module 14 of the free LEMMA multilevel modelling online course.

Imputation for Multilevel Models with Missing Data Using REALCOM-IMPUTE

As outlined on the dedicated REALCOM-IMPUTE webpage, REALCOM-IMPUTE was developed as part of the ESRC grant Developing Multilevel Models for Realistically Complex Social Science Data. First released in 2009, it allows Normal, unordered categorical and ordered categorical variables to be added as responses (i.e. as variables with missing data) in an imputation model that properly handles a 2-level structure (i.e. the user can request that missing values be imputed for variables that vary only at level 2).

Written in MATLAB, REALCOM-IMPUTE generates imputed datasets which the user can then post-process via MLwiN (see the manual REALCOM-IMPUTE: multiple imputation using MLwiN for further details) or via realcomImpute in Stata.
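For readers new to multiple imputation, the post-processing step amounts to fitting the model of interest to each imputed dataset separately and then combining the results, conventionally with Rubin's rules. The Python sketch below illustrates only that pooling step for a single coefficient. It is not code from REALCOM-IMPUTE, MLwiN or realcomImpute, which carry out this step for you, and the function name pool_rubin and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least 2 imputations")
    pooled = mean(estimates)          # average of the per-dataset estimates
    within = mean(variances)          # within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical estimates of one coefficient from three imputed datasets.
estimate, std_error = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, round(std_error, 3))
```

The between-imputation term is what distinguishes this from simply averaging: it inflates the standard error to reflect uncertainty about the missing values.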

Imputation for Multilevel Models with Missing Data Using Stat-JR

A number of Stat-JR templates have recently been developed to aid researchers in handling missing data in datasets with multilevel structures. These typically run much more quickly than the equivalent analyses in REALCOM-IMPUTE, and allow for greater flexibility too. Please note, though, that these templates have not been as widely tested as REALCOM-IMPUTE.

As the summary table below indicates, there are three principal Stat-JR templates which support handling missing data in multilevel generalised linear models. The first two, a 2-level version (2LevelImpute) and an N-level version (NLevelImpute), use ‘multiple imputation’, a widely used procedure that will handle a large number of models. The third template, 2LevelMissingOnePass, implements a second (‘one pass’) approach, which is a more recent generalisation with a more robust theoretical justification (see Goldstein et al., 2014 for further details).

Stat-JR template | 2LevelImpute | NLevelImpute | 2LevelMissingOnePass
Methodology | Multiple imputation (joint modelling approach) | Multiple imputation (joint modelling approach) | Fully Bayesian 'one pass' procedure
Multilevel structure | Up to 2 levels | Up to N levels (nested and/or cross-classified) | Up to 2 levels
Model of interest (MOI) response types | Normal, binary, Poisson, multivariate Normal* | Normal, binary, Poisson, multivariate mixed (Normal, binary, ordered, unordered; at any level / classification)* | Normal, binomial, Poisson, negative binomial†
Imputation model response types | Normal, binary, ordered, unordered | Normal, binary, ordered, unordered | Normal, binary
Handles polynomial / interaction terms in MOI? | No | No | Yes
Allows for random slopes / coefficients in MOI? | Yes, but for univariate response models only | Yes, but for univariate response models only | Yes
* Poisson or binomial (cf. binary) responses can only be used if these variables have no missing data; see the ‘Imputation model response types’ row in the table for the variable types for which missing data is allowed.
† In the case of the 2LevelMissingOnePass template, if there are any missing values in the MOI response variable then that case will be automatically dropped.
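The table can also be read as a simple decision rule. The Python sketch below encodes three of its rows (multilevel structure, imputation model response types, and handling of polynomial / interaction terms) and lists the templates compatible with a given analysis. It is only an illustrative aid and not part of Stat-JR: the names TemplateInfo and candidate_templates are hypothetical, and the sketch deliberately ignores the MOI response type and random-slope rows, which should still be checked against the table.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class TemplateInfo:
    name: str
    max_levels: Optional[int]          # None means no fixed limit (N levels)
    imputable_types: FrozenSet[str]    # variable types that may contain missing data
    handles_interactions: bool         # polynomial / interaction terms in the MOI


ALL_FOUR = frozenset({"normal", "binary", "ordered", "unordered"})

TEMPLATES = [
    TemplateInfo("2LevelImpute", 2, ALL_FOUR, False),
    TemplateInfo("NLevelImpute", None, ALL_FOUR, False),
    TemplateInfo("2LevelMissingOnePass", 2, frozenset({"normal", "binary"}), True),
]


def candidate_templates(levels: int, missing_types: Set[str],
                        needs_interactions: bool) -> List[str]:
    """Return names of templates whose table entries fit the stated needs."""
    matches = []
    for template in TEMPLATES:
        if template.max_levels is not None and levels > template.max_levels:
            continue  # data have more levels than the template supports
        if not missing_types <= template.imputable_types:
            continue  # some variable type with missing data cannot be imputed
        if needs_interactions and not template.handles_interactions:
            continue  # MOI needs polynomial / interaction terms
        matches.append(template.name)
    return matches


# A 3-level dataset with missing values in an ordered categorical predictor.
print(candidate_templates(3, {"ordered"}, False))  # ['NLevelImpute']
```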

Our template 2LevelMissingOnePass is a fully Bayesian procedure that requires the specification of the model of interest and the model for imputing missing values, and produces a standard MCMC chain that can be used for inferences. It allows for Normal, binomial, Poisson and negative binomial responses, and for missing values in Normal and binary predictors at levels 1 or 2. It allows for categorical predictors, but these must have no missing values. Future releases will overcome this limitation. It has two main advantages. The first is that it is a fully Bayesian procedure with a sound theoretical foundation. The second is that it properly allows for the fitting of interaction and polynomial terms in the model of interest.

The N-level multiple imputation template (NLevelImpute) will handle Normal, binary and Poisson as univariate response types in the model of interest (MOI), and also multivariate mixed responses (Normal, binary, ordered, unordered) at any level in the MOI. The predictors can be Normal, binary, ordered or unordered. Our 2-level multiple imputation template (2LevelImpute) has been available since 2014 and therefore has been more widely tested, but unlike NLevelImpute it does not handle mixed responses at level 2 in the MOI.

Note that to use these templates, you will need to first order and install Stat-JR, and then download the zipped file below:

Imputation for Multilevel Models with Missing Data Using Stat-JR (zip, 500 kb)

In addition, a pdf is available providing a brief introduction to the templates.

The zip file contains the three principal Stat-JR templates described in the table above, and a number of sub-templates which some of these call (the pdf describes these dependencies, some of which are core templates released as part of the standard Stat-JR install).

Once you have downloaded the zip file, extract it into your Stat-JR directory so that the files are saved as follows:


StatJR\ebooks\2LevelImpute eBook.zip
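If you prefer to script the extraction step above rather than use a file manager, a minimal Python sketch follows. It assumes the archive stores its files with paths relative to the Stat-JR directory, which you should verify by inspecting the zip first. The function name extract_into_statjr and the paths in the usage comment are hypothetical placeholders.

```python
import zipfile
from pathlib import Path


def extract_into_statjr(zip_path, statjr_dir):
    """Extract every member of the archive into the Stat-JR directory.

    Returns the sorted list of member names so the result can be checked
    against the expected layout.
    """
    statjr_dir = Path(statjr_dir)
    if not statjr_dir.is_dir():
        raise FileNotFoundError(f"Stat-JR directory not found: {statjr_dir}")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(statjr_dir)
        return sorted(archive.namelist())


# Usage (placeholder paths; substitute your own download and install locations):
# for name in extract_into_statjr("Downloads/templates.zip", "C:/StatJR"):
#     print(name)
```

Printing the returned names is a quick way to confirm that the files landed where the layout above says they should.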


Once you have saved these files, you can open TREE (see the relevant Stat-JR manuals - e.g. the TREE Quick-start Guide or the Beginner's Guide to Stat-JR's TREE interface - for further guidance), and choose your principal template of interest from the list of templates.

If you encounter any bugs, please let us know via our Bug Report Form. Note also that a Stat-JR forum is available.