Missing data exploration

Functions to count and explore the structure of the missing data.

md.pattern()
Missing data pattern
md.pairs()
Missing data pattern by variable pairs
cc()
Select complete cases
cci()
Complete case indicator
ic()
Select incomplete cases
ici()
Incomplete case indicator
mcar()
Jamshidian and Jalal's Non-Parametric MCAR Test
ncc()
Number of complete cases
nic()
Number of incomplete cases
nimp()
Number of imputations per block
fico()
Fraction of incomplete cases among cases with the variable observed
flux()
Influx and outflux of multivariate missing data patterns
fluxplot()
Fluxplot of the missing data pattern
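The counting functions above can be illustrated with a short Python sketch. This is not the mice API. The Python functions `cci`, `ncc`, `nic` and `md_pattern` are hypothetical stand-ins that mimic what the R functions with similar names report. They run on a toy list of records in which `None` marks a missing value. Patterns are coded 1 for observed and 0 for missing, as in `md.pattern()`.

```python
from collections import Counter

# Toy data (hypothetical): None marks a missing value.
rows = [
    {"age": 1, "bmi": None, "chl": None},
    {"age": 2, "bmi": 22.7, "chl": 187},
    {"age": 1, "bmi": None, "chl": 187},
    {"age": 3, "bmi": None, "chl": None},
    {"age": 1, "bmi": 20.4, "chl": 113},
]

def cci(data):
    """Complete case indicator: True where a row has no missing values."""
    return [all(v is not None for v in row.values()) for row in data]

def ncc(data):
    """Number of complete cases."""
    return sum(cci(data))

def nic(data):
    """Number of incomplete cases."""
    return len(data) - ncc(data)

def md_pattern(data):
    """Count response patterns, coding 1 = observed and 0 = missing."""
    cols = list(data[0].keys())
    patterns = Counter(
        tuple(0 if row[c] is None else 1 for c in cols) for row in data
    )
    return cols, patterns
```

On the toy data, `ncc(rows)` gives 2 and `nic(rows)` gives 3. `md.pattern()` additionally reports the number of missing values per variable and per pattern.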

Main imputation functions

The multiple imputation workflow has three steps: multiply impute the data, fit the complete-data model to each imputed data set, and pool the results to obtain the final inference. The main functions for imputing the data are:

mice()
mice: Multivariate Imputation by Chained Equations
mice.mids()
Multivariate Imputation by Chained Equations (Iteration Step)
parlmice()
Wrapper function that runs MICE in parallel
futuremice()
Wrapper function that runs MICE in parallel
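The chained-equations iteration that `mice()` and `mice.mids()` run can be pictured, under strong simplifying assumptions, with the Python sketch below. It uses two numeric variables. Missing entries are first filled with random draws from the observed values, and then each variable is re-imputed in turn from a linear regression on the other, plus normal noise. The regression step resembles `norm.nob` (no parameter uncertainty) rather than the package's default method. The name `chained_equations` is hypothetical and not part of mice.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd

def chained_equations(x, y, iterations=5, seed=1):
    """Impute None entries in two numeric columns by alternating regressions."""
    rng = random.Random(seed)
    cols = {"x": list(x), "y": list(y)}
    miss = {k: [v is None for v in col] for k, col in cols.items()}
    # Initialise each missing entry with a random draw from the observed values.
    for k, col in cols.items():
        observed = [v for v in col if v is not None]
        cols[k] = [rng.choice(observed) if v is None else v for v in col]
    for _ in range(iterations):
        for target, predictor in (("x", "y"), ("y", "x")):
            # Fit on cases where the target is observed, using current values
            # of the predictor (which may themselves be imputed).
            obs = [i for i, m in enumerate(miss[target]) if not m]
            a, b, sd = fit_line([cols[predictor][i] for i in obs],
                                [cols[target][i] for i in obs])
            for i, m in enumerate(miss[target]):
                if m:
                    cols[target][i] = a + b * cols[predictor][i] + rng.gauss(0, sd)
    return cols["x"], cols["y"]
```

Running the function m times with different seeds gives m completed data sets, which are what the analysis and pooling steps then operate on.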

Elementary imputation functions

The elementary imputation function is the workhorse that creates the actual imputations. Elementary functions are called through the method argument of the mice() function. Each function imputes one or more columns in the data. Other packages also provide mice.impute.xxx functions.

mice.impute.2l.bin()
Imputation by a two-level logistic model using glmer
mice.impute.2l.lmer()
Imputation by a two-level normal model using lmer
mice.impute.2l.norm()
Imputation by a two-level normal model
mice.impute.2l.pan()
Imputation by a two-level normal model using pan
mice.impute.2lonly.mean()
Imputation of most likely value within the class
mice.impute.2lonly.norm()
Imputation at level 2 by Bayesian linear regression
mice.impute.2lonly.pmm()
Imputation at level 2 by predictive mean matching
mice.impute.cart()
Imputation by classification and regression trees
mice.impute.jomoImpute()
Multivariate multilevel imputation using jomo
mice.impute.lasso.logreg()
Imputation by direct use of lasso logistic regression
mice.impute.lasso.norm()
Imputation by direct use of lasso linear regression
mice.impute.lasso.select.logreg()
Imputation by indirect use of lasso logistic regression
mice.impute.lasso.select.norm()
Imputation by indirect use of lasso linear regression
mice.impute.lda()
Imputation by linear discriminant analysis
mice.impute.logreg()
Imputation by logistic regression
mice.impute.logreg.boot()
Imputation by logistic regression using the bootstrap
mice.impute.mean()
Imputation by the mean
mice.impute.midastouch()
Imputation by predictive mean matching with distance aided donor selection
mice.impute.mnar.logreg() mice.impute.mnar.norm()
Imputation under an MNAR mechanism by NARFCS
mice.impute.mpmm()
Imputation by multivariate predictive mean matching
mice.impute.norm()
Imputation by Bayesian linear regression
mice.impute.norm.boot()
Imputation by linear regression, bootstrap method
mice.impute.norm.nob()
Imputation by linear regression without parameter uncertainty
mice.impute.norm.predict()
Imputation by linear regression through prediction
mice.impute.panImpute()
Impute multilevel missing data using pan
mice.impute.passive()
Passive imputation
mice.impute.pmm()
Imputation by predictive mean matching
mice.impute.polr()
Imputation of ordered data by polytomous regression
mice.impute.polyreg()
Imputation of unordered data by polytomous regression
mice.impute.quadratic()
Imputation of quadratic terms
mice.impute.rf()
Imputation by random forests
mice.impute.ri()
Imputation by the random indicator method for nonignorable data
mice.impute.sample()
Imputation by simple random sampling
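Predictive mean matching, the idea behind `mice.impute.pmm()`, can be sketched in Python for one incomplete variable `y` and one complete predictor `x`. The sketch fits a regression on the observed cases. For each missing case, it then copies the observed value of one of the `k` cases whose predicted means are closest. This is a simplified illustration with a hypothetical function name. It uses the least squares fit directly, whereas the mice method also draws the regression parameters to reflect their uncertainty.

```python
import random
import statistics

def pmm_impute(x, y, k=5, seed=1):
    """Predictive mean matching for y (None = missing) given a complete predictor x."""
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    mis = [i for i, v in enumerate(y) if v is None]
    xo = [x[i] for i in obs]
    yo = [y[i] for i in obs]
    # Least squares fit of y on x using the observed cases only.
    mx, my = statistics.fmean(xo), statistics.fmean(yo)
    b = sum((a - mx) * (c - my) for a, c in zip(xo, yo)) / sum((a - mx) ** 2 for a in xo)
    a0 = my - b * mx
    pred = [a0 + b * v for v in x]
    out = list(y)
    for j in mis:
        # The k observed cases whose predicted means are closest to that of case j.
        donors = sorted(obs, key=lambda i: abs(pred[i] - pred[j]))[:k]
        out[j] = y[rng.choice(donors)]  # copy an observed value from a random donor
    return out
```

Because imputations are always copied from observed cases, this kind of matching never produces values outside the observed range.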

Imputation model helpers

Specification of the imputation models can be made more convenient using the following set of helpers.

quickpred()
Quick selection of predictors from the data
squeeze()
Squeeze the imputed values to be within specified boundaries.
make.blocks()
Creates a blocks argument
make.blots()
Creates a blots argument
make.formulas()
Creates a formulas argument
make.method()
Creates a method argument
make.post()
Creates a post argument
make.predictorMatrix()
Creates a predictorMatrix argument
make.visitSequence()
Creates a visitSequence argument
make.where()
Creates a where argument
construct.blocks()
Construct blocks from formulas and predictorMatrix
name.blocks()
Name imputation blocks
name.formulas()
Name formula list elements
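The clipping that `squeeze()` performs is simple to express. The Python function below is a hypothetical analogue that takes explicit lower and upper bounds. In mice, this kind of squeezing is typically applied to imputed values through the `post` argument.

```python
def squeeze(values, lower, upper):
    """Replace values below lower by lower, and values above upper by upper."""
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return [min(max(v, lower), upper) for v in values]
```

For example, `squeeze([-3.0, 12.5, 40.0], 0, 30)` returns `[0, 12.5, 30]`.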

Plots comparing observed to imputed/amputed data

These plots contrast the observed data with the imputed/amputed data, usually by drawing the observed data in blue and the imputed/amputed data in red.

bwplot(<mids>)
Box-and-whisker plot of observed and imputed data
densityplot(<mids>)
Density plot of observed and imputed data
mids() plot(<mids>) print(<mids>) summary(<mids>)
Multiply imputed data set (mids)
stripplot(<mids>)
Stripplot of observed and imputed data
xyplot(<mids>)
Scatterplot of observed and imputed data

Repeated analyses and combining analytic estimates

Multiple imputation creates m > 1 completed data sets, fits the model of interest to each of these, and combines the analytic estimates. The following functions assist in executing the analysis and pooling steps:

with(<mids>)
Evaluate an expression in multiple imputed datasets
pool() pool.syn()
Combine estimates by pooling rules
pool.r.squared()
Pools R² of m models fitted to multiply-imputed data
pool.scalar() pool.scalar.syn()
Multiple imputation pooling: univariate version
pool.table()
Combines estimates from a tidy table
nelsonaalen()
Cumulative hazard rate or Nelson-Aalen estimator
pool.compare()
Compare two nested models fitted to imputed data
anova(<mira>)
Compare several nested models
fix.coef()
Fix coefficients and update model
D1()
Compare two nested models using the D1 statistic
D2()
Compare two nested models using the D2 statistic
D3()
Compare two nested models using the D3 statistic
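The pooling step follows Rubin's rules. For one scalar quantity, the pooled estimate is the mean of the m estimates. The total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The Python sketch below computes these quantities in the spirit of `pool.scalar()`. It is an illustration, not the mice implementation, and it omits the degrees-of-freedom calculation that mice also reports.

```python
import statistics

def pool_scalar(estimates, variances):
    """Combine m estimates of one scalar and their variances by Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")
    qbar = statistics.fmean(estimates)   # pooled point estimate
    ubar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    t = ubar + (1 + 1 / m) * b           # total variance
    r = (1 + 1 / m) * b / ubar           # relative increase in variance due to nonresponse
    lam = (1 + 1 / m) * b / t            # proportion of total variance due to missingness
    return {"qbar": qbar, "ubar": ubar, "b": b, "t": t, "r": r, "lambda": lam}
```

With estimates 1, 2 and 3, each with variance 0.5, the pooled estimate is 2 and the total variance is 0.5 + (4/3) × 1 ≈ 1.83.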

Data manipulation

The multiply-imputed data can be combined in various ways and exported to other formats.

complete(<mids>)
Extracts the completed data from a mids object
cbind() rbind()
Combine R objects by rows and columns
ibind()
Enlarge number of imputations by combining mids objects
as.mids()
Converts an imputed dataset (long format) into a mids object
as.mira()
Create a mira object from repeated analyses
as.mitml.result()
Converts a mira object into a mitml.result object
filter(<mids>)
Subset rows of a mids object
mids2mplus()
Export mids object to Mplus
mids2spss()
Export mids object to SPSS
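The long format produced by `complete(<mids>)` with `action = "long"` stacks the completed data sets. It tags each row with an imputation number `.imp` and a case identifier `.id`. With `include = TRUE`, the original incomplete data are added as `.imp = 0`, which is also the layout `as.mids()` expects. A Python sketch of that layout, using a hypothetical `stack_long` helper:

```python
def stack_long(original, completed, include=False):
    """Stack completed data sets (lists of dict rows) into long format."""
    sets = ([original] if include else []) + list(completed)
    first = 0 if include else 1  # .imp = 0 marks the original incomplete data
    rows = []
    for imp_no, data in enumerate(sets, start=first):
        for case_id, row in enumerate(data, start=1):
            rows.append({".imp": imp_no, ".id": case_id, **row})
    return rows
```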

Class descriptions

The data created at the various analytic phases are stored as list objects of a specific class. The most important classes and class-test functions are:

mids() plot(<mids>) print(<mids>) summary(<mids>)
Multiply imputed data set (mids)
mira()
Create an object of class "mira"
mipo() summary(<mipo>) print(<mipo>) print(<mipo.summary>) process_mipo()
mipo: Multiple imputation pooled object
is.mids()
Check for mids object
is.mipo()
Check for mipo object
is.mira()
Check for mira object
is.mitml.result()
Check for mitml.result object

Extraction functions

Helpers to extract and print information from objects of specific classes.

convergence()
Computes convergence diagnostics for a mids object
getfit()
Extract list of fitted models
getqbar()
Extract estimate from mipo object
glance(<mipo>)
Glance method to extract information from a `mipo` object
mids() plot(<mids>) print(<mids>) summary(<mids>)
Multiply imputed data set (mids)
print(<mira>) print(<mice.anova>) print(<mice.anova.summary>)
Print a mira object
summary(<mira>) summary(<mice.anova>)
Summary of a mira object
tidy(<mipo>)
Tidy method to extract results from a `mipo` object

Low-level imputation functions

Several functions are dedicated to common low-level operations to generate the imputations:

estimice()
Computes least squares parameters
norm.draw() .norm.draw()
Draws values of beta and sigma by Bayesian linear regression
.pmm.match()
Finds an imputed value from matches in the predictive metric (deprecated)
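`norm.draw()` draws the regression parameters from their posterior distribution. For a regression with an intercept and a single predictor, the steps can be sketched in Python. First compute the least squares estimate and the residual sum of squares (RSS). Then draw sigma as the square root of RSS divided by a chi-square variate with n - 2 degrees of freedom. Finally, draw beta from a normal distribution centred at the least squares estimate with covariance sigma² (X'X)⁻¹. This is a hypothetical two-parameter illustration; the mice version works with any number of predictors.

```python
import math
import random

def norm_draw(x, y, rng):
    """Draw (beta0, beta1, sigma) from the posterior of y = beta0 + beta1*x + e."""
    n = len(x)
    # Entries of X'X and X'y for the design matrix with columns [1, x].
    s0, s1, s2 = n, sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = s0 * s2 - s1 * s1
    # V = (X'X)^-1 for the 2 x 2 case.
    v00, v01, v11 = s2 / det, -s1 / det, s0 / det
    # Least squares estimate b = V X'y.
    b0 = v00 * sy + v01 * sxy
    b1 = v01 * sy + v11 * sxy
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    # sigma* = sqrt(RSS / chi2_df); a chi-square(df) variate is Gamma(df/2, scale 2).
    sigma = math.sqrt(rss / rng.gammavariate(df / 2, 2))
    # Lower Cholesky factor L of V, so that L z ~ N(0, V) for standard normal z.
    l00 = math.sqrt(v00)
    l10 = v01 / l00
    l11 = math.sqrt(v11 - l10 * l10)
    z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
    beta0 = b0 + sigma * l00 * z0
    beta1 = b1 + sigma * (l10 * z0 + l11 * z1)
    return beta0, beta1, sigma
```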

Multivariate amputation

Amputation is the inverse of imputation: it starts from a complete dataset and creates missing data patterns according to the posited missing data mechanism. Amputation is useful for simulation studies.

ampute()
Generate missing data for simulation purposes
bwplot(<mads>)
Box-and-whisker plot of amputed and non-amputed data
xyplot(<mads>)
Scatterplot of amputed and non-amputed data against weighted sum scores
is.mads()
Check for mads object
mads() print(<mads>) summary(<mads>)
Multivariate amputed data set (mads)
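A much simplified picture of amputation can be given in Python. The sketch picks a proportion `prop` of the rows at random and deletes one randomly chosen value in each, which is an MCAR mechanism. The helper `ampute_mcar` is hypothetical. `ampute()` itself also supports user-specified patterns and frequencies, as well as MAR and MNAR mechanisms driven by weighted sum scores, so treat this only as an illustration of the idea.

```python
import random

def ampute_mcar(rows, prop=0.5, seed=1):
    """Make a proportion of rows incomplete by deleting one random value per selected row."""
    rng = random.Random(seed)
    out = [dict(r) for r in rows]  # copy so the complete data stay untouched
    n_incomplete = round(prop * len(rows))
    for i in rng.sample(range(len(rows)), n_incomplete):
        var = rng.choice(list(out[i]))  # pick one variable of this row at random
        out[i][var] = None
    return out
```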

Datasets

Built-in datasets

boys
Growth of Dutch boys
brandsma
Brandsma school data used in Snijders and Bosker (2012)
employee
Employee selection data
fdd fdd.pred
SE Fireworks disaster data
fdgs
Fifth Dutch growth study 2009
leiden85
Leiden 85+ study
mammalsleep sleep
Mammal sleep data
mnar_demo_data
MNAR demo data
nhanes
NHANES example - all variables numerical
nhanes2
NHANES example - mixed numerical and discrete variables
pattern pattern1 pattern2 pattern3 pattern4
Datasets with various missing data patterns
popmis
Hox pupil popularity data with missing popularity scores
pops pops.pred
Project on preterm and small for gestational age infants (POPS)
potthoffroy
Potthoff-Roy data
selfreport mgg
Self-reported and measured BMI
tbc tbc.target terneuzen
Terneuzen birth cohort
toenail
Toenail data
toenail2
Toenail data
walking
Walking disability data
windspeed
Subset of Irish wind speed data

Miscellaneous functions

appendbreak()
Appends specified break to the data
extractBS()
Extract broken stick estimates from a lmer object
glm.mids()
Generalized linear model for mids object
lm.mids()
Linear regression for mids object
matchindex()
Find index of matched donor units
mdc()
Graphical parameter for missing data plots
mice.theme()
Set the theme for the plotting Trellis functions
supports.transparent()
Supports semi-transparent foreground colors?
version()
Echoes the package version number