The pool() function combines the estimates from m repeated complete-data analyses. The typical sequence of steps to perform a multiple imputation analysis is:
1. Impute the missing data with the mice() function, resulting in a multiply imputed data set (class mids);
2. Fit the model of interest (scientific model) on each imputed data set with the with() function, resulting in an object of class mira;
3. Pool the estimates from each model into a single set of estimates and standard errors, resulting in an object of class mipo;
4. Optionally, compare pooled estimates from different scientific models with the D1() or D3() functions.
A common error is to reverse steps 2 and 3, i.e., to pool the multiply-imputed data instead of the estimates. Doing so may severely bias the estimates of scientific interest and yield incorrect statistical intervals and p-values. The pool() function will detect this case.
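The sequence of steps described above can be sketched as follows. This is a minimal illustration that assumes the nhanes data shipped with mice; the particular models, the number of imputations, and the seed are arbitrary choices made for the example.

```r
library(mice)

# Step 1: impute the missing data (object of class "mids")
imp <- mice(nhanes, m = 5, maxit = 5, seed = 123, printFlag = FALSE)

# Step 2: fit the scientific model on each imputed data set (class "mira")
fit1 <- with(imp, lm(bmi ~ hyp + chl))
fit0 <- with(imp, lm(bmi ~ hyp))      # nested model, used only for the comparison

# Step 3: pool the estimates, not the data (class "mipo")
est <- pool(fit1)
summary(est)

# Step 4 (optional): compare the two models with a multivariate Wald test
D1(fit1, fit0)
```

Note that pooling happens only after the model has been fitted to every imputed data set; the imputed data themselves are never averaged.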
pool(object, dfcom = NULL, rule = NULL)
pool.syn(object, dfcom = NULL, rule = "reiter2003")
object  An object of class 

dfcom  A positive number representing the degrees of freedom in the
complete-data analysis. Normally, this would be the number of independent
observations minus the number of fitted parameters. The default
( 
rule  A string indicating the pooling rule. Currently supported are

An object of class mipo, which stands for 'multiple imputation pooled outcome'.
For rule "reiter2003", values for lambda and fmi are set to `NA`, as these statistics do not apply to data synthesised from fully observed data.
The pool() function averages the estimates of the complete-data model, computes the total variance over the repeated analyses by Rubin's rules (Rubin, 1987, p. 76), and computes the following diagnostic statistics per estimate:
Relative increase in variance due to nonresponse r;
Residual degrees of freedom for hypothesis testing df;
Proportion of total variance due to missingness lambda;
Fraction of missing information fmi.
The degrees of freedom calculation for the pooled estimates uses the Barnard-Rubin adjustment for small samples (Barnard and Rubin, 1999).
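The statistics listed above can be reproduced by hand for a single parameter. The base-R sketch below applies the commonly stated forms of Rubin's rules and the Barnard-Rubin adjustment to made-up numbers. The inputs Q, U and dfcom are illustrative assumptions, and the code is not taken from the mice source.

```r
# Hypothetical inputs: m = 3 estimates of one parameter and their variances
Q <- c(1.0, 1.2, 0.8)      # estimate from each complete-data analysis
U <- c(0.04, 0.05, 0.03)   # squared standard error of each estimate
dfcom <- 20                # assumed complete-data degrees of freedom
m <- length(Q)

qbar  <- mean(Q)                    # pooled estimate
ubar  <- mean(U)                    # within-imputation variance
b     <- var(Q)                     # between-imputation variance
total <- ubar + (1 + 1 / m) * b     # total variance

r      <- (1 + 1 / m) * b / ubar    # relative increase in variance
lambda <- (1 + 1 / m) * b / total   # proportion of variance due to missingness

# Barnard-Rubin small-sample degrees of freedom
df_old <- (m - 1) / lambda^2
df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
df     <- df_old * df_obs / (df_old + df_obs)

fmi <- (r + 2 / (df + 3)) / (1 + r) # fraction of missing information

round(c(estimate = qbar, se = sqrt(total), df = df,
        r = r, lambda = lambda, fmi = fmi), 3)
```

The adjusted df is always smaller than both df_old and df_obs, which is what keeps it from exceeding the complete-data degrees of freedom in small samples.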
The pool.syn() function combines estimates by Reiter's partially synthetic data pooling rules (Reiter, 2003). This combination rule assumes that the data that is synthesised is completely observed. Pooling differs from Rubin's method in the calculation of the total variance and the degrees of freedom.
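To make the difference concrete, the base-R sketch below computes the total variance and degrees of freedom for one parameter under Reiter's rules as they are commonly stated for partially synthetic data: T = ubar + b/m and df = (m - 1)(1 + ubar/(b/m))^2. The numbers are made up for illustration, and the code is an assumption-laden sketch, not the implementation used by pool.syn().

```r
# Hypothetical estimates and variances from m = 3 synthetic data sets
Q <- c(1.0, 1.2, 0.8)
U <- c(0.04, 0.05, 0.03)
m <- length(Q)

ubar <- mean(U)   # within-imputation variance
b    <- var(Q)    # between-imputation variance

# Rubin (1987): between-imputation variance is inflated by (1 + 1/m)
total_rubin <- ubar + (1 + 1 / m) * b

# Reiter (2003): between-imputation variance enters only as b / m
total_reiter <- ubar + b / m
df_reiter    <- (m - 1) * (1 + ubar / (b / m))^2

c(total_rubin = total_rubin, total_reiter = total_reiter, df_reiter = df_reiter)
```

For the same inputs, the total variance under Reiter's rule is smaller than under Rubin's rule because the between-imputation component receives less weight.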
Pooling requires the following input from each fitted model:
the estimates of the model;
the standard error of each estimate;
the residual degrees of freedom of the model.
The pool() and pool.syn() functions rely on broom::tidy and broom::glance to extract these parameters.
Since mice 3.0+, the broom package takes care of filtering out the relevant parts of the complete-data analysis. You may see messages like Error: No tidy method for objects of class ... or Error: No glance method for objects of class ... . Such a message means that the complete-data method used in with(imp, ...) has no tidy or glance method defined in the broom package.
The broom.mixed package contains tidy and glance methods for mixed models. If you are using a mixed model, first run library(broom.mixed) before calling pool().
If no tidy or glance methods are defined for your analysis, tabulate the m parameter estimates and their variance estimates (the square of the standard errors) from the m fitted models stored in fit$analyses. For each parameter, run pool.scalar to obtain the pooled parameter estimate, its variance, the degrees of freedom, the relative increase in variance and the fraction of missing information.
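The manual route described above might look like the sketch below. It uses lm() purely for illustration (lm objects do have tidy and glance methods, so this fallback is not needed for them); the model, the seed, and the choice of the chl coefficient are assumptions made for the example.

```r
library(mice)

imp <- mice(nhanes, m = 5, maxit = 5, seed = 123, printFlag = FALSE)
fit <- with(imp, lm(bmi ~ chl))

# Tabulate the m estimates and their variances for one parameter
Q <- unname(sapply(fit$analyses, function(a) coef(a)["chl"]))
U <- unname(sapply(fit$analyses, function(a) vcov(a)["chl", "chl"]))

# Pool them; n and k give the sample size and the number of fitted
# parameters, which are used for the small-sample degrees of freedom
res <- pool.scalar(Q, U, n = nrow(nhanes), k = 2)

# Pooled estimate, total variance, df, relative increase in variance, fmi
c(qbar = res$qbar, t = res$t, df = res$df, r = res$r, fmi = res$fmi)
```

To pool several parameters, repeat the pool.scalar() call once per parameter.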
An alternative is to write your own glance() and tidy() methods and add these to broom according to the specifications given in https://broom.tidymodels.org.
In versions prior to mice 3.0, pooling required that coef() and vcov() methods were available for fitted objects. This feature is no longer supported. The reason is that vcov() methods are inconsistent across packages, leading to buggy behaviour of the pool() function.
Since mice 3.13.2, the pool() function uses the robust standard error estimate for pooling when it can extract robust.se from the tidy() object.
Barnard, J. and Rubin, D.B. (1999). Small sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
Reiter, J.P. (2003). Inference for Partially Synthetic, Public Use Microdata Sets. Survey Methodology, 29, 181-189.
van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. doi: 10.18637/jss.v045.i03
with.mids, as.mira, pool.scalar, glance, tidy
https://github.com/amices/mice/issues/142,
https://github.com/amices/mice/issues/274
# impute missing data, analyse and pool using the classic MICE workflow
imp <- mice(nhanes, maxit = 2, m = 2)
#>
#>  iter imp variable
#>   1   1  bmi  hyp  chl
#>   1   2  bmi  hyp  chl
#>   2   1  bmi  hyp  chl
#>   2   2  bmi  hyp  chl
fit <- with(data = imp, exp = lm(bmi ~ hyp + chl))
summary(pool(fit))
#>          term    estimate  std.error statistic       df      p.value
#> 1 (Intercept) 22.23763118 4.10202768 5.4211314 12.99275 0.0001170692
#> 2         hyp  0.69389387 2.35374015 0.2948048 19.86401 0.7712020057
#> 3         chl  0.01400427 0.01881781 0.7442031 19.28297 0.4657266422

# generate fully synthetic data, analyse and pool
imp <- mice(cars, maxit = 2, m = 2, where = matrix(TRUE, nrow(cars), ncol(cars)))
#>
#>  iter imp variable
#>   1   1  speed  dist
#>   1   2  speed  dist
#>   2   1  speed  dist
#>   2   2  speed  dist
fit <- with(data = imp, exp = lm(speed ~ dist))
summary(pool.syn(fit))
#>          term    estimate  std.error statistic        df      p.value
#> 1 (Intercept) 12.61856976 1.12645701 11.201999 239.23775 0.0000000000
#> 2        dist  0.08291348 0.02341726  3.540699  47.48388 0.0009057364