Multiple imputation pooling: univariate version

Pools univariate estimates of m repeated complete-data analyses.

Usage

pool.scalar(Q, U, n = Inf, k = 1, rule = c("rubin1987", "reiter2003"))

pool.scalar.syn(Q, U, n = Inf, k = 1, rule = "reiter2003")

Arguments

Q: A vector of univariate estimates of m repeated complete data analyses.
U: A vector containing the corresponding m variances of the univariate estimates.
n: A number providing the sample size. If nothing is specified, an infinite sample n = Inf is assumed.
k: A number indicating the number of parameters to be estimated. By default, k = 1 is assumed.
rule: A string indicating the pooling rule. Currently supported are "rubin1987" (default, for missing data) and "reiter2003" (for synthetic data created from a complete data set).
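Two details of the arguments can be shown in a few lines of R. First, an argument declared as `rule = c("rubin1987", "reiter2003")` is conventionally resolved with `match.arg()`, which returns the first element when nothing is supplied; this is why "rubin1987" acts as the default. Second, `n` and `k` combine into the complete-data degrees of freedom `n - k` (with the example inputs `n = 25`, `k = 2`, this gives the `dfcom` of 23 that `pool()` reports). The helpers `resolve_rule` and `complete_df` below are hypothetical illustrations, not part of mice.

```r
# Hypothetical helper: how a character-vector default is usually resolved in R
resolve_rule <- function(rule = c("rubin1987", "reiter2003")) {
  match.arg(rule)  # no value supplied -> first element of the default vector
}

# Hypothetical helper: complete-data degrees of freedom from n and k
complete_df <- function(n = Inf, k = 1) n - k

resolve_rule()              # "rubin1987"
resolve_rule("reiter2003")  # "reiter2003"
complete_df(25, 2)          # 23
complete_df()               # Inf (infinite sample assumed)
```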

Value

Returns a list with the following components:

m: Number of imputations.
qhat: The m univariate estimates of repeated complete-data analyses.
u: The corresponding m variances of the univariate estimates.
qbar: The pooled univariate estimate, formula (3.1.2) Rubin (1987).
ubar: The mean of the variances (i.e. the pooled within-imputation variance), formula (3.1.3) Rubin (1987).
b: The between-imputation variance, formula (3.1.4) Rubin (1987).
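t: The total variance of the pooled estimate, formula (3.1.5) Rubin (1987). For rule = "reiter2003", the partially synthetic data version ubar + b/m of Reiter (2003) is used.
r: The relative increase in variance due to nonresponse, formula (3.1.7) Rubin (1987).
df: The degrees of freedom for the t reference distribution, computed by the method of Barnard and Rubin (1999). For rule = "reiter2003", the degrees of freedom of Reiter (2003) are used.
fmi: The fraction of missing information due to nonresponse, formula (3.1.10) Rubin (1987). (Not defined for synthetic data; returned as NA.)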
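The components above can be reproduced with a short base-R function. This is a minimal sketch of Rubin's (1987) rules with the Barnard and Rubin (1999) degrees of freedom, written for illustration; the name `pool_rubin` is hypothetical and this is not the mice source code. The small lower bound on `lambda` is an assumed safeguard against a zero between-imputation variance. Fed the inputs of the first example (`n = 25`, `k = 2`), it should reproduce the printed `qbar`, `ubar`, `b`, `t`, `r`, `df` and `fmi` up to rounding.

```r
# Sketch of Rubin's rules for a scalar estimand (hypothetical helper)
pool_rubin <- function(Q, U, n = Inf, k = 1) {
  m <- length(Q)
  qbar <- mean(Q)                  # (3.1.2) pooled estimate
  ubar <- mean(U)                  # (3.1.3) within-imputation variance
  b <- var(Q)                      # (3.1.4) between-imputation variance
  t <- ubar + (1 + 1 / m) * b      # (3.1.5) total variance
  r <- (1 + 1 / m) * b / ubar      # (3.1.7) relative increase in variance

  # Barnard-Rubin (1999) small-sample degrees of freedom, dfcom = n - k
  dfcom <- n - k
  lambda <- max((1 + 1 / m) * b / t, 1e-4)  # assumed guard against b = 0
  dfold <- (m - 1) / lambda^2
  df <- if (is.infinite(dfcom)) {
    dfold
  } else {
    dfobs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
    dfold * dfobs / (dfold + dfobs)
  }

  fmi <- (r + 2 / (df + 3)) / (r + 1)  # (3.1.10) fraction of missing information
  list(m = m, qhat = Q, u = U, qbar = qbar, ubar = ubar, b = b,
       t = t, df = df, r = r, fmi = fmi)
}

res <- pool_rubin(Q = c(-1.5457, -1.428), U = c(0.9723^2, 1.041^2), n = 25, k = 2)
res$df  # about 20.97, as in the printed pool.scalar() output
```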

Details

The function averages the univariate estimates of the complete-data model, computes the total variance over the repeated analyses, and computes the relative increase in variance due to missing data or data synthesis, together with the fraction of missing information (missing data only).
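For synthetic data created from a complete data set, the total variance and degrees of freedom follow Reiter (2003) instead of Rubin (1987). The sketch below (hypothetical helper `pool_reiter`, not the mice source) uses t = ubar + b/m and df = (m - 1)(1 + ubar / (b/m))^2, keeps the same expression for r, and returns NA for fmi. Neither formula involves n or k, so the sketch omits them; with the inputs of the second example it reproduces the printed `pool.scalar.syn()` values up to rounding.

```r
# Sketch of Reiter (2003) pooling for partially synthetic data (hypothetical helper)
pool_reiter <- function(Q, U) {
  m <- length(Q)
  qbar <- mean(Q)                          # pooled estimate
  ubar <- mean(U)                          # mean within-synthesis variance
  b <- var(Q)                              # between-synthesis variance
  t <- ubar + b / m                        # total variance (Reiter 2003)
  df <- (m - 1) * (1 + ubar / (b / m))^2   # degrees of freedom (Reiter 2003)
  r <- (1 + 1 / m) * b / ubar              # same expression as for missing data
  list(m = m, qhat = Q, u = U, qbar = qbar, ubar = ubar, b = b,
       t = t, df = df, r = r, fmi = NA_real_)  # fmi undefined for synthetic data
}

res_syn <- pool_reiter(Q = c(0.12182, 0.13209), U = c(0.02121^2, 0.02516^2))
res_syn$df  # about 463.71
```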

References

Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.

Reiter, J.P. (2003). Inference for Partially Synthetic, Public Use Microdata Sets. Survey Methodology, 29, 181-189.

Author

Karin Groothuis-Oudshoorn and Stef van Buuren, 2009; Thom Volker, 2021

Examples

# missing data imputation with manual pooling
library(mice)
imp <- mice(nhanes, maxit = 2, m = 2, print = FALSE, seed = 18210)
fit <- with(data = imp, lm(bmi ~ age))

# manual pooling
summary(fit$analyses[[1]])
#> 
#> Call:
#> lm(formula = bmi ~ age)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.1587 -3.0674  0.9413  2.3870  8.7413 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  28.1043     1.8853   14.91 2.61e-13 ***
#> age          -1.5457     0.9723   -1.59    0.126    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 3.957 on 23 degrees of freedom
#> Multiple R-squared:  0.099,	Adjusted R-squared:  0.05983 
#> F-statistic: 2.527 on 1 and 23 DF,  p-value: 0.1255
#> 
summary(fit$analyses[[2]])
#> 
#> Call:
#> lm(formula = bmi ~ age)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -7.3611 -3.6333  0.9389  2.3389  7.5389 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   29.189      2.019  14.460 4.92e-13 ***
#> age           -1.428      1.041  -1.371    0.183    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 4.236 on 23 degrees of freedom
#> Multiple R-squared:  0.0756,	Adjusted R-squared:  0.03541 
#> F-statistic: 1.881 on 1 and 23 DF,  p-value: 0.1835
#> 
pool.scalar(Q = c(-1.5457, -1.428), U = c(0.9723^2, 1.041^2), n = 25, k = 2)
#> $m
#> [1] 2
#> 
#> $qhat
#> [1] -1.5457 -1.4280
#> 
#> $u
#> [1] 0.9453673 1.0836810
#> 
#> $qbar
#> [1] -1.48685
#> 
#> $ubar
#> [1] 1.014524
#> 
#> $b
#> [1] 0.006926645
#> 
#> $t
#> [1] 1.024914
#> 
#> $df
#> [1] 20.97025
#> 
#> $r
#> [1] 0.01024122
#> 
#> $fmi
#> [1] 0.09272831
#> 

# check: automatic pooling using broom
pool(fit)
#> Class: mipo    m = 2 
#>          term m  estimate     ubar           b        t dfcom       df
#> 1 (Intercept) 2 28.646618 3.814682 0.588114658 4.696854    23 10.72144
#> 2         age 2 -1.486715 1.014543 0.006947187 1.024964    23 20.96937
#>         riv     lambda        fmi
#> 1 0.2312570 0.18782190 0.30620278
#> 2 0.0102714 0.01016697 0.09275848

# manual pooling for synthetic data created from complete data
imp <- mice(cars,
  maxit = 2, m = 2, print = FALSE, seed = 18210,
  where = matrix(TRUE, nrow(cars), ncol(cars))
)
fit <- with(data = imp, lm(speed ~ dist))

# manual pooling: extract Q and U
summary(fit$analyses[[1]])
#> 
#> Call:
#> lm(formula = speed ~ dist)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.9740 -2.3144 -0.1494  3.1287  7.4115 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 10.15208    1.06236   9.556 1.10e-12 ***
#> dist         0.12182    0.02121   5.744 6.15e-07 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 3.618 on 48 degrees of freedom
#> Multiple R-squared:  0.4074,	Adjusted R-squared:  0.395 
#> F-statistic:    33 on 1 and 48 DF,  p-value: 6.147e-07
#> 
summary(fit$analyses[[2]])
#> 
#> Call:
#> lm(formula = speed ~ dist)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -7.5830 -3.1680 -0.3479  3.3928  8.1902 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  9.46952    1.31136   7.221 3.37e-09 ***
#> dist         0.13209    0.02516   5.250 3.43e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 4.271 on 48 degrees of freedom
#> Multiple R-squared:  0.3647,	Adjusted R-squared:  0.3515 
#> F-statistic: 27.56 on 1 and 48 DF,  p-value: 3.428e-06
#> 
pool.scalar.syn(Q = c(0.12182, 0.13209), U = c(0.02121^2, 0.02516^2), n = 50, k = 2)
#> $m
#> [1] 2
#> 
#> $qhat
#> [1] 0.12182 0.13209
#> 
#> $u
#> [1] 0.0004498641 0.0006330256
#> 
#> $qbar
#> [1] 0.126955
#> 
#> $ubar
#> [1] 0.0005414448
#> 
#> $b
#> [1] 5.273645e-05
#> 
#> $t
#> [1] 0.0005678131
#> 
#> $df
#> [1] 463.7127
#> 
#> $r
#> [1] 0.1460992
#> 
#> $fmi
#> [1] NA
#> 

# check: automatic pooling using broom
pool.syn(fit)
#> Class: mipo    m = 2 
#>          term m  estimate         ubar            b            t dfcom       df
#> 1 (Intercept) 2 9.8108000 1.4241330840 2.329428e-01 1.5406044600    48 174.9621
#> 2        dist 2 0.1269552 0.0005414288 5.273011e-05 0.0005677938    48 463.7928
#>         riv lambda fmi
#> 1 0.2453522     NA  NA
#> 2 0.1460860     NA  NA