Combines estimates from a tidy table
Arguments
- w
A data.frame with parameter estimates in tidy format (see Details).
- type
A string, either "minimal", "tests" or "all". Use "minimal" to mimic the pooled component of pool(fit), as in the Examples. The default is "all".
- conf.int
Logical indicating whether to include a confidence interval.
- conf.level
Confidence level of the interval, used only if conf.int = TRUE. A number between 0 and 1.
- exponentiate
Logical flag indicating whether to exponentiate the coefficient estimates and confidence intervals (typical for logistic regression).
- dfcom
A positive number representing the residual degrees of freedom in the complete-data analysis. The dfcom argument is used for the Barnard-Rubin adjustment. In a linear regression, dfcom equals the number of independent observations minus the number of fitted parameters, but the expression becomes more complex for regularized, proportional hazards, or other semi-parametric techniques. Only used if w lacks a column named "df.residual".
- custom.t
A custom character string to be parsed as a calculation rule for the total variance t. The custom rule can use the other calculated pooling statistics. The default t calculation has the form ".data$ubar + (1 + 1 / .data$m) * .data$b".
- rule
A string indicating the pooling rule. Currently supported are "rubin1987" (default, for analyses applied to multiply-imputed incomplete data) and "reiter2003" (for analyses applied to synthetic data created from complete data).
- ...
Arguments passed down.
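The custom.t argument takes a string that is parsed and evaluated as an R expression over the pooled statistics. The sketch below shows that mechanism in isolation. It assumes a plain list named `.data` as a stand-in for the table of pooled statistics that mice evaluates the rule against; this stand-in is hypothetical, and mice's internal evaluation may differ. The ubar, b and m values are copied from the (Intercept) row printed in the Examples.

```r
# Hypothetical stand-in for the pooled statistics of one term;
# values copied from the (Intercept) row printed in the Examples.
.data <- list(ubar = 3081.702712, b = 16.77783124, m = 2)

# The default total-variance rule, written as a character string.
default_rule <- ".data$ubar + (1 + 1 / .data$m) * .data$b"

# parse() turns the string into an expression; eval() computes it
# using the .data list defined above.
t_default <- eval(parse(text = default_rule))
print(t_default)  # 3106.869, the t column of the Examples

# A hypothetical alternative rule that replaces (1 + 1/m) * b by b / m.
alt_rule <- ".data$ubar + .data$b / .data$m"
t_alt <- eval(parse(text = alt_rule))
print(t_alt)
```

Any rule string must reference only statistics that exist at that point in the calculation (such as ubar, b and m).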
Value
pool.table() returns a data.frame with aggregated estimates, standard errors, confidence intervals and statistical tests.
The meaning of the columns is as follows:

| Column | Meaning |
|---|---|
| term | Parameter name |
| m | Number of multiple imputations |
| estimate | Pooled complete-data estimate |
| std.error | Standard error of estimate |
| statistic | t-statistic = estimate / std.error |
| df | Degrees of freedom for statistic |
| p.value | Two-sided P-value under the null hypothesis |
| conf.low | Lower bound of c.i. (default 95%) |
| conf.high | Upper bound of c.i. (default 95%) |
| riv | Relative increase in variance |
| lambda | Proportion of total variance attributable to missing data |
| fmi | Fraction of missing information |
| ubar | Within-imputation variance of estimate |
| b | Between-imputation variance of estimate |
| t | Total variance of estimate |
| dfcom | Residual degrees of freedom in complete data |
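
The variance columns are linked by Rubin's rules. A minimal sketch, assuming the textbook forms of Rubin (1987) and the Barnard-Rubin (1999) degrees of freedom: t = ubar + (1 + 1/m) b, riv = (1 + 1/m) b / ubar, lambda = (1 + 1/m) b / t, and df combining dfold = (m - 1) / lambda^2 with dfobs = (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda). The fmi line uses the form (riv + 2 / (df + 3)) / (1 + riv), chosen because it reproduces the fmi column in the Examples; other texts define fmi slightly differently. The helper rubin_pool_row() is hypothetical, not a mice function, and mice's own code may add safeguards: for example, b = 0 makes dfold infinite, and this sketch then returns NaN for df.

```r
# Hypothetical helper (not part of mice): Rubin's (1987) rules with
# the Barnard-Rubin (1999) small-sample degrees of freedom, applied
# to within- (ubar) and between-imputation (b) variances.
rubin_pool_row <- function(ubar, b, m, dfcom) {
  t <- ubar + (1 + 1 / m) * b           # total variance
  riv <- (1 + 1 / m) * b / ubar         # relative increase in variance
  lambda <- (1 + 1 / m) * b / t         # share of variance due to missingness
  dfold <- (m - 1) / lambda^2           # large-sample (Rubin 1987) df
  if (is.infinite(dfcom)) {
    df <- dfold                         # no small-sample adjustment
  } else {
    dfobs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
    df <- dfold * dfobs / (dfold + dfobs)  # Barnard-Rubin adjustment
  }
  fmi <- (riv + 2 / (df + 3)) / (1 + riv)  # fraction of missing information
  data.frame(ubar = ubar, b = b, t = t, riv = riv,
             lambda = lambda, df = df, fmi = fmi)
}

# Inputs copied from rows 1 and 3 of the Examples (m = 2, dfcom = 20).
res <- rubin_pool_row(ubar = c(3081.702712, 498.129498),
                      b = c(16.77783124, 112.28149168),
                      m = 2, dfcom = 20)
print(res)
```

With these inputs the t, riv, lambda, df and fmi values agree with rows 1 and 3 of the Examples output to the printed precision.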
Details
The input data w is a data.frame with columns named:

| Column | Content |
|---|---|
| term | a character or factor with the parameter names |
| estimate | a numeric vector with parameter estimates |
| std.error | a numeric vector with standard errors of estimate |
| df.residual | a numeric vector with the degrees of freedom |

Columns 1-3 are obligatory. Column 4 is optional. Usually, all entries in column 4 are the same. The user can omit column 4 and specify argument pool.table(..., dfcom = ...) instead. If both are given, then column df.residual takes precedence. If neither is specified, then mice tries to calculate the residual degrees of freedom. If that fails (e.g. because there is no information on sample size), mice sets dfcom = Inf.
The value dfcom = Inf is acceptable for large samples (n > 1000) and relatively concise parametric models.
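For a linear regression, the residual degrees of freedom are the number of observations minus the number of fitted coefficients. This is why the Examples report dfcom = 20: nhanes2 has 25 rows and lm(chl ~ age + bmi + hyp) has 5 coefficients. The sketch below checks the same arithmetic on the built-in mtcars data, then mirrors the precedence rule described above with a hypothetical helper choose_dfcom(), which is not a mice function. As an assumption, the helper skips mice's own attempt to compute the residual degrees of freedom and goes straight to Inf.

```r
# dfcom for an ordinary linear regression: observations minus
# fitted parameters. mtcars ships with base R (datasets package).
fit <- lm(mpg ~ wt + hp, data = mtcars)
dfcom_manual <- nrow(mtcars) - length(coef(fit))
print(c(manual = dfcom_manual, df.residual = df.residual(fit)))  # both 29

# Hypothetical helper (not part of mice) mirroring the documented
# precedence: a df.residual column wins over the dfcom argument, and
# Inf is used when neither is available. Entries of the column are
# usually identical, so the first one is taken.
choose_dfcom <- function(w, dfcom = NULL) {
  if ("df.residual" %in% names(w)) return(w$df.residual[1])
  if (!is.null(dfcom)) return(dfcom)
  Inf
}

w <- data.frame(term = "x", estimate = 1, std.error = 0.5, df.residual = 20)
choose_dfcom(w, dfcom = 99)         # 20: the column takes precedence
choose_dfcom(w[, 1:3], dfcom = 99)  # 99: falls back to the argument
choose_dfcom(w[, 1:3])              # Inf: neither is available
```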
Examples
library(mice)
# conventional mice workflow
imp <- mice(nhanes2, m = 2, maxit = 2, seed = 1, print = FALSE)
fit <- with(imp, lm(chl ~ age + bmi + hyp))
pld1 <- pool(fit)
pld1$pooled
#> term m estimate ubar b t dfcom df
#> 1 (Intercept) 2 2.979488 3081.702712 16.77783124 3106.869459 20 18.09145
#> 2 age40-59 2 52.005346 367.726421 13.68301760 388.250947 20 16.49810
#> 3 age60-99 2 70.077449 498.129498 112.28149168 666.551735 20 7.29272
#> 4 bmi 2 6.006762 3.897692 0.02197335 3.930652 20 18.08472
#> 5 hypyes 2 -4.347543 408.567912 6.75735741 418.703948 20 17.63466
#> riv lambda fmi
#> 1 0.008166507 0.008100355 0.1021574
#> 2 0.055814663 0.052864073 0.1500157
#> 3 0.338109344 0.252676917 0.3978908
#> 4 0.008456294 0.008385384 0.1024454
#> 5 0.024808694 0.024208122 0.1187861
# using pool.table() on tidy table
tbl <- summary(fit)[, c("term", "estimate", "std.error", "df.residual")]
tbl
#> # A tibble: 10 × 4
#> term estimate std.error df.residual
#> <chr> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.0831 58.1 20
#> 2 age40-59 49.4 19.8 20
#> 3 age60-99 62.6 22.7 20
#> 4 bmi 5.90 2.07 20
#> 5 hypyes -2.51 22.1 20
#> 6 (Intercept) 5.88 52.8 20
#> 7 age40-59 54.6 18.6 20
#> 8 age60-99 77.6 21.9 20
#> 9 bmi 6.11 1.87 20
#> 10 hypyes -6.19 18.1 20
pld2 <- pool.table(tbl, type = "minimal")
pld2
#> term m estimate ubar b t dfcom df
#> 1 (Intercept) 2 2.979488 3081.702712 16.77783124 3106.869459 20 18.09145
#> 2 age40-59 2 52.005346 367.726421 13.68301760 388.250947 20 16.49810
#> 3 age60-99 2 70.077449 498.129498 112.28149168 666.551735 20 7.29272
#> 4 bmi 2 6.006762 3.897692 0.02197335 3.930652 20 18.08472
#> 5 hypyes 2 -4.347543 408.567912 6.75735741 418.703948 20 17.63466
#> riv lambda fmi
#> 1 0.008166507 0.008100355 0.1021574
#> 2 0.055814663 0.052864073 0.1500157
#> 3 0.338109344 0.252676917 0.3978908
#> 4 0.008456294 0.008385384 0.1024454
#> 5 0.024808694 0.024208122 0.1187861
identical(pld1$pooled, pld2)
#> [1] TRUE
# conventional workflow: all numerical output
all1 <- summary(pld1, type = "all", conf.int = TRUE)
all1
#> term m estimate std.error statistic df p.value
#> 1 (Intercept) 2 2.979488 55.739299 0.05345398 18.09145 0.957956041
#> 2 age40-59 2 52.005346 19.704085 2.63931807 16.49810 0.017526719
#> 3 age60-99 2 70.077449 25.817663 2.71432191 7.29272 0.028863238
#> 4 bmi 2 6.006762 1.982587 3.02975940 18.08472 0.007175381
#> 5 hypyes 2 -4.347543 20.462257 -0.21246647 17.63466 0.834179684
#> 2.5 % 97.5 % conf.low conf.high riv lambda fmi
#> 1 -114.082019 120.04099 -114.082019 120.04099 0.008166507 0.008100355 0.1021574
#> 2 10.336814 93.67388 10.336814 93.67388 0.055814663 0.052864073 0.1500157
#> 3 9.521628 130.63327 9.521628 130.63327 0.338109344 0.252676917 0.3978908
#> 4 1.842899 10.17062 1.842899 10.17062 0.008456294 0.008385384 0.1024454
#> 5 -47.401078 38.70599 -47.401078 38.70599 0.024808694 0.024208122 0.1187861
#> ubar b t dfcom
#> 1 3081.702712 16.77783124 3106.869459 20
#> 2 367.726421 13.68301760 388.250947 20
#> 3 498.129498 112.28149168 666.551735 20
#> 4 3.897692 0.02197335 3.930652 20
#> 5 408.567912 6.75735741 418.703948 20
# pool.table workflow: all numerical output
all2 <- pool.table(tbl)
all2
#> term m estimate std.error statistic df p.value
#> 1 (Intercept) 2 2.979488 55.739299 0.05345398 18.09145 0.957956041
#> 2 age40-59 2 52.005346 19.704085 2.63931807 16.49810 0.017526719
#> 3 age60-99 2 70.077449 25.817663 2.71432191 7.29272 0.028863238
#> 4 bmi 2 6.006762 1.982587 3.02975940 18.08472 0.007175381
#> 5 hypyes 2 -4.347543 20.462257 -0.21246647 17.63466 0.834179684
#> 2.5 % 97.5 % conf.low conf.high riv lambda fmi
#> 1 -114.082019 120.04099 -114.082019 120.04099 0.008166507 0.008100355 0.1021574
#> 2 10.336814 93.67388 10.336814 93.67388 0.055814663 0.052864073 0.1500157
#> 3 9.521628 130.63327 9.521628 130.63327 0.338109344 0.252676917 0.3978908
#> 4 1.842899 10.17062 1.842899 10.17062 0.008456294 0.008385384 0.1024454
#> 5 -47.401078 38.70599 -47.401078 38.70599 0.024808694 0.024208122 0.1187861
#> ubar b t dfcom
#> 1 3081.702712 16.77783124 3106.869459 20
#> 2 367.726421 13.68301760 388.250947 20
#> 3 498.129498 112.28149168 666.551735 20
#> 4 3.897692 0.02197335 3.930652 20
#> 5 408.567912 6.75735741 418.703948 20
class(all1) <- "data.frame"
identical(all1, all2)
#> [1] TRUE
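The std.error, statistic, p.value, conf.low and conf.high columns above follow from estimate, t and df. A minimal sketch, assuming the usual t-based inference for pooled estimates: std.error = sqrt(t), statistic = estimate / std.error, a two-sided p-value from a t distribution with df degrees of freedom, and limits estimate ± qt(1 - (1 - conf.level) / 2, df) * std.error. Following the exponentiate argument description, only the estimate and the interval bounds are exponentiated. The helper pooled_tests() is hypothetical, not a mice function; its inputs are copied from the (Intercept) and bmi rows of all1.

```r
# Hypothetical helper (not part of mice): t-based test and interval
# for pooled estimates, given the total variance t and the df.
pooled_tests <- function(estimate, t, df, conf.level = 0.95,
                         exponentiate = FALSE) {
  std.error <- sqrt(t)
  statistic <- estimate / std.error
  p.value <- 2 * pt(-abs(statistic), df)    # two-sided p-value
  crit <- qt(1 - (1 - conf.level) / 2, df)  # 0.975 quantile for 95%
  conf.low <- estimate - crit * std.error
  conf.high <- estimate + crit * std.error
  if (exponentiate) {
    # Only the estimate and the interval bounds are transformed.
    estimate <- exp(estimate)
    conf.low <- exp(conf.low)
    conf.high <- exp(conf.high)
  }
  data.frame(estimate, std.error, statistic, df, p.value, conf.low, conf.high)
}

# Inputs copied from the (Intercept) and bmi rows printed above.
res <- pooled_tests(estimate = c(2.979488, 6.006762),
                    t = c(3106.869459, 3.930652),
                    df = c(18.09145, 18.08472))
print(res)
```

The resulting columns agree with the corresponding rows of all1 and all2 to the printed precision, which also confirms that p.value is two-sided.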