Combines estimates from a tidy table

Usage

pool.table(
  w,
  type = c("all", "minimal", "tests"),
  conf.int = TRUE,
  conf.level = 0.95,
  exponentiate = FALSE,
  dfcom = Inf,
  custom.t = NULL,
  rule = c("rubin1987", "reiter2003"),
  ...
)

Arguments

w: A data.frame with parameter estimates in tidy format (see details).
type: A string, either "minimal", "tests" or "all". Use "minimal" to mimic the output of summary(pool(fit)). The default is "all".
conf.int: Logical indicating whether to include a confidence interval.
conf.level: Confidence level of the interval, used only if conf.int = TRUE. Number between 0 and 1.
exponentiate: Flag indicating whether to exponentiate the coefficient estimates and confidence intervals (typical for logistic regression).
dfcom: A positive number representing the degrees of freedom of the residuals in the complete-data analysis. The dfcom argument is used for the Barnard-Rubin adjustment. In a linear regression, dfcom would be equivalent to the number of independent observations minus the number of fitted parameters, but the expression becomes more complex for regularized, proportional hazards, or other semi-parametric techniques. Only used if w lacks a column named "df.residual".
custom.t: A custom character string to be parsed as a calculation rule for the total variance t. The custom rule can use the other calculated pooling statistics. The default t calculation has the form ".data$ubar + (1 + 1 / .data$m) * .data$b".
rule: A string indicating the pooling rule. Currently supported are "rubin1987" (default, for analyses applied to multiply-imputed incomplete data) and "reiter2003" (for analyses applied to synthetic data created from complete data).
...: Arguments passed down

Value

pool.table() returns a data.frame with aggregated estimates, standard errors, confidence intervals and statistical tests.

The meaning of the columns is as follows:

`term`	Parameter name
`m`	Number of multiple imputations
`estimate`	Pooled complete data estimate
`std.error`	Standard error of `estimate`
`statistic`	t-statistic = `estimate` / `std.error`
`df`	Degrees of freedom for `statistic`
`p.value`	Two-sided P-value under null hypothesis
`conf.low`	Lower bound of confidence interval (default 95%)
`conf.high`	Upper bound of confidence interval (default 95%)
`riv`	Relative increase in variance
`fmi`	Fraction of missing information
`ubar`	Within-imputation variance of `estimate`
`b`	Between-imputation variance of `estimate`
`t`	Total variance of `estimate`
`dfcom`	Residual degrees of freedom in complete data
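
As a minimal sketch of how these columns relate, the base R lines below recompute `t`, `std.error`, `statistic`, `riv` and `lambda` for the bmi row of the Examples output from its `estimate`, `ubar`, `b` and `m`. The variable names are illustrative, the formulas assume the default "rubin1987" rule, and the code is not the package's internal implementation.

```r
# Values copied from the bmi row of the Examples output
m        <- 2
estimate <- 6.006762
ubar     <- 3.897692      # within-imputation variance
b        <- 0.02197335    # between-imputation variance

# Default total variance rule: ubar + (1 + 1 / m) * b
total_var <- ubar + (1 + 1 / m) * b
std_error <- sqrt(total_var)
statistic <- estimate / std_error

# Relative increase in variance, and share of total variance due to missingness
riv    <- (1 + 1 / m) * b / ubar
lambda <- (1 + 1 / m) * b / total_var

round(c(t = total_var, std.error = std_error, statistic = statistic,
        riv = riv, lambda = lambda), 6)
```

The rounded results agree with the bmi row printed in the Examples (t = 3.930652, std.error = 1.982587).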

Details

The input data w is a data.frame with columns named:

`term`	a character or factor with the parameter names
`estimate`	a numeric vector with parameter estimates
`std.error`	a numeric vector with standard errors of `estimate`
`df.residual`	a numeric vector with the degrees of freedom

Columns 1-3 are obligatory. Column 4 is optional. Usually, all entries in column 4 are the same. The user can omit column 4 and specify the dfcom argument instead, as in pool.table(..., dfcom = ...). If both are given, then column df.residual takes precedence. If neither is specified, then mice tries to calculate the residual degrees of freedom. If that fails (e.g. because there is no information on sample size), mice sets dfcom = Inf. The value dfcom = Inf is acceptable for large samples (n > 1000) and relatively concise parametric models.
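
To illustrate why dfcom matters, here is a rough base R sketch of the Barnard-Rubin adjustment, assuming its usual published form. The helper `barnard_rubin_df()` is hypothetical and not a mice function; the inputs are copied from the bmi row of the Examples output, and the `fmi` line assumes the standard fraction-of-missing-information formula.

```r
# Hypothetical helper (not part of mice): Barnard-Rubin adjusted degrees of freedom
barnard_rubin_df <- function(m, lambda, dfcom = Inf) {
  df_old <- (m - 1) / lambda^2              # large-sample degrees of freedom
  if (is.infinite(dfcom)) return(df_old)    # no small-sample correction
  df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  df_old * df_obs / (df_old + df_obs)
}

# bmi row of the Examples output: m = 2, lambda = 0.008385384, dfcom = 20
df_bmi <- barnard_rubin_df(m = 2, lambda = 0.008385384, dfcom = 20)
df_bmi   # about 18.0847, close to the df column in the Examples

# With dfcom = Inf the correction drops out and df becomes very large
df_inf <- barnard_rubin_df(m = 2, lambda = 0.008385384)

# Fraction of missing information from riv and the adjusted df (assumed formula)
riv <- 0.008456294
fmi <- (riv + 2 / (df_bmi + 3)) / (1 + riv)
fmi      # about 0.1024
```

With only 20 residual degrees of freedom the adjusted df stays below dfcom, whereas dfcom = Inf yields a df in the thousands; this is why dfcom = Inf is only a reasonable choice for large samples.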

Examples

# conventional mice workflow
imp <- mice(nhanes2, m = 2, maxit = 2, seed = 1, print = FALSE)
fit <- with(imp, lm(chl ~ age + bmi + hyp))
pld1 <- pool(fit)
pld1$pooled
#>          term m  estimate        ubar            b           t dfcom       df
#> 1 (Intercept) 2  2.979488 3081.702712  16.77783124 3106.869459    20 18.09145
#> 2    age40-59 2 52.005346  367.726421  13.68301760  388.250947    20 16.49810
#> 3    age60-99 2 70.077449  498.129498 112.28149168  666.551735    20  7.29272
#> 4         bmi 2  6.006762    3.897692   0.02197335    3.930652    20 18.08472
#> 5      hypyes 2 -4.347543  408.567912   6.75735741  418.703948    20 17.63466
#>           riv      lambda       fmi
#> 1 0.008166507 0.008100355 0.1021574
#> 2 0.055814663 0.052864073 0.1500157
#> 3 0.338109344 0.252676917 0.3978908
#> 4 0.008456294 0.008385384 0.1024454
#> 5 0.024808694 0.024208122 0.1187861

# using pool.table() on tidy table
tbl <- summary(fit)[, c("term", "estimate", "std.error", "df.residual")]
tbl
#> # A tibble: 10 × 4
#>    term        estimate std.error df.residual
#>    <chr>          <dbl>     <dbl>       <dbl>
#>  1 (Intercept)   0.0831     58.1           20
#>  2 age40-59     49.4        19.8           20
#>  3 age60-99     62.6        22.7           20
#>  4 bmi           5.90        2.07          20
#>  5 hypyes       -2.51       22.1           20
#>  6 (Intercept)   5.88       52.8           20
#>  7 age40-59     54.6        18.6           20
#>  8 age60-99     77.6        21.9           20
#>  9 bmi           6.11        1.87          20
#> 10 hypyes       -6.19       18.1           20
pld2 <- pool.table(tbl, type = "minimal")
pld2
#>          term m  estimate        ubar            b           t dfcom       df
#> 1 (Intercept) 2  2.979488 3081.702712  16.77783124 3106.869459    20 18.09145
#> 2    age40-59 2 52.005346  367.726421  13.68301760  388.250947    20 16.49810
#> 3    age60-99 2 70.077449  498.129498 112.28149168  666.551735    20  7.29272
#> 4         bmi 2  6.006762    3.897692   0.02197335    3.930652    20 18.08472
#> 5      hypyes 2 -4.347543  408.567912   6.75735741  418.703948    20 17.63466
#>           riv      lambda       fmi
#> 1 0.008166507 0.008100355 0.1021574
#> 2 0.055814663 0.052864073 0.1500157
#> 3 0.338109344 0.252676917 0.3978908
#> 4 0.008456294 0.008385384 0.1024454
#> 5 0.024808694 0.024208122 0.1187861

identical(pld1$pooled, pld2)
#> [1] TRUE

# conventional workflow: all numerical output
all1 <- summary(pld1, type = "all", conf.int = TRUE)
all1
#>          term m  estimate std.error   statistic       df     p.value
#> 1 (Intercept) 2  2.979488 55.739299  0.05345398 18.09145 0.957956041
#> 2    age40-59 2 52.005346 19.704085  2.63931807 16.49810 0.017526719
#> 3    age60-99 2 70.077449 25.817663  2.71432191  7.29272 0.028863238
#> 4         bmi 2  6.006762  1.982587  3.02975940 18.08472 0.007175381
#> 5      hypyes 2 -4.347543 20.462257 -0.21246647 17.63466 0.834179684
#>         2.5 %    97.5 %    conf.low conf.high         riv      lambda       fmi
#> 1 -114.082019 120.04099 -114.082019 120.04099 0.008166507 0.008100355 0.1021574
#> 2   10.336814  93.67388   10.336814  93.67388 0.055814663 0.052864073 0.1500157
#> 3    9.521628 130.63327    9.521628 130.63327 0.338109344 0.252676917 0.3978908
#> 4    1.842899  10.17062    1.842899  10.17062 0.008456294 0.008385384 0.1024454
#> 5  -47.401078  38.70599  -47.401078  38.70599 0.024808694 0.024208122 0.1187861
#>          ubar            b           t dfcom
#> 1 3081.702712  16.77783124 3106.869459    20
#> 2  367.726421  13.68301760  388.250947    20
#> 3  498.129498 112.28149168  666.551735    20
#> 4    3.897692   0.02197335    3.930652    20
#> 5  408.567912   6.75735741  418.703948    20

# pool.table workflow: all numerical output
all2 <- pool.table(tbl)
all2
#>          term m  estimate std.error   statistic       df     p.value
#> 1 (Intercept) 2  2.979488 55.739299  0.05345398 18.09145 0.957956041
#> 2    age40-59 2 52.005346 19.704085  2.63931807 16.49810 0.017526719
#> 3    age60-99 2 70.077449 25.817663  2.71432191  7.29272 0.028863238
#> 4         bmi 2  6.006762  1.982587  3.02975940 18.08472 0.007175381
#> 5      hypyes 2 -4.347543 20.462257 -0.21246647 17.63466 0.834179684
#>         2.5 %    97.5 %    conf.low conf.high         riv      lambda       fmi
#> 1 -114.082019 120.04099 -114.082019 120.04099 0.008166507 0.008100355 0.1021574
#> 2   10.336814  93.67388   10.336814  93.67388 0.055814663 0.052864073 0.1500157
#> 3    9.521628 130.63327    9.521628 130.63327 0.338109344 0.252676917 0.3978908
#> 4    1.842899  10.17062    1.842899  10.17062 0.008456294 0.008385384 0.1024454
#> 5  -47.401078  38.70599  -47.401078  38.70599 0.024808694 0.024208122 0.1187861
#>          ubar            b           t dfcom
#> 1 3081.702712  16.77783124 3106.869459    20
#> 2  367.726421  13.68301760  388.250947    20
#> 3  498.129498 112.28149168  666.551735    20
#> 4    3.897692   0.02197335    3.930652    20
#> 5  408.567912   6.75735741  418.703948    20

class(all1) <- "data.frame"
identical(all1, all2)
#> [1] TRUE
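
The test columns in the output above can be cross-checked by hand. The sketch below assumes a two-sided t-test and a symmetric t-based interval, uses values copied from the bmi row of all2, and relies only on base R; the variable names are illustrative.

```r
# Values copied from the bmi row of all2
estimate   <- 6.006762
std_error  <- 1.982587
df_adj     <- 18.08472
conf_level <- 0.95

# Two-sided p-value for H0: parameter equals zero
statistic <- estimate / std_error
p_value   <- 2 * pt(abs(statistic), df_adj, lower.tail = FALSE)

# Symmetric confidence interval based on the t distribution
crit      <- qt(1 - (1 - conf_level) / 2, df_adj)
conf_low  <- estimate - crit * std_error
conf_high <- estimate + crit * std_error

c(p.value = p_value, conf.low = conf_low, conf.high = conf_high)
```

The results should agree with the p.value, conf.low and conf.high entries for bmi up to rounding of the copied inputs.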