Imputation by predictive mean matching with distance aided donor selection
Source: R/mice.impute.midastouch.R, mice.impute.midastouch.Rd
Imputes univariate missing data using predictive mean matching.
Usage
mice.impute.midastouch(
  y,
  ry,
  x,
  wy = NULL,
  ridge = 1e-05,
  midas.kappa = NULL,
  outout = TRUE,
  neff = NULL,
  debug = NULL,
  ...
)
Arguments
- y
Vector to be imputed
- ry
Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. The ry generally distinguishes the observed (TRUE) and missing (FALSE) values in y.
- x
Numeric design matrix with length(y) rows with predictors for y. Matrix x may have no missing values.
- wy
Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.
- ridge
The ridge penalty used in .norm.draw() to prevent problems with multicollinearity. The default is ridge = 1e-05, which means that 0.01 percent of the diagonal is added to the cross-product. Larger ridges may result in more biased estimates. For highly noisy data (e.g. many junk variables), set ridge = 1e-06 or even lower to reduce bias. For highly collinear data, set ridge = 1e-04 or higher.
- midas.kappa
Scalar. If NULL (default), the optimal kappa is selected automatically. Alternatively, the user may specify a scalar. Siddique and Belin (2008) find midas.kappa = 3 to be sensible.
- outout
Logical. If TRUE (default), one model is estimated for each donor (leave-one-out principle). For a speedup choose outout = FALSE, which estimates one model for all observations, leading to in-sample predictions for the donors and out-of-sample predictions for the recipients. Mind the inappropriateness, though.
- neff
FOR EXPERTS. NULL or character string. The name of an existing environment into which the effective sample size of the donors for each loop (CE iterations times multiple imputations) is written. The effective sample size is needed to compute the correction for the total variance as originally suggested by Parzen, Lipsitz and Fitzmaurice (2005). The object name is midastouch.neff.
- debug
FOR EXPERTS. NULL or character string. The name of an existing environment into which the input is written. The object name is midastouch.inputlist.
- ...
Other named arguments.
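The method-specific arguments above can be supplied directly in the mice() call, since mice() forwards named arguments in its ... to the univariate imputation functions. A small illustration (the values are chosen arbitrarily for demonstration):

imp <- mice(nhanes, method = "midastouch", midas.kappa = 3, outout = FALSE)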
Details
Imputation of y by predictive mean matching, based on Rubin (1987, p. 168, formulas a and b) and Siddique and Belin (2008).
The procedure is as follows (a simplified sketch in R follows the list):
1. Draw a bootstrap sample from the donor pool.
2. Estimate a beta matrix on the bootstrap sample by the leave-one-out principle.
3. Compute type II predicted values for yobs (nobs x 1) and ymis (nmis x nobs).
4. Calculate the distance between all yobs and the corresponding ymis.
5. Convert the distances into drawing probabilities.
6. For each recipient, draw a donor from the entire pool while considering the probabilities from the model.
7. Take its observed value in y as the imputation.
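For concreteness, here is a minimal, purely illustrative sketch of distance-aided donor selection. It is not the midastouch() implementation: it draws a single bootstrap and fits one regression instead of the leave-one-out beta matrix, skips the automatic kappa selection and the type II point estimates, and fixes kappa. The helper name simple_midas is made up for this example.

# simplified distance-aided donor selection (illustration only)
simple_midas <- function(y, ry, x, kappa = 3) {
  x <- cbind(1, as.matrix(x))                     # add intercept
  yobs <- y[ry]
  xobs <- x[ry, , drop = FALSE]
  xmis <- x[!ry, , drop = FALSE]
  # step 1: bootstrap sample from the donor pool
  idx <- sample(sum(ry), replace = TRUE)
  # steps 2-3 (simplified): one regression, predictions for donors and recipients
  beta <- qr.coef(qr(xobs[idx, , drop = FALSE]), yobs[idx])
  yhat_obs <- drop(xobs %*% beta)
  yhat_mis <- drop(xmis %*% beta)
  # steps 4-7: distances -> drawing probabilities -> draw a donor, take its observed y
  vapply(yhat_mis, function(yh) {
    d <- abs(yhat_obs - yh)
    prob <- (d + .Machine$double.eps)^(-kappa)    # closer donors get higher probability
    yobs[sample(length(yobs), 1, prob = prob / sum(prob))]
  }, numeric(1))
}
# example call on the nhanes data shipped with mice
library(mice)
set.seed(1)
simple_midas(nhanes$bmi, !is.na(nhanes$bmi), nhanes[, "age", drop = FALSE])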
References
Gaffert, P., Meinfelder, F., Bosch, V. (2018), Towards an MI-proper Predictive Mean Matching, JSM 2018. Discussion Paper.
Little, R.J.A. (1988), Missing data adjustments in large surveys (with discussion), Journal of Business Economics and Statistics, 6, 287–301.
Parzen, M., Lipsitz, S.R., Fitzmaurice, G.M. (2005), A note on reducing the bias of the approximate Bayesian bootstrap imputation variance estimator. Biometrika, 92, 4, 971–974.
Rubin, D.B. (1987), Multiple imputation for nonresponse in surveys. New York: Wiley.
Siddique, J., Belin, T.R. (2008), Multiple imputation using an iterative hot-deck with distance-based donor selection. Statistics in Medicine, 27, 1, 83–102.
Van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, C.G.M., Rubin, D.B. (2006), Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76, 12, 1049–1064.
Van Buuren, S., Groothuis-Oudshoorn, K. (2011), mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45, 3, 1–67. doi:10.18637/jss.v045.i03
See also
Other univariate imputation functions: mice.impute.cart(), mice.impute.lasso.logreg(), mice.impute.lasso.norm(), mice.impute.lasso.select.logreg(), mice.impute.lasso.select.norm(), mice.impute.lda(), mice.impute.logreg(), mice.impute.logreg.boot(), mice.impute.mean(), mice.impute.mnar.logreg(), mice.impute.mpmm(), mice.impute.norm(), mice.impute.norm.boot(), mice.impute.norm.nob(), mice.impute.norm.predict(), mice.impute.pmm(), mice.impute.polr(), mice.impute.polyreg(), mice.impute.quadratic(), mice.impute.rf(), mice.impute.ri()
Examples
# do default multiple imputation on a numeric matrix
imp <- mice(nhanes, method = "midastouch")
#>
#> iter imp variable
#> 1 1 bmi hyp chl
#> 1 2 bmi hyp chl
#> 1 3 bmi hyp chl
#> 1 4 bmi hyp chl
#> 1 5 bmi hyp chl
#> 2 1 bmi hyp chl
#> 2 2 bmi hyp chl
#> 2 3 bmi hyp chl
#> 2 4 bmi hyp chl
#> 2 5 bmi hyp chl
#> 3 1 bmi hyp chl
#> 3 2 bmi hyp chl
#> 3 3 bmi hyp chl
#> 3 4 bmi hyp chl
#> 3 5 bmi hyp chl
#> 4 1 bmi hyp chl
#> 4 2 bmi hyp chl
#> 4 3 bmi hyp chl
#> 4 4 bmi hyp chl
#> 4 5 bmi hyp chl
#> 5 1 bmi hyp chl
#> 5 2 bmi hyp chl
#> 5 3 bmi hyp chl
#> 5 4 bmi hyp chl
#> 5 5 bmi hyp chl
imp
#> Class: mids
#> Number of multiple imputations: 5
#> Imputation methods:
#> age bmi hyp chl
#> "" "midastouch" "midastouch" "midastouch"
#> PredictorMatrix:
#> age bmi hyp chl
#> age 0 1 1 1
#> bmi 1 0 1 1
#> hyp 1 1 0 1
#> chl 1 1 1 0
# list the actual imputations for BMI
imp$imp$bmi
#> 1 2 3 4 5
#> 1 30.1 35.3 22.5 22.5 26.3
#> 3 30.1 30.1 30.1 29.6 22.0
#> 4 21.7 27.2 21.7 20.4 25.5
#> 6 21.7 27.4 25.5 25.5 25.5
#> 10 27.2 22.7 22.7 24.9 27.4
#> 11 30.1 29.6 30.1 22.5 22.0
#> 12 35.3 26.3 25.5 24.9 27.4
#> 16 30.1 29.6 30.1 22.5 22.0
#> 21 30.1 22.5 30.1 22.5 22.0
# first completed data matrix
complete(imp)
#> age bmi hyp chl
#> 1 1 30.1 1 187
#> 2 2 22.7 1 187
#> 3 1 30.1 1 187
#> 4 3 21.7 2 186
#> 5 1 20.4 1 113
#> 6 3 21.7 1 184
#> 7 1 22.5 1 118
#> 8 1 30.1 1 187
#> 9 2 22.0 1 238
#> 10 2 27.2 1 186
#> 11 1 30.1 1 187
#> 12 2 35.3 1 229
#> 13 3 21.7 1 206
#> 14 2 28.7 2 204
#> 15 1 29.6 1 187
#> 16 1 30.1 1 187
#> 17 3 27.2 2 284
#> 18 2 26.3 2 199
#> 19 1 35.3 1 218
#> 20 3 25.5 2 206
#> 21 1 30.1 1 187
#> 22 1 33.2 1 229
#> 23 1 27.5 1 131
#> 24 3 24.9 1 284
#> 25 2 27.4 1 186
# imputation on mixed data with a different method per column
mice(nhanes2, method = c("sample", "midastouch", "logreg", "norm"))
#>
#> iter imp variable
#> 1 1 bmi hyp chl
#> 1 2 bmi hyp chl
#> 1 3 bmi hyp chl
#> 1 4 bmi hyp chl
#> 1 5 bmi hyp chl
#> 2 1 bmi hyp chl
#> 2 2 bmi hyp chl
#> 2 3 bmi hyp chl
#> 2 4 bmi hyp chl
#> 2 5 bmi hyp chl
#> 3 1 bmi hyp chl
#> 3 2 bmi hyp chl
#> 3 3 bmi hyp chl
#> 3 4 bmi hyp chl
#> 3 5 bmi hyp chl
#> 4 1 bmi hyp chl
#> 4 2 bmi hyp chl
#> 4 3 bmi hyp chl
#> 4 4 bmi hyp chl
#> 4 5 bmi hyp chl
#> 5 1 bmi hyp chl
#> 5 2 bmi hyp chl
#> 5 3 bmi hyp chl
#> 5 4 bmi hyp chl
#> 5 5 bmi hyp chl
#> Class: mids
#> Number of multiple imputations: 5
#> Imputation methods:
#> age bmi hyp chl
#> "" "midastouch" "logreg" "norm"
#> PredictorMatrix:
#> age bmi hyp chl
#> age 0 1 1 1
#> bmi 1 0 1 1
#> hyp 1 1 0 1
#> chl 1 1 1 0