
Imputes the "best value" according to the linear regression model, also known as regression imputation.

Usage

mice.impute.norm.predict(y, ry, x, wy = NULL, ...)

Arguments

y

Vector to be imputed

ry

Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. The ry generally distinguishes the observed (TRUE) and missing values (FALSE) in y.

x

Numeric design matrix with length(y) rows with predictors for y. Matrix x must not contain missing values.

wy

Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.

...

Other named arguments.

Value

Vector with imputed data, same type as y, and of length sum(wy)

Details

Calculates regression weights from the observed data and returns the predicted values as imputations. This method is known as regression imputation.
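The computation described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name impute_predict_sketch is hypothetical, the fit is plain least squares with an added intercept, and the default wy = !ry is an assumption made for the sketch.

```r
# Hypothetical helper (not part of mice): deterministic regression imputation.
# Assumptions: plain least squares, intercept added, wy defaults to !ry.
impute_predict_sketch <- function(y, ry, x, wy = !ry) {
  x <- cbind(1, as.matrix(x))                      # add an intercept column
  beta <- qr.solve(x[ry, , drop = FALSE], y[ry])   # least-squares weights from y[ry]
  drop(x[wy, , drop = FALSE] %*% beta)             # predictions for the rows flagged by wy
}

# Simulated data for illustration only
set.seed(1)
n <- 50
x <- matrix(rnorm(n), ncol = 1)
y <- 2 + 3 * x[, 1] + rnorm(n)
ry <- rep(TRUE, n)
ry[1:10] <- FALSE        # treat the first ten values of y as missing
y[!ry] <- NA

imp <- impute_predict_sketch(y, ry, x)
length(imp) == sum(!ry)  # one imputation per position flagged by wy
```

The returned vector holds only the imputations, in the order of the TRUE entries of wy, which matches the length sum(wy) stated under Value.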

Warning

THIS METHOD SHOULD NOT BE USED FOR DATA ANALYSIS. This method is seductive because it imputes the most likely value according to the model. However, it ignores the uncertainty of the missing values and artificially amplifies the relations between the columns of the data. Application of richer models having more parameters does not help to evade these issues. Stochastic regression methods, like mice.impute.pmm or mice.impute.norm, are generally preferred.
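The amplification of relations can be illustrated with a small base R simulation. This is a sketch under stated assumptions: the data are simulated, values are deleted completely at random, and the stochastic variant shown for contrast only adds residual noise to the predictions, which is simpler than what mice.impute.norm or mice.impute.pmm do.

```r
# Simulated data for illustration only
set.seed(123)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
ry <- !(seq_len(n) %in% sample(n, 80))   # FALSE marks values treated as missing

fit <- lm(y ~ x, subset = ry)            # imputation model fitted to observed cases
pred <- predict(fit, newdata = data.frame(x = x[!ry]))

# Deterministic regression imputation: imputed points lie exactly on the line
y_det <- y
y_det[!ry] <- pred

# Simplified stochastic variant (assumption): add residual noise to each prediction
y_sto <- y
y_sto[!ry] <- pred + rnorm(sum(!ry), sd = sigma(fit))

cor_obs <- cor(x[ry], y[ry])             # correlation among observed cases
cor_det <- cor(x, y_det)                 # correlation after regression imputation
cor_sto <- cor(x, y_sto)                 # correlation after the noisy variant

print(c(observed = cor_obs, deterministic = cor_det, stochastic = cor_sto))
```

Because the imputed points have zero residual, the completed data reproduce the fitted line while adding no scatter around it, so the correlation after deterministic imputation cannot be smaller in absolute value than the correlation among the observed cases.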

At best, prediction can give reasonable estimates of the mean, especially if normality assumptions are plausible. See Little and Rubin (2002, p. 62-64) or Van Buuren (2012, p. 11-13, p. 45-46) for a discussion of this method.
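Why the mean can still be estimated reasonably is easiest to see in a sketch. With an intercept in the model, the mean of the completed variable equals the fitted line evaluated at the mean of x over all cases. The example below uses simulated data in which the probability of missingness depends on x; that mechanism and all variable names are assumptions made for the illustration.

```r
# Simulated data for illustration only
set.seed(42)
n <- 500
x <- rnorm(n)
y <- 10 + 2 * x + rnorm(n)
ry <- runif(n) > plogis(x)               # larger x makes y more likely to be missing

fit <- lm(y ~ x, subset = ry)
y_completed <- y
y_completed[!ry] <- predict(fit, newdata = data.frame(x = x[!ry]))

mean_cc <- mean(y[ry])                   # complete-case mean of y
mean_imputed <- mean(y_completed)        # mean after regression imputation
# Fitted line evaluated at the mean of x over all cases
mean_regression <- unname(coef(fit)[1] + coef(fit)[2] * mean(x))

print(c(complete_case = mean_cc, imputed = mean_imputed, regression = mean_regression))
```

The last two quantities agree up to floating-point error, since the least-squares residuals of the observed cases sum to zero. This concerns the point estimate of the mean only; its standard error computed from the completed data would still be too small.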

References

Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data. New York: John Wiley and Sons.

Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.

Author

Gerko Vink, Stef van Buuren, 2018