Imputes univariate missing binary data using lasso logistic regression with bootstrap.

Usage

mice.impute.lasso.logreg(y, ry, x, wy = NULL, nfolds = 10, ...)

Arguments

y

Vector to be imputed

ry

Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. The ry generally distinguishes the observed (TRUE) and missing values (FALSE) in y.

x

Numeric design matrix with length(y) rows containing the predictors for y. Matrix x must not contain missing values.

wy

Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.

nfolds

The number of folds for the cross-validation of the lasso penalty. The default is 10.

...

Other named arguments.

Value

Vector with imputed data, same type as y, and of length sum(wy)
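The call below is a minimal sketch of invoking the function directly on simulated data, rather than through mice(). The simulated x, y, and ry, the seed, and the choice of nfolds = 5 are illustrative assumptions, and the glmnet package is assumed to be installed alongside mice.

```r
library(mice)  # the lasso fit is assumed to rely on the glmnet package

set.seed(2021)
n <- 150

# Hypothetical complete numeric predictors
x <- matrix(rnorm(n * 4), nrow = n,
            dimnames = list(NULL, paste0("x", 1:4)))

# Hypothetical binary outcome, stored as a two-level factor
y <- factor(rbinom(n, 1, plogis(x[, 1])), levels = c(0, 1))

# Mark roughly a quarter of the outcome values as missing
ry <- runif(n) > 0.25
y[!ry] <- NA

# wy is left at its default (NULL), so imputations are requested
# for the entries where ry is FALSE
imp <- mice.impute.lasso.logreg(y, ry, x, nfolds = 5)

# Per the Value section, one imputation per requested entry is expected
length(imp)
sum(!ry)
```

In regular use the function is normally not called by hand; it is selected through the method argument of mice().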

Details

The method consists of the following steps:

  1. For a given y variable under imputation, draw a bootstrap version y* with replacement from the observed cases y[ry], and store in x* the corresponding values from x[ry, ].

  2. Fit a regularised (lasso) logistic regression with y* as the outcome and x* as the predictors. This yields a vector of regression coefficients bhat. All of these coefficients are considered random draws from the posterior distribution of the imputation model parameters. Some of these coefficients will be shrunken to 0.

  3. Compute predicted scores for the missing data, i.e. logit^{-1}(X bhat).

  4. Compare each score to a random uniform (0, 1) deviate, and impute the value accordingly.
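The four steps above can be sketched in a few lines of R. This is an illustrative sketch, not the package source: the function name durr_logreg_sketch, the use of glmnet::cv.glmnet, the choice of the penalty at "lambda.min", and the rule of imputing 1 when the uniform deviate does not exceed the score are all assumptions made for the example.

```r
# Illustrative sketch only; requires the glmnet package.
durr_logreg_sketch <- function(y, ry, x, wy = !ry, nfolds = 10) {
  # Step 1: bootstrap the observed cases (rows where ry is TRUE)
  idx <- sample(which(ry), size = sum(ry), replace = TRUE)
  y_star <- y[idx]
  x_star <- x[idx, , drop = FALSE]

  # Step 2: lasso (alpha = 1) logistic regression on the bootstrap sample,
  # with the penalty chosen by nfolds-fold cross-validation
  fit <- glmnet::cv.glmnet(x_star, y_star, family = "binomial",
                           alpha = 1, nfolds = nfolds)

  # Step 3: predicted probabilities, logit^{-1}(X bhat), for the rows to impute
  p <- predict(fit, newx = x[wy, , drop = FALSE],
               s = "lambda.min", type = "response")

  # Step 4: compare each probability with a uniform (0, 1) deviate
  as.integer(runif(length(p)) <= as.vector(p))
}

# Hypothetical simulated data for a quick check
set.seed(123)
n <- 200
x <- matrix(rnorm(n * 5), nrow = n)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))
ry <- runif(n) > 0.3
y[!ry] <- NA

imp <- durr_logreg_sketch(y, ry, x)
table(imp)
```

Because the bootstrap in step 1 is redrawn on every call, repeated calls give different coefficient vectors, which is how the approach is meant to carry parameter uncertainty into the imputations.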

The method is based on the Direct Use of Regularized Regression (DURR) approach proposed by Zhao & Long (2016) and Deng et al. (2016).

References

Deng, Y., Chang, C., Ido, M. S., & Long, Q. (2016). Multiple imputation for general missing data patterns in the presence of high-dimensional data. Scientific Reports, 6(1), 1-10.

Zhao, Y., & Long, Q. (2016). Multiple imputation in the presence of high-dimensional data. Statistical Methods in Medical Research, 25(5), 2021-2035.

Author

Edoardo Costantini, 2021