
Imputes univariate missing normal data using lasso linear regression with bootstrap.


mice.impute.lasso.norm(y, ry, x, wy = NULL, nfolds = 10, ...)
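In practice, lasso.norm is usually selected per variable through mice() rather than called directly. The sketch below assumes that the mice and glmnet packages are installed. It uses the nhanes data that ships with mice. Assigning lasso.norm to bmi and chl, and setting nfolds = 5, are illustrative choices only. Extra named arguments given to mice(), such as nfolds, are passed down to the univariate imputation functions.

```r
library(mice)  # lasso.norm also needs the glmnet package installed

# Start from the default methods, then switch two numeric variables to lasso.norm
meth <- make.method(nhanes)
meth[c("bmi", "chl")] <- "lasso.norm"

# nfolds travels through mice() down to mice.impute.lasso.norm();
# 5 folds is an illustrative choice for the few observed cases in nhanes (25 rows)
imp <- mice(nhanes, method = meth, m = 5, seed = 123,
            nfolds = 5, printFlag = FALSE)

completed <- complete(imp, 1)  # first of the m completed data sets
```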



y: Vector to be imputed.


ry: Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. ry generally distinguishes the observed (TRUE) and missing (FALSE) values in y.


x: Numeric design matrix with length(y) rows containing the predictors for y. x must not contain missing values.


wy: Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.


nfolds: The number of folds used in the cross-validation that selects the lasso penalty. The default is 10.


...: Other named arguments.


Returns a vector with the imputed data, of the same type as y and of length sum(wy).
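This sketch calls the function directly on simulated data to show how ry, wy and the return value fit together. It assumes that mice and glmnet are installed. The data, the 20 predictors and nfolds = 5 are made up for illustration.

```r
library(mice)  # glmnet must also be installed

set.seed(1)
n <- 100
x <- matrix(rnorm(n * 20), nrow = n)   # 20 candidate predictors, no missing values
y <- 2 * x[, 1] - x[, 2] + rnorm(n)    # only the first two predictors matter
y[sample(n, 30)] <- NA                 # make 30 distinct values missing

ry <- !is.na(y)  # TRUE = observed cases used to fit the imputation model
wy <- !ry        # TRUE = cells that receive an imputation

imputed <- mice.impute.lasso.norm(y, ry, x, wy = wy, nfolds = 5)
length(imputed)   # equals sum(wy), i.e. 30
y[wy] <- imputed  # fill the missing cells in place
```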


The method consists of the following steps:

  1. For a given y variable under imputation, draw a bootstrap version y* with replacement from the observed cases y[ry], and store in x* the corresponding values from x[ry, ].

  2. Fit a regularised (lasso) linear regression with y* as the outcome and x* as predictors. This yields a vector of regression coefficients bhat. All of these coefficients are treated as a random draw from the posterior distribution of the imputation model parameters, with the bootstrap in step 1 supplying the randomness. Some of these coefficients will be shrunk to exactly 0.

  3. Draw the imputed values from the predictive distribution defined by the original (non-bootstrap) data, bhat, and the estimated error variance.
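The three steps can be sketched in a few lines of R. This is a minimal illustration under stated assumptions and is not the mice source code. The helper name durr_norm_sketch is hypothetical. The sketch assumes that the penalty is chosen by glmnet's cv.glmnet at lambda.min. It also assumes that the error variance is estimated from the residuals of bhat on the original observed cases, with degrees of freedom reduced by the number of nonzero coefficients.

```r
library(glmnet)

# Hypothetical helper: an illustrative DURR-style sketch, not mice's implementation
durr_norm_sketch <- function(y, ry, x, wy = !ry, nfolds = 10) {
  # Step 1: bootstrap the observed cases (y* and the matching rows x*)
  n1 <- sum(ry)
  idx <- sample(n1, n1, replace = TRUE)
  x_star <- x[ry, , drop = FALSE][idx, , drop = FALSE]
  y_star <- y[ry][idx]

  # Step 2: lasso fit (alpha = 1), penalty selected by nfolds-fold cross-validation
  cv_fit <- cv.glmnet(x_star, y_star, family = "gaussian",
                      alpha = 1, nfolds = nfolds)
  bhat <- as.matrix(coef(cv_fit, s = "lambda.min"))[, 1]  # intercept first

  # Step 3 (assumption): error variance from the original observed data and bhat
  x1 <- cbind(1, x)
  resid <- y[ry] - as.vector(x1[ry, , drop = FALSE] %*% bhat)
  df <- max(n1 - sum(bhat != 0), 1)
  sigma_hat <- sqrt(sum(resid^2) / df)

  # Draw imputations from the normal predictive distribution
  mu <- as.vector(x1[wy, , drop = FALSE] %*% bhat)
  mu + rnorm(sum(wy), mean = 0, sd = sigma_hat)
}

# Usage on simulated data where only the first predictor matters
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 10), nrow = n)
y <- 1 + 3 * x[, 1] + rnorm(n)
y[1:50] <- NA
imp <- durr_norm_sketch(y, !is.na(y), x, nfolds = 5)
```

The bootstrap in step 1 is what makes bhat a random draw. A fresh call gives different coefficients and therefore different imputations, which is what multiple imputation needs.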

The method is based on the Direct Use of Regularized Regression (DURR) approach proposed by Zhao & Long (2016) and Deng et al. (2016).


Deng, Y., Chang, C., Ido, M. S., & Long, Q. (2016). Multiple imputation for general missing data patterns in the presence of high-dimensional data. Scientific Reports, 6(1), 1-10.

Zhao, Y., & Long, Q. (2016). Multiple imputation in the presence of high-dimensional data. Statistical Methods in Medical Research, 25(5), 2021-2035.


Edoardo Costantini, 2021