Predict method for linear models with multiply imputed data
Usage
predict_mi(
object,
newdata,
pool = TRUE,
se.fit = FALSE,
interval = c("none", "confidence", "prediction"),
level = 0.95,
...
)
Arguments
- object
A prediction model, either a single lm object or a list of lm objects obtained from multiply imputed data (object can also be of class mira).
- newdata
An optional data frame in which to look for variables with which to predict. Can be a data.frame, list, or mids object. If omitted, the fitted values are used.
- pool
Logical indicating whether to pool the predictions (and potentially obtain pooled prediction intervals).
- se.fit
A switch indicating if standard errors are required.
- interval
Type of interval calculation. Can be abbreviated.
- level
Tolerance/confidence level.
- ...
Arguments passed on to stats::predict.lm
  - scale
  Scale parameter for std.err. calculation.
  - df
  Degrees of freedom for scale.
  - type
  Type of prediction (response or model term). Can be abbreviated.
  - terms
  If type = "terms", which terms (default is all terms), a character vector.
  - na.action
  function determining what should be done with missing values in newdata. The default is to predict NA.
  - pred.var
  the variance(s) for future observations to be assumed for prediction intervals. See ‘Details’.
  - weights
  variance weights for prediction. This can be a numeric vector or a one-sided model formula. In the latter case, it is interpreted as an expression evaluated in newdata.
  - rankdeficient
  a character string specifying what should happen in the case of a rank deficient model, i.e., when object$rank < ncol(model.matrix(object)).
    - "warnif": gives a warning only in case of predicting ‘non-estimable’ cases, i.e., vectors not in the same predictor subspace as the original data (with tolerance tol). In that case, the non-estimable indices are also returned as attribute "non-estim" (see rankdeficient = "non-estim").
    - "simple": is back compatible to R < 4.3.0, possibly giving dubious predictions in non-estimable cases, and always signalling a warning.
    - "non-estim": gives the same predictions without warning, and with an attribute attr(*, "non-estim") with indices in 1:nrow(newdata) of new data observations which are deemed non-estimable.
    - "NA": predicts NA for non-estimable new data, silently. Often recommended in new code.
    - "NAwarn": predicts NA for non-estimable new data with a warning.
  - tol
  non-negative number determining how non-estimability is determined in rank deficient cases.
  - verbose
  logical indicating if messages should be produced about rank deficiency handling.
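The rankdeficient options are easiest to see on a small rank-deficient model. The following is a minimal sketch that calls stats::predict.lm directly rather than predict_mi, and it assumes R >= 4.3.0, where this argument is available. The simulated data and the names d, fit, and new are made up for illustration.

```r
# Simulated data in which x3 is an exact linear combination of x1 and x2,
# so the model matrix is rank deficient (illustrative data only).
set.seed(42)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 + d$x2
d$y <- 1 + d$x1 - d$x2 + rnorm(20, sd = 0.1)

# lm() cannot estimate a separate coefficient for x3 and reports it as NA.
fit <- lm(y ~ x1 + x2 + x3, data = d)

# Row 1 respects x3 = x1 + x2 (estimable); row 2 does not (non-estimable).
new <- data.frame(x1 = c(0, 0), x2 = c(1, 1), x3 = c(1, 5))

# With rankdeficient = "NA", non-estimable rows are predicted as NA, silently.
p <- predict(fit, newdata = new, rankdeficient = "NA")
print(p)
```

Because predict_mi forwards `...` to stats::predict.lm, the same argument would presumably be supplied as an extra argument to predict_mi, but that usage is not tested here.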
Value
If pool = TRUE, predict_mi produces a vector of predictions or a
matrix of predictions and bounds with column names fit, lwr, upr if
additionally interval is set. For type = "terms" this is a matrix with a
column per term and may have an attribute "constant". If se.fit = TRUE, a
list is returned with the following components:
- fit: vector or matrix as above
- se.fit: standard error of pooled predicted means
- residual scale: average residual standard deviations
- df: degrees of freedom for residual (per observation) according to Barnard-Rubin (1999).
If pool = FALSE, the function produces a list with predictions with the same
structure as predict.lm, with one list-element per imputed dataset.
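To make the pooled fit and se.fit more concrete, here is a minimal base-R sketch of pooling per-imputation predictions with Rubin's rules. It is an illustration under stated assumptions, not necessarily how predict_mi is implemented: the "imputations" come from a crude random hot-deck helper (impute_once, a hypothetical stand-in for mice), the data are the built-in airquality set, and the Barnard-Rubin degrees of freedom are not computed.

```r
# Crude stand-in for multiple imputation: fill each NA by sampling observed
# values of the same column. Illustration only; use mice() in practice.
set.seed(1)
m <- 5
impute_once <- function(d) {
  for (v in names(d)) {
    miss <- is.na(d[[v]])
    if (any(miss)) {
      d[[v]][miss] <- sample(d[[v]][!miss], sum(miss), replace = TRUE)
    }
  }
  d
}

train <- airquality[1:120, c("Ozone", "Solar.R", "Wind", "Temp")]
new <- data.frame(Solar.R = c(150, 250), Wind = c(8, 12), Temp = c(70, 85))
imps <- replicate(m, impute_once(train), simplify = FALSE)

# One linear model and one set of predictions per imputed data set.
fits <- lapply(imps, function(d) lm(Ozone ~ Solar.R + Wind + Temp, data = d))
preds <- lapply(fits, predict, newdata = new, se.fit = TRUE)

est <- sapply(preds, function(p) p$fit)       # rows: new cases, cols: imputations
se2 <- sapply(preds, function(p) p$se.fit^2)  # squared standard errors

# Rubin's rules, applied separately to each new observation.
qbar  <- rowMeans(est)            # pooled prediction
ubar  <- rowMeans(se2)            # within-imputation variance
b     <- apply(est, 1, var)       # between-imputation variance
total <- ubar + (1 + 1 / m) * b   # total variance

pooled <- data.frame(fit = qbar, se.fit = sqrt(total))
print(pooled)
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term adds the uncertainty due to the missing data.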
Examples
# Data frame with missing values in the predictors
# Create the imputation (mids) object and the lm fits
dat <- mice::nhanes
# add indicator training and test
# first 20 for training, last 5 for testing
dat$set <- c(rep("train", 20), rep("test", 5))
# Make prediction matrix and ensure that set is not used as a predictor
predmat <- mice::make.predictorMatrix(dat)
predmat[,"set"] <- 0
# Impute missing values based on the train set
imp <- mice::mice(dat, m = 5, maxit = 5, seed = 1, predictorMatrix = predmat,
                  ignore = dat$set == "test", print = FALSE)
impdats <- mice::complete(imp, "all")
# extract the training and test data sets
traindats <- lapply(impdats, function(dat) subset(dat, set == "train", select = -set))
testdats <- lapply(impdats, function(dat) subset(dat, set == "test", select = -set))
# Fit the prediction models, based on the imputed training data
fits <- lapply(traindats, function(dat) lm(age ~ bmi + hyp + chl, data = dat))
# Pool the predictions across the imputed data sets with predict_mi
pool_preds <- mice::predict_mi(object = fits, newdata = testdats,
                               pool = TRUE, interval = "prediction", level = 0.95)
