Predict method for linear models with multiply imputed data
Usage
predict_mi(
object,
newdata,
pool = TRUE,
se.fit = FALSE,
interval = c("none", "confidence", "prediction"),
level = 0.95,
...
)
Arguments
- object
A prediction model, either a single lm object or a list of lm objects obtained from multiply imputed data (object can also be of class mira).
- newdata
An optional data frame in which to look for variables with which to predict. Can be a data.frame, list, or mids object. If omitted, the fitted values are used.
- pool
Logical indicating whether to pool the predictions (and potentially obtain pooled prediction intervals).
- se.fit
A switch indicating if standard errors are required.
- interval
Type of interval calculation. Can be abbreviated.
- level
Tolerance/confidence level.
- ...
Arguments passed on to stats::predict.lm:
  - scale
    Scale parameter for std.err. calculation.
  - df
    Degrees of freedom for scale.
  - type
    Type of prediction (response or model term). Can be abbreviated.
  - terms
    If type = "terms", which terms (default is all terms), a character vector.
  - na.action
    Function determining what should be done with missing values in newdata. The default is to predict NA.
  - pred.var
    The variance(s) for future observations to be assumed for prediction intervals. See ‘Details’ in the stats::predict.lm documentation.
  - weights
    Variance weights for prediction. This can be a numeric vector or a one-sided model formula. In the latter case, it is interpreted as an expression evaluated in newdata.
  - rankdeficient
    A character string specifying what should happen in the case of a rank-deficient model, i.e., when object$rank < ncol(model.matrix(object)):
    - "warnif": gives a warning only when predicting ‘non-estimable’ cases, i.e., vectors not in the same predictor subspace as the original data (with tolerance tol). In that case, the non-estimable indices are also returned as attribute "non-estim" (see rankdeficient = "non-estim").
    - "simple": is backward compatible with R < 4.3.0, possibly giving dubious predictions in non-estimable cases, and always signalling a warning.
    - "non-estim": gives the same predictions without a warning, and with an attribute attr(*, "non-estim") containing the indices in 1:nrow(newdata) of new data observations that are deemed non-estimable.
    - "NA": predicts NA for non-estimable new data, silently. Often recommended in new code.
    - "NAwarn": predicts NA for non-estimable new data with a warning.
  - tol
    Non-negative number determining how non-estimability is determined in rank-deficient cases.
  - verbose
    Logical indicating whether messages should be produced about rank-deficiency handling.
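Because rankdeficient is passed on unchanged to stats::predict.lm, its effect can be checked on a single lm fit. The following is a minimal sketch with made-up illustrative data (the data frame d and the values of y are hypothetical): x2 is an exact multiple of x1, so the model is rank deficient, and a new row that breaks this relation is non-estimable. It assumes R >= 4.3.0, where predict.lm() accepts rankdeficient.

```r
# Assumes R >= 4.3.0 (predict.lm() gained the rankdeficient argument there).
# Hypothetical illustrative data: x2 is exactly 2 * x1, so the fit is rank deficient.
d <- data.frame(x1 = 1:10,
                y  = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9))
d$x2 <- 2 * d$x1
fit <- lm(y ~ x1 + x2, data = d)   # the coefficient for x2 is aliased (NA)

# Row 1 satisfies x2 == 2 * x1 (estimable); row 2 does not (non-estimable).
new <- data.frame(x1 = c(3, 3), x2 = c(6, 10))

# With rankdeficient = "NA", non-estimable rows are returned as NA, silently.
p <- predict(fit, newdata = new, rankdeficient = "NA")
print(p)
```

With the default "warnif", the same call would instead return a (dubious) value for row 2, signal a warning, and attach the "non-estim" attribute.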
Value
If pool = TRUE, predict_mi produces a vector of predictions or, if interval is additionally set, a matrix of predictions and bounds with column names fit, lwr, and upr. For type = "terms" this is a matrix with a column per term and may have an attribute "constant". If se.fit = TRUE, a list is returned with the following components:
- fit: vector or matrix as above
- se.fit: standard error of the pooled predicted means
- residual scale: average residual standard deviations
- df: degrees of freedom for residual (per observation) according to Barnard and Rubin (1999)
If pool = FALSE, the function produces a list with one element per imputed dataset, each having the same structure as the output of predict.lm.
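For orientation, these are the standard Rubin's rules with the Barnard and Rubin (1999) degrees of freedom, written for a single new observation. This page states only that df follows Barnard and Rubin (1999); that predict_mi combines the per-imputation predictions exactly this way is an assumption. Here $\hat y_\ell$ and $u_\ell$ are the prediction and its squared standard error from the $\ell$-th of $m$ imputed datasets, and $\nu_{com}$ is the complete-data residual degrees of freedom.

```latex
\bar y = \frac{1}{m}\sum_{\ell=1}^{m} \hat y_\ell, \qquad
\bar u = \frac{1}{m}\sum_{\ell=1}^{m} u_\ell, \qquad
b = \frac{1}{m-1}\sum_{\ell=1}^{m} (\hat y_\ell - \bar y)^2, \qquad
T = \bar u + \Bigl(1 + \frac{1}{m}\Bigr) b

\lambda = \frac{(1 + 1/m)\, b}{T}, \qquad
\nu_{old} = \frac{m-1}{\lambda^2}, \qquad
\nu_{obs} = \frac{\nu_{com} + 1}{\nu_{com} + 3}\, \nu_{com} (1 - \lambda), \qquad
\nu = \frac{\nu_{old}\, \nu_{obs}}{\nu_{old} + \nu_{obs}}

\text{confidence interval: } \bar y \pm q_{\nu,\,(1+\text{level})/2} \sqrt{T}
```

Under this reading, se.fit corresponds to $\sqrt{T}$ and df to $\nu$; a prediction interval would additionally account for the residual variance.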
Examples
# Data frame with missing values in the predictors
# Create the imputations and fit the lm models
library(mice)
dat <- mice::nhanes
# Add an indicator for the training and test sets:
# the first 20 rows are used for training, the last 5 for testing
dat$set <- c(rep("train", 20), rep("test", 5))
# Make prediction matrix and ensure that set is not used as a predictor
predmat <- mice::make.predictorMatrix(dat)
predmat[,"set"] <- 0
# Impute missing values based on the train set
imp <- mice(dat, m = 5, maxit = 5, seed = 1, predictorMatrix = predmat,
            ignore = dat$set == "test", print = FALSE)
impdats <- complete(imp, "all")
# Extract the training and test data sets
traindats <- lapply(impdats, function(dat) subset(dat, set == "train", select = -set))
testdats <- lapply(impdats, function(dat) subset(dat, set == "test", select = -set))
# Fit the prediction models, based on the imputed training data
fits <- lapply(traindats, function(dat) lm(age ~ bmi + hyp + chl, data = dat))
# Pool the predictions across the imputed datasets
pool_preds <- mice::predict_mi(object = fits, newdata = testdats,
                               pool = TRUE, interval = "prediction", level = 0.95)
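To see roughly what pooling does, the following self-contained base-R sketch combines lm predictions by hand with Rubin's rules and the Barnard and Rubin (1999) degrees of freedom. It is not the mice implementation: the helper pool_predictions_sketch is hypothetical, it only builds confidence intervals (no residual variance is added, unlike interval = "prediction"), and noisy copies of mtcars stand in for real imputed datasets.

```r
# Hypothetical helper: pool lm predictions over m imputed datasets (Rubin's rules).
# Only confidence intervals are computed; this is a sketch, not mice's code.
pool_predictions_sketch <- function(fits, newdatas, level = 0.95) {
  m <- length(fits)
  # Per-imputation predictions and standard errors from stats::predict.lm
  preds <- Map(function(fit, nd) predict(fit, newdata = nd, se.fit = TRUE),
               fits, newdatas)
  est <- do.call(cbind, lapply(preds, `[[`, "fit"))                # n_new x m
  u   <- do.call(cbind, lapply(preds, function(p) p$se.fit^2))     # within variances

  ybar  <- rowMeans(est)                  # pooled prediction
  ubar  <- rowMeans(u)                    # average within-imputation variance
  b     <- apply(est, 1, var)             # between-imputation variance
  total <- ubar + (1 + 1 / m) * b         # total variance T

  # Barnard and Rubin (1999) degrees of freedom
  lambda <- (1 + 1 / m) * b / total
  nu_com <- fits[[1]]$df.residual         # complete-data residual df (same for all fits)
  nu_old <- (m - 1) / lambda^2
  nu_obs <- (nu_com + 1) / (nu_com + 3) * nu_com * (1 - lambda)
  df <- ifelse(lambda > 0, nu_old * nu_obs / (nu_old + nu_obs), nu_obs)

  se <- sqrt(total)
  q  <- qt((1 + level) / 2, df)
  data.frame(fit = ybar, lwr = ybar - q * se, upr = ybar + q * se,
             se.fit = se, df = df)
}

# Stand-in for m = 5 imputed datasets (assumption: noise on wt mimics imputation variability)
set.seed(1)
m <- 5
imputed <- lapply(seq_len(m), function(i) {
  d <- mtcars
  d$wt <- d$wt + rnorm(nrow(d), sd = 0.1)
  d
})
train <- lapply(imputed, function(d) d[1:27, ])
test  <- lapply(imputed, function(d) d[28:32, ])
fits  <- lapply(train, function(d) lm(mpg ~ wt + hp, data = d))

pooled <- pool_predictions_sketch(fits, test)
print(pooled)
```

As in the example above, each fit predicts on its own imputed version of the test rows, and the m predictions are then averaged, with the between-imputation spread widening the interval.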