Predict method for linear models with multiply imputed data

Usage

predict_mi(
  object,
  newdata,
  pool = TRUE,
  se.fit = FALSE,
  interval = c("none", "confidence", "prediction"),
  level = 0.95,
  ...
)

Arguments

object

A prediction model, either a single lm object or a list of lm objects obtained from multiply imputed data (object can also be of class mira).

newdata

An optional data frame in which to look for variables with which to predict. Can be a data.frame, list, or mids object. If omitted, the fitted values are used.

pool

Logical indicating whether to pool the predictions (and potentially obtain pooled prediction intervals).

se.fit

A switch indicating if standard errors are required.

interval

Type of interval calculation. Can be abbreviated.

level

Tolerance/confidence level.

...

Arguments passed on to stats::predict.lm

scale

Scale parameter for std.err. calculation.

df

Degrees of freedom for scale.

type

Type of prediction (response or model term). Can be abbreviated.

terms

If type = "terms", which terms (default is all terms), a character vector.

na.action

function determining what should be done with missing values in newdata. The default is to predict NA.

pred.var

the variance(s) for future observations to be assumed for prediction intervals. See ‘Details’ in the documentation of stats::predict.lm.

weights

variance weights for prediction. This can be a numeric vector or a one-sided model formula. In the latter case, it is interpreted as an expression evaluated in newdata.

rankdeficient

a character string specifying what should happen in the case of a rank deficient model, i.e., when object$rank < ncol(model.matrix(object)).

"warnif":

gives a warning only in case of predicting ‘non-estimable’ cases, i.e., vectors not in the same predictor subspace as the original data (with tolerance tol). In that case, the non-estimable indices are also returned as attribute "non-estim" (see rankdeficient="non-estim").

"simple":

is backward compatible with R < 4.3.0, possibly giving dubious predictions in non-estimable cases, and always signalling a warning.

"non-estim":

gives the same predictions without warning, and with an attribute attr(*, "non-estim") with indices in 1:nrow(newdata) of new data observations which are deemed non-estimable.

"NA":

predicts NA for non-estimable new data, silently. Often recommended in new code.

"NAwarn":

predicts NA for non-estimable new data with a warning.

tol

non-negative number determining how non-estimability is determined in rank deficient cases.

verbose

logical indicating if messages should be produced about rank deficiency handling.

Value

If pool = TRUE, predict_mi returns a vector of pooled predictions or, if interval is also set to "confidence" or "prediction", a matrix of predictions and bounds with column names fit, lwr, and upr. For type = "terms" the result is a matrix with one column per term, which may have an attribute "constant". If se.fit = TRUE, a list is returned with the following components:

  • fit: vector or matrix as above

  • se.fit: standard error of pooled predicted means

  • residual scale: average residual standard deviations

  • df: residual degrees of freedom (per observation), computed according to Barnard and Rubin (1999).
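As a hedged sketch of what these components plausibly represent (the notation is introduced here for illustration and is not taken from the package source): for one new observation, let \hat{y}_j be the prediction from imputed dataset j = 1, ..., m, U_j its squared standard error, and \nu_{\text{com}} the complete-data residual degrees of freedom. Rubin's rules with the Barnard and Rubin (1999) adjustment then read:

```latex
\[
\begin{aligned}
\bar{y} &= \frac{1}{m}\sum_{j=1}^{m} \hat{y}_j, &
\bar{U} &= \frac{1}{m}\sum_{j=1}^{m} U_j, \\
B &= \frac{1}{m-1}\sum_{j=1}^{m} \left(\hat{y}_j - \bar{y}\right)^2, &
T &= \bar{U} + \left(1 + \frac{1}{m}\right) B, \\
\lambda &= \frac{(1 + 1/m)\,B}{T}, &
\nu_{\text{old}} &= \frac{m - 1}{\lambda^{2}}, \\
\nu_{\text{obs}} &= \frac{\nu_{\text{com}} + 1}{\nu_{\text{com}} + 3}\,\nu_{\text{com}}\,(1 - \lambda), &
\nu &= \frac{\nu_{\text{old}}\,\nu_{\text{obs}}}{\nu_{\text{old}} + \nu_{\text{obs}}}.
\end{aligned}
\]
```

Under this reading, se.fit would correspond to \sqrt{T} and df to \nu. Whether predict_mi follows exactly these steps internally is an assumption.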

If pool = FALSE, the function produces a list with predictions with the same structure as predict.lm, with one list-element per imputed dataset.
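A minimal sketch of the unpooled output, assuming a version of mice that exports predict_mi is installed. The train/test split by row index is a simplification chosen for illustration, and only the length of the result is checked here because the exact contents of each element are not spelled out above.

```r
library(mice)

# Impute nhanes; rows 21 to 25 are ignored when building the imputation model
is_test <- seq_len(nrow(nhanes)) > 20
imp <- mice(nhanes, m = 5, maxit = 5, seed = 1,
            ignore = is_test, printFlag = FALSE)
impdats <- complete(imp, "all")

# One training set, one test set, and one fitted model per imputed dataset
traindats <- lapply(impdats, function(d) d[!is_test, ])
testdats <- lapply(impdats, function(d) d[is_test, ])
fits <- lapply(traindats, function(d) lm(age ~ bmi + hyp + chl, data = d))

# pool = FALSE keeps the predictions separate for each imputed dataset
unpooled <- predict_mi(object = fits, newdata = testdats, pool = FALSE)
length(unpooled)
```

Keeping the predictions unpooled can be useful for inspecting how much they vary between imputations before combining them.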

Examples

# Data frame with missing values in bmi, hyp and chl
# (used below to create the imputations and the lm fits)
dat <- mice::nhanes

# add indicator training and test
# first 20 for training, last 5 for testing
dat$set <- c(rep("train", 20), rep("test", 5))

# Make prediction matrix and ensure that set is not used as a predictor
predmat <- mice::make.predictorMatrix(dat)
predmat[,"set"] <- 0

# Impute missing values based on the train set
imp <- mice::mice(dat, m = 5, maxit = 5, seed = 1, predictorMatrix = predmat,
  ignore = dat$set == "test", printFlag = FALSE)
impdats <- mice::complete(imp, "all")

# extract the training and test data sets
traindats <- lapply(impdats, function(d) subset(d, set == "train", select = -set))
testdats <- lapply(impdats, function(d) subset(d, set == "test", select = -set))

# Fit the prediction models, based on the imputed training data
fits <- lapply(traindats, function(dat) lm(age ~ bmi + hyp + chl, data = dat))

# Pool the predictions across the imputed datasets with predict_mi()
pool_preds <- mice::predict_mi(object = fits, newdata = testdats, 
  pool = TRUE, interval = "prediction", level = 0.95)
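To make the pooling step more concrete, the following base R sketch combines per-imputation predictions by hand with Rubin's rules and Barnard-Rubin degrees of freedom. It is an illustration under stated assumptions, not the implementation of predict_mi: the helper pool_predictions_sketch is hypothetical, the "imputed" datasets are a hand-made stand-in built from mtcars rather than output of mice, and the interval is a confidence interval for the predicted mean, not a prediction interval.

```r
# Hypothetical helper, not part of mice: pools lm predictions with Rubin's rules
pool_predictions_sketch <- function(fits, newdata_list, level = 0.95) {
  m <- length(fits)
  stopifnot(m >= 2, length(newdata_list) == m)

  # Per-imputation predictions and standard errors of the predicted mean
  preds <- Map(function(fit, nd) predict(fit, newdata = nd, se.fit = TRUE),
               fits, newdata_list)
  q <- do.call(cbind, lapply(preds, function(p) p$fit))       # n x m estimates
  u <- do.call(cbind, lapply(preds, function(p) p$se.fit^2))  # n x m variances

  qbar <- rowMeans(q)               # pooled prediction
  ubar <- rowMeans(u)               # within-imputation variance
  b <- apply(q, 1, var)             # between-imputation variance
  total <- ubar + (1 + 1 / m) * b   # total variance

  # Barnard-Rubin degrees of freedom; assumes all fits share the residual df
  df_com <- fits[[1]]$df.residual
  lambda <- (1 + 1 / m) * b / total
  df_old <- (m - 1) / lambda^2
  df_obs <- (df_com + 1) / (df_com + 3) * df_com * (1 - lambda)
  df <- 1 / (1 / df_old + 1 / df_obs)  # same as df_old * df_obs / (df_old + df_obs)

  half <- qt(1 - (1 - level) / 2, df) * sqrt(total)
  cbind(fit = qbar, lwr = qbar - half, upr = qbar + half, df = df)
}

# Stand-in for multiply imputed data: three wt values in the training rows
# are treated as missing and filled with random draws (illustration only)
set.seed(1)
m <- 5
holes <- c(3, 8, 15)
impdats_demo <- lapply(seq_len(m), function(i) {
  d <- mtcars
  d$wt[holes] <- rnorm(length(holes), mean(d$wt[-holes]), sd(d$wt[-holes]))
  d
})

fits_demo <- lapply(impdats_demo, function(d) lm(mpg ~ wt + hp, data = d[1:27, ]))
test_demo <- lapply(impdats_demo, function(d) d[28:32, ])

pooled_demo <- pool_predictions_sketch(fits_demo, test_demo)
pooled_demo
```

The harmonic form used for df avoids a NaN when the between-imputation variance is zero, in which case df reduces to the observed-data degrees of freedom.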