Imputation under MNAR mechanism by NARFCS

Imputes univariate data under a user-specified MNAR mechanism, using linear (mnar.norm) or logistic (mnar.logreg) regression within the NARFCS procedure. Repeating the imputation under different model specifications (a sensitivity analysis) can show how much the conclusions depend on the assumed MNAR mechanism.

Usage

mice.impute.mnar.logreg(y, ry, x, wy = NULL, ums = NULL, umx = NULL, ...)

mice.impute.mnar.norm(y, ry, x, wy = NULL, ums = NULL, umx = NULL, ...)

Arguments

y: Vector to be imputed
ry: Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. ry generally distinguishes the observed (TRUE) and missing (FALSE) values in y.
x: Numeric design matrix with length(y) rows containing the predictors for y. Matrix x must not contain missing values.
wy: Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.
ums: A string containing the specification of the unidentifiable part of the imputation model (the "unidentifiable model specification"), that is, the desired \(\delta\)-adjustment (offset) as a function of other variables, together with values for the corresponding deltas (sensitivity parameters). See Details.
umx: An auxiliary data matrix containing variables that do not appear in the identifiable part of the imputation procedure but are specified in ums as predictors in the unidentifiable part of the imputation model. See Details.
...: Other named arguments.

Value

Vector with imputed data, of the same type as y and of length sum(wy).
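The contract between ry, wy and the returned vector can be sketched in base R. The function impute_stub() below is a hypothetical stand-in, not part of mice: it simply resamples the observed values, but it follows the same pattern of fitting on y[ry] and returning sum(wy) values that are written back into y[wy]. Defaulting wy to !ry is an assumption of this sketch.

```r
# Hypothetical stand-in for a univariate imputation function such as
# mice.impute.mnar.norm(); it only illustrates the (y, ry, x, wy) contract.
impute_stub <- function(y, ry, x, wy = NULL, ...) {
  if (is.null(wy)) wy <- !ry          # assumed default: impute the missing entries
  obs <- y[ry]                        # the "model" is fitted on y[ry] only
  obs[sample.int(length(obs), sum(wy), replace = TRUE)]
}

y  <- c(2.1, NA, 3.4, NA, 5.0, 4.2)
ry <- !is.na(y)                       # TRUE = observed, FALSE = missing
x  <- matrix(seq_along(y), ncol = 1)  # complete design matrix with length(y) rows
wy <- !ry                             # locations that receive imputations

set.seed(1)
imp <- impute_stub(y, ry, x, wy)
stopifnot(length(imp) == sum(wy))     # one imputed value per TRUE in wy
y[wy] <- imp                          # write the imputations back into y
```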

Details

This function imputes data that are thought to be Missing Not at Random (MNAR) using the NARFCS method. The NARFCS procedure (Tompsett et al., 2018) generalises the \(\delta\)-adjustment sensitivity analysis method of Van Buuren, Boshuizen & Knook (1999) to the case of multiple incomplete variables within the FCS framework. In practical terms, NARFCS shifts the imputations drawn at each iteration of mice by a user-specified quantity that can vary across subjects, to reflect systematic departures of the missing data from the distribution imputed under MAR.
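The core idea of the \(\delta\)-adjustment can be illustrated outside mice. The base R sketch below uses simulated data (the names bmi and chl only echo the nhanes example): it draws MAR imputations for chl from a normal linear regression fitted to the observed cases, then adds the subject-specific offset delta = -3 + 2*bmi. This is a single-variable, single-pass illustration and it ignores parameter uncertainty; NARFCS itself applies the shift inside every mice iteration, so the shifted values also feed into the imputation models of other variables.

```r
set.seed(123)
n   <- 200
bmi <- rnorm(n, mean = 26, sd = 4)              # complete predictor (simulated)
chl <- 100 + 4 * bmi + rnorm(n, sd = 20)        # variable to impute (simulated)
ry  <- runif(n) > 0.3                           # TRUE = observed
chl_obs <- ifelse(ry, chl, NA)

# Step 1: MAR imputation from a normal linear regression on the observed cases
fit     <- lm(chl_obs ~ bmi, subset = ry)
mu      <- predict(fit, newdata = data.frame(bmi = bmi[!ry]))
sigma   <- summary(fit)$sigma
imp_mar <- rnorm(sum(!ry), mean = mu, sd = sigma)

# Step 2: NARFCS-style shift by a subject-specific offset delta = -3 + 2 * bmi
delta    <- -3 + 2 * bmi[!ry]
imp_mnar <- imp_mar + delta

# The MNAR imputations differ from the MAR ones by exactly delta
summary(imp_mnar - imp_mar)
```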

The NARFCS model is specified through the blots argument of mice(). blots is a named list: for each variable imputed by mice.impute.mnar.norm() or mice.impute.mnar.logreg(), the corresponding element of blots is a list with a required element ums and an optional element umx. For example, a high-level call might look something like mice(nhanes[, c(2, 4)], method = c("pmm", "mnar.norm"), blots = list(chl = list(ums = "-3+2*bmi"))).
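How a per-variable entry of blots reaches the imputation function can be pictured with a simplified dispatch loop. This is a hypothetical sketch, not mice's internal code: fake_mnar_norm() only reports the ums and umx it receives, and the loop forwards blots[[j]] to it as extra named arguments, which is the role the ums, umx and ... arguments play in the real functions.

```r
# Hypothetical stand-in for mice.impute.mnar.norm(): it only echoes the
# sensitivity arguments it receives.
fake_mnar_norm <- function(y, ry, x, wy = NULL, ums = NULL, umx = NULL, ...) {
  list(ums = ums, has_umx = !is.null(umx))
}

blots  <- list(chl = list(ums = "-3+2*bmi"))   # same structure as in the text
method <- c(bmi = "pmm", chl = "mnar.norm")

received <- list()
for (j in names(method)[method == "mnar.norm"]) {
  args <- c(list(y = c(1, NA), ry = c(TRUE, FALSE),
                 x = matrix(0, nrow = 2, ncol = 1)),   # toy inputs
            blots[[j]])                                # ums (and umx, if given)
  received[[j]] <- do.call(fake_mnar_norm, args)
}
received$chl$ums
```

Variables without an entry in blots simply receive no ums or umx, which is why each mnar variable needs its own named element.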

The ums parameter is required and might look like "-4+1*Y". The ums specification must have the following characteristics:

A single term corresponding to the intercept (constant) term, not multiplied by any variable name, must be included in the expression;
Each term in the expression (corresponding to the intercept or a predictor variable) must be separated by either a "+" or "-" sign, depending on the sign of the sensitivity parameter;
Within each non-intercept term, the sensitivity parameter value comes first and the predictor variable comes second, and these must be separated by a "*" sign;
For categorical predictors, for example a variable Z with K + 1 categories ("Cat0","Cat1", ...,"CatK"), K category-specific terms are needed, and those not in umx (see below) must be specified by concatenating the variable name with the name of the category (e.g. ZCat1) as this is how they are named in the design matrix (argument x) passed to the univariate imputation function. An example is "2+1*ZCat1-3*ZCat2".
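To make these rules concrete, the hypothetical helper parse_ums() below splits a specification such as "2+1*ZCat1-3*ZCat2" into an intercept and named sensitivity parameters, and ums_offset() evaluates the resulting offset for each row of a design matrix. Neither helper is part of mice, whose own parser may behave differently; they only apply the rules listed above, and they assume plain decimal numbers (no scientific notation such as 1e-3).

```r
# Hypothetical helper (not part of mice): ums string -> intercept + coefficients
parse_ums <- function(ums) {
  s <- gsub("[[:space:]]", "", ums)
  # Each term is an optional sign followed by everything up to the next sign
  terms <- regmatches(s, gregexpr("[+-]?[^+-]+", s))[[1]]
  has_var <- grepl("*", terms, fixed = TRUE)
  if (sum(!has_var) != 1) stop("ums needs exactly one intercept term")
  parts <- strsplit(terms[has_var], "*", fixed = TRUE)   # value first, then name
  coefs <- vapply(parts, function(p) as.numeric(p[1]), numeric(1))
  names(coefs) <- vapply(parts, function(p) p[2], character(1))
  list(intercept = as.numeric(terms[!has_var]), coefs = coefs)
}

# Hypothetical helper: intercept + sum(delta_k * x_k) for every row of x
ums_offset <- function(spec, x) {
  absent <- setdiff(names(spec$coefs), colnames(x))
  if (length(absent) > 0) stop("not in design matrix: ", toString(absent))
  off <- rep(spec$intercept, nrow(x))
  if (length(spec$coefs) > 0) {
    off <- off + drop(x[, names(spec$coefs), drop = FALSE] %*% spec$coefs)
  }
  off
}

spec <- parse_ums("2+1*ZCat1-3*ZCat2")
# Dummy columns for a factor Z with levels Cat0 (reference), Cat1, Cat2
x <- cbind(ZCat1 = c(0, 1, 0), ZCat2 = c(0, 0, 1))
ums_offset(spec, x)   # returns c(2, 3, -1)
```

Rows in the reference category Cat0 receive only the intercept, which is why the rules require exactly one intercept term.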

If given, the umx specification must have the following characteristics:

It contains only complete variables, with no missing values;
It is a numeric matrix. In particular, categorical variables must be represented as dummy indicators with names corresponding to what is used in ums to refer to the category-specific terms (see above);
It has the same number of rows as the data argument passed on to the main mice function;
It does not contain variables that were already predictors in the identifiable part of the model for the variable under imputation.
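The four umx requirements can be checked before calling mice(). The sketch below uses a hypothetical validator, check_umx(), that is not part of mice; it tests completeness, numeric matrix type, row count and overlap with the column names of x, but it does not verify that dummy-indicator names match the terms used in ums.

```r
# Hypothetical validator (not part of mice) for the umx requirements above
check_umx <- function(umx, data, x) {
  if (!is.matrix(umx) || !is.numeric(umx))
    stop("umx must be a numeric matrix")
  if (anyNA(umx))
    stop("umx must not contain missing values")
  if (nrow(umx) != nrow(data))
    stop("umx must have as many rows as the data passed to mice()")
  overlap <- intersect(colnames(umx), colnames(x))
  if (length(overlap) > 0)
    stop("umx repeats predictors already in x: ", toString(overlap))
  invisible(TRUE)
}

dat <- data.frame(bmi = c(22, 25, 30, 27), chl = c(180, NA, 195, NA))
x   <- as.matrix(dat["bmi"])                  # identifiable-part predictors
aux <- matrix(c(0, 1, 0, 1), ncol = 1, dimnames = list(NULL, "even"))

check_umx(aux, dat, x)                        # passes silently
bad <- cbind(aux, bmi = dat$bmi)              # repeats the x predictor bmi
tryCatch(check_umx(bad, dat, x), error = conditionMessage)
```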

Limitation: The present implementation can only condition on variables that appear in the identifiable part of the imputation model (x) or in complete auxiliary variables passed on via the umx argument. It is not possible to specify models where the offset depends on incomplete auxiliary variables.

For an MNAR alternative see also mice.impute.ri.

References

Tompsett, D. M., Leacy, F., Moreno-Betancur, M., Heron, J., & White, I. R. (2018). On the use of the not-at-random fully conditional specification (NARFCS) procedure in practice. Statistics in Medicine, 37(15), 2338–2353. doi:10.1002/sim.7643

Van Buuren, S., Boshuizen, H. C., & Knook, D. L. (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681–694.

Author

Margarita Moreno-Betancur, Stef van Buuren, Ian R. White, 2020.

Examples

library(mice)

# 1: Example with no auxiliary data: pass only the unidentifiable model specification (ums)

# Specify argument to pass on to mnar imputation functions via "blots" argument
mnar.blot <- list(X = list(ums = "-4"), Y = list(ums = "2+1*ZCat1-3*ZCat2"))

# Run NARFCS by using mnar imputation methods and passing argument via blots
impNARFCS <- mice(mnar_demo_data,
  method = c("mnar.logreg", "mnar.norm", ""),
  blots = mnar.blot, seed = 234235, print = FALSE
)

# Obtain MI results. Note that they coincide with those from the old version at
# https://github.com/moreno-betancur/NARFCS
pool(with(impNARFCS, lm(Y ~ X + Z)))$pooled$estimate
#> [1]  19.368813   3.039045 -14.643202 -28.586061

# 2: Example passing also auxiliary data to MNAR procedure (umx)
# Assumptions:
# - Auxiliary data are complete, no missing values
# - Auxiliary data are a numeric matrix
# - Auxiliary data have the same number of rows as x
# - Auxiliary data share no variable names with x

# Specify argument to pass on to mnar imputation functions via "blots" argument
# Complete binary auxiliary variable "even": 1 for even-numbered rows, 0 otherwise
aux <- matrix(0:1, nrow = nrow(mnar_demo_data))
dimnames(aux) <- list(NULL, "even")
mnar.blot <- list(
  X = list(ums = "-4"),
  Y = list(ums = "2+1*ZCat1-3*ZCat2+0.5*even", umx = aux)
)

# Run NARFCS by using mnar imputation methods and passing argument via blots
impNARFCS <- mice(mnar_demo_data,
  method = c("mnar.logreg", "mnar.norm", ""),
  blots = mnar.blot, seed = 234235, print = FALSE
)

# Obtain MI results. As expected, they differ slightly from those
# from the old version at https://github.com/moreno-betancur/NARFCS
pool(with(impNARFCS, lm(Y ~ X + Z)))$pooled$estimate
#> [1]  19.521134   2.952546 -14.729454 -28.699292