Imputation under MNAR mechanism by NARFCS
Source: R/mice.impute.mnar.logreg.R, R/mice.impute.mnar.norm.R
mice.impute.mnar.Rd
Imputes univariate data under a user-specified MNAR mechanism by linear or logistic regression and NARFCS. Sensitivity analysis under different model specifications may shed light on the impact of different MNAR assumptions on the conclusions.
Usage
mice.impute.mnar.logreg(y, ry, x, wy = NULL, ums = NULL, umx = NULL, ...)
mice.impute.mnar.norm(y, ry, x, wy = NULL, ums = NULL, umx = NULL, ...)
Arguments
- y
Vector to be imputed
- ry
Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. The ry generally distinguishes the observed (TRUE) and missing values (FALSE) in y.
- x
Numeric design matrix with length(y) rows with predictors for y. Matrix x may have no missing values.
- wy
Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.
- ums
A string containing the specification of the unidentifiable part of the imputation model (the "unidentifiable model specification"), that is, the desired \(\delta\)-adjustment (offset) as a function of other variables and values for the corresponding deltas (sensitivity parameters). See details.
- umx
An auxiliary data matrix containing variables that do not appear in the identifiable part of the imputation procedure but that have been specified via ums as being predictors in the unidentifiable part of the imputation model. See details.
- ...
Other named arguments.
Details
This function imputes data that are thought to be Missing Not at Random (MNAR) by the NARFCS method. The NARFCS procedure (Tompsett et al., 2018) generalises the so-called \(\delta\)-adjustment sensitivity analysis method of Van Buuren, Boshuizen & Knook (1999) to the case with multiple incomplete variables within the FCS framework. In practical terms, the NARFCS procedure shifts the imputations drawn at each iteration of mice by a user-specified quantity that can vary across subjects, to reflect systematic departures of the missing data from the data distribution imputed under MAR.
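The shift described above can be illustrated with a minimal base-R sketch of a single \(\delta\)-adjustment step for one continuous variable. This is not the package's implementation: the toy data, the helper name delta_adjust_draw, and the simplification of using least-squares estimates without drawing the regression parameters from their posterior are all assumptions made for illustration.

```r
# Hypothetical helper (not part of mice): impute y by linear regression
# under MAR, then shift each imputed value by a subject-specific offset.
delta_adjust_draw <- function(y, ry, x, delta) {
  # Fit the identifiable (MAR) part on the observed cases only
  fit <- lm.fit(cbind(1, x[ry, , drop = FALSE]), y[ry])
  sigma <- sqrt(sum(fit$residuals^2) / fit$df.residual)
  # Draw imputations for the missing cases under MAR
  xm <- cbind(1, x[!ry, , drop = FALSE])
  mar_draw <- drop(xm %*% fit$coefficients) + rnorm(sum(!ry), sd = sigma)
  # Unidentifiable (MNAR) part: add the user-specified offset
  mar_draw + delta[!ry]
}

# Toy data (assumed for illustration only)
set.seed(1)
n <- 200
x <- matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "x1"))
y <- 2 + 3 * x[, "x1"] + rnorm(n)
ry <- runif(n) > 0.3          # TRUE = observed
y[!ry] <- NA

# Offset of -4 + 1 * x1 for every subject (compare ums = "-4+1*x1")
delta <- -4 + 1 * x[, "x1"]

# Same seed for both calls, so the random draws are identical
set.seed(2)
imp_mar <- delta_adjust_draw(y, ry, x, delta = rep(0, n))
set.seed(2)
imp_mnar <- delta_adjust_draw(y, ry, x, delta = delta)

# The MNAR imputations differ from the MAR ones by the offset only
shift <- imp_mnar - imp_mar
```

Setting delta to zero for every subject recovers the MAR imputations, which is why varying the sensitivity parameters around zero is a natural way to organise a sensitivity analysis.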
Specification of the NARFCS model is done by the blots argument of mice(). The blots parameter is a named list. For each variable to be imputed by mice.impute.mnar.norm() or mice.impute.mnar.logreg() the corresponding element in blots is a list with at least one argument ums and, optionally, a second argument umx.
For example, the high-level call might look something like mice(nhanes[, c(2, 4)], method = c("pmm", "mnar.norm"), blots = list(chl = list(ums = "-3+2*bmi"))).
The ums parameter is required, and might look like this: "-4+1*Y". The ums specification must have the following characteristics:
- A single term corresponding to the intercept (constant) term, not multiplied by any variable name, must be included in the expression;
- Each term in the expression (corresponding to the intercept or a predictor variable) must be separated by either a "+" or "-" sign, depending on the sign of the sensitivity parameter;
- Within each non-intercept term, the sensitivity parameter value comes first and the predictor variable comes second, and these must be separated by a "*" sign;
- For categorical predictors, for example a variable Z with K + 1 categories ("Cat0", "Cat1", ..., "CatK"), K category-specific terms are needed, and those not in umx (see below) must be specified by concatenating the variable name with the name of the category (e.g. ZCat1) as this is how they are named in the design matrix (argument x) passed to the univariate imputation function. An example is "2+1*ZCat1-3*ZCat2".
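To make the rules above concrete, the following sketch evaluates a ums string against a small design matrix and returns the offset for each row. The helper ums_offset is hypothetical and is not the parser used inside mice; it assumes plain decimal sensitivity parameters (no scientific notation) and variable names that contain no "+", "-" or "*" characters.

```r
# Hypothetical helper (not part of mice): evaluate a ums string against a
# numeric matrix whose column names match the variable names in the string.
ums_offset <- function(ums, design) {
  ums <- gsub("\\s", "", ums)
  # Split into signed terms, e.g. "2", "+1*ZCat1", "-3*ZCat2"
  terms <- regmatches(ums, gregexpr("[+-]?[^+-]+", ums))[[1]]
  offset <- rep(0, nrow(design))
  n_intercepts <- 0
  for (term in terms) {
    parts <- strsplit(term, "*", fixed = TRUE)[[1]]
    value <- as.numeric(parts[1])   # sensitivity parameter comes first
    if (length(parts) == 1) {
      # Term without a variable name: the intercept
      n_intercepts <- n_intercepts + 1
      offset <- offset + value
    } else {
      # Term of the form value*variable
      stopifnot(length(parts) == 2, parts[2] %in% colnames(design))
      offset <- offset + value * design[, parts[2]]
    }
  }
  stopifnot(n_intercepts == 1)      # exactly one intercept term is required
  offset
}

# One row per category of Z: Cat0 (reference), Cat1, Cat2
design <- cbind(ZCat1 = c(0, 1, 0), ZCat2 = c(0, 0, 1))
ums_offset("2+1*ZCat1-3*ZCat2", design)
#> [1]  2  3 -1
```

Read this way, the intercept is the shift applied to the reference category Cat0, and each category-specific term is an additional shift relative to that reference.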
If given, the umx specification must have the following characteristics:
- It contains only complete variables, with no missing values;
- It is a numeric matrix. In particular, categorical variables must be represented as dummy indicators with names corresponding to what is used in ums to refer to the category-specific terms (see above);
- It has the same number of rows as the data argument passed on to the main mice function;
- It does not contain variables that were already predictors in the identifiable part of the model for the variable under imputation.
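The umx requirements listed above can be checked before calling mice. The sketch below is only an illustration: check_umx is a hypothetical helper, not a function exported by mice, and the toy data frame, the auxiliary factor A and the predictor name "X" are made up for the example. It also shows one way to dummy-code a complete categorical variable with model.matrix() so that the column names match what ums would use.

```r
# Hypothetical helper (not part of mice): check the umx requirements
check_umx <- function(umx, data, x_names) {
  stopifnot(
    is.matrix(umx), is.numeric(umx),     # numeric matrix
    !anyNA(umx),                         # complete, no missing values
    nrow(umx) == nrow(data),             # same number of rows as data
    !is.null(colnames(umx)),             # named, so ums can refer to them
    !any(colnames(umx) %in% x_names)     # no overlap with predictors in x
  )
  invisible(TRUE)
}

# Toy incomplete data and a complete auxiliary factor (illustrative only)
toy <- data.frame(Y = c(1.2, NA, 3.4, NA), X = c(0, 1, 1, 0))
A <- factor(c("Cat0", "Cat1", "Cat2", "Cat1"))

# Dummy-code A and drop the intercept column; the remaining columns are
# named by concatenating the variable name and the category name
umx <- model.matrix(~ A)[, -1, drop = FALSE]
check_umx(umx, data = toy, x_names = "X")
colnames(umx)
#> [1] "ACat1" "ACat2"
```

With this umx, a matching specification for Y could be written as ums = "2+1*ACat1-3*ACat2", where the numeric values are arbitrary sensitivity parameters.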
Limitation: The present implementation can only condition on variables that appear in the identifiable part of the imputation model (x) or in complete auxiliary variables passed on via the umx argument. It is not possible to specify models where the offset depends on incomplete auxiliary variables.
For an MNAR alternative see also mice.impute.ri.
References
Tompsett, D. M., Leacy, F., Moreno-Betancur, M., Heron, J., & White, I. R. (2018). On the use of the not-at-random fully conditional specification (NARFCS) procedure in practice. Statistics in Medicine, 37(15), 2338-2353. doi:10.1002/sim.7643.
Van Buuren, S., Boshuizen, H. C., & Knook, D. L. (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681-694.
See also
Other univariate imputation functions: mice.impute.cart(), mice.impute.lasso.logreg(), mice.impute.lasso.norm(), mice.impute.lasso.select.logreg(), mice.impute.lasso.select.norm(), mice.impute.lda(), mice.impute.logreg(), mice.impute.logreg.boot(), mice.impute.mean(), mice.impute.midastouch(), mice.impute.mpmm(), mice.impute.norm(), mice.impute.norm.boot(), mice.impute.norm.nob(), mice.impute.norm.predict(), mice.impute.pmm(), mice.impute.polr(), mice.impute.polyreg(), mice.impute.quadratic(), mice.impute.rf(), mice.impute.ri()
Examples
# 1: Example with no auxiliary data: only pass unidentifiable model specification (ums)
# Specify argument to pass on to mnar imputation functions via "blots" argument
mnar.blot <- list(X = list(ums = "-4"), Y = list(ums = "2+1*ZCat1-3*ZCat2"))
# Run NARFCS by using mnar imputation methods and passing argument via blots
impNARFCS <- mice(mnar_demo_data,
  method = c("mnar.logreg", "mnar.norm", ""),
  blots = mnar.blot, seed = 234235, print = FALSE
)
# Obtain MI results: Note they coincide with those from old version at
# https://github.com/moreno-betancur/NARFCS
pool(with(impNARFCS, lm(Y ~ X + Z)))$pooled$estimate
#> [1] 19.368813 3.039045 -14.643202 -28.586061
# 2: Example passing also auxiliary data to MNAR procedure (umx)
# Assumptions:
# - Auxiliary data are complete, no missing values
# - Auxiliary data are a numeric matrix
# - Auxiliary data have same number of rows as x
# - Auxiliary data have no overlapping variable names with x
# Specify argument to pass on to mnar imputation functions via "blots" argument
aux <- matrix(0:1, nrow = nrow(mnar_demo_data))
dimnames(aux) <- list(NULL, "even")
mnar.blot <- list(
  X = list(ums = "-4"),
  Y = list(ums = "2+1*ZCat1-3*ZCat2+0.5*even", umx = aux)
)
# Run NARFCS by using mnar imputation methods and passing argument via blots
impNARFCS <- mice(mnar_demo_data,
  method = c("mnar.logreg", "mnar.norm", ""),
  blots = mnar.blot, seed = 234235, print = FALSE
)
# Obtain MI results: As expected they differ (slightly) from those
# from old version at https://github.com/moreno-betancur/NARFCS
pool(with(impNARFCS, lm(Y ~ X + Z)))$pooled$estimate
#> [1] 19.521134 2.952546 -14.729454 -28.699292