Test whether missingness is contingent upon the observed variables, according to the methodology developed by Jamshidian and Jalal (2010) (see Details).
Usage
mcar(
x,
imputed = mice(x, method = "norm"),
min_n = 6,
method = "auto",
replications = 10000,
use_chisq = 30,
alpha = 0.05
)
Arguments
- x
An object for which a method exists; usually a data.frame.
- imputed
Either an object of class mids, as returned by mice(), or a list of data.frames.
- min_n
Atomic numeric, must be greater than 1. When there are missing data patterns with fewer than min_n cases, all cases with that pattern will be removed from x and imputed.
- method
Atomic character. If it is known (or assumed) that data are either multivariate normally distributed or not, then use either method = "hawkins" or method = "nonparametric", respectively. The default argument method = "auto" follows the procedure outlined in the Details section, and in Figure 7 of Jamshidian and Jalal (2010).
- replications
Number of replications used to simulate the Neyman distribution when performing Hawkins' test. As this method is based on random sampling, use a high number of replications to minimize Monte Carlo error, and optionally call set.seed() to ensure reproducibility.
- use_chisq
Atomic integer, indicating the minimum number of cases within a group that triggers the use of the asymptotic chi-square distribution instead of the empirical distribution in the Neyman uniformity test, which is performed as part of Hawkins' test.
- alpha
Atomic numeric, indicating the significance level of the tests.
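The filtering controlled by min_n can be illustrated with a short base-R sketch. filter_patterns() is a hypothetical helper written for this illustration; it is not part of this package, and the package's own implementation may differ. It codes each row's missingness pattern, drops patterns that occur in fewer than min_n cases, and splits the remaining rows into one group per pattern.

```r
# Hypothetical helper, for illustration only: drop cases whose
# missing-data pattern occurs in fewer than `min_n` rows, then
# split the remaining row indices into one group per pattern.
filter_patterns <- function(x, min_n = 6) {
  stopifnot(min_n > 1)
  # Code each row's pattern as a string, e.g. "01" = second column missing
  pattern <- apply(is.na(x), 1, function(r) paste(as.integer(r), collapse = ""))
  counts <- table(pattern)
  keep <- pattern %in% names(counts)[counts >= min_n]
  list(
    data = x[keep, , drop = FALSE],
    groups = split(which(keep), pattern[keep]),
    removed_patterns = sum(counts < min_n)
  )
}

# Toy data: 7 complete rows, 1 row missing b, 2 rows missing a
x <- data.frame(a = c(1:8, NA, NA), b = c(NA, 2:10))
res <- filter_patterns(x, min_n = 6)
nrow(res$data)        # 7 cases used
res$removed_patterns  # 2 patterns removed
```

Applied to the nhanes data from mice with the default min_n = 6, the same rule would keep the 13 complete cases and the 7 cases missing hyp, bmi, and chl, and remove the other 3 patterns; this agrees with the "2 used, 3 removed" and "Cases used: 20" lines printed in the Examples.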
Details
Three types of missingness have been distinguished in the literature (Rubin, 1976): missing completely at random (MCAR), which means that missingness is unrelated to both the observed and the unobserved data; missing at random (MAR), which means that missingness is contingent on the observed data; and missing not at random (MNAR), which means that missingness is related to the unobserved data.
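The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the variables x and y, the sample size, and the logistic missingness models are assumptions and are not part of this function. Here x is always observed, y is made partly missing in three different ways, and the mean of x is compared between cases with and without a missing y.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)  # always observed
y <- rnorm(n)  # partly missing; independent of x in this sketch

# TRUE marks a missing value of y under each mechanism
miss <- list(
  MCAR = runif(n) < 0.3,           # unrelated to x or y
  MAR  = runif(n) < plogis(2 * x), # depends on the observed x
  MNAR = runif(n) < plogis(2 * y)  # depends on y itself
)

# Difference in mean x between cases with and without a missing y
sapply(miss, function(m) mean(x[m]) - mean(x[!m]))
```

Only the MAR mechanism produces a clear difference in x. Under MCAR and under this MNAR mechanism the difference is close to zero, because y is independent of x here; observed data alone therefore cannot reveal this form of MNAR.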
Jamshidian and Jalal's non-parametric MCAR test assumes that the missing data are either MCAR or MAR, and tests whether the missingness is independent of the observed values. If it is, the covariance matrices of the imputed data will be equal across groups with different patterns of missingness. The test consists of the following procedure:
1. Data are imputed.
2. The imputed data are split into k groups according to the k missing data patterns in the original data (see md.pattern()).
3. Perform Hawkins' test for equality of covariances across the k groups.
4. If Hawkins' test is not significant, conclude that there is no evidence against multivariate normality of the data, nor against MCAR.
5. If Hawkins' test is significant, and multivariate normality of the data can be assumed, then it can be concluded that missingness is MAR.
6. If multivariate normality cannot be assumed, then perform the non-parametric Anderson-Darling test for equality of covariances across the k groups.
7. If the Anderson-Darling test is not significant, this is evidence against multivariate normality, but no evidence against MCAR.
8. If the Anderson-Darling test is significant, this is evidence that missingness is MAR.
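Within Hawkins' test, Neyman's smooth test checks whether a set of values that should be uniformly distributed on [0, 1] under the null hypothesis actually is; the replications and use_chisq arguments control how the null distribution of that test is obtained. The sketch below is a generic four-component Neyman smooth test written for illustration. The helper names, the choice of four components, and the exact values Hawkins' test passes to it are assumptions, not taken from this package.

```r
# Orthonormal (shifted) Legendre polynomials of degree 1 to 4 on [0, 1]
legendre01 <- function(u) {
  cbind(
    sqrt(3) * (2 * u - 1),
    sqrt(5) * (6 * u^2 - 6 * u + 1),
    sqrt(7) * (20 * u^3 - 30 * u^2 + 12 * u - 1),
    3 * (70 * u^4 - 140 * u^3 + 90 * u^2 - 20 * u + 1)
  )
}

# Neyman's smooth statistic: sum over components of (sum_i pi_j(u_i))^2 / n
neyman_stat <- function(u) sum(colSums(legendre01(u))^2) / length(u)

# p-value: asymptotic chi-square (df = 4) when n >= use_chisq; otherwise a
# Monte Carlo null distribution built from `replications` uniform samples
neyman_p <- function(u, replications = 10000, use_chisq = 30) {
  n <- length(u)
  stat <- neyman_stat(u)
  if (n >= use_chisq) {
    return(pchisq(stat, df = 4, lower.tail = FALSE))
  }
  null <- replicate(replications, neyman_stat(runif(n)))
  mean(null >= stat)
}

set.seed(123)
neyman_p(runif(50))                           # n >= 30: chi-square approximation
neyman_p(rep(0.01, 10), replications = 2000)  # n < 30: simulated; far from uniform
```

Simulating the null distribution avoids relying on the chi-square approximation when a group is small, at the cost of Monte Carlo error that shrinks as replications grows.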
Note that, despite its name in common parlance, an MCAR test can only indicate whether missingness is MCAR or MAR. The procedure cannot distinguish MCAR from MNAR, so a non-significant result does not rule out MNAR.
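The decision procedure above, including the caveat that a non-significant result does not rule out MNAR, can be summarised as a small function. interpret_mcar() is a hypothetical helper written for illustration; it takes p-values that are assumed to have been computed already and does not call this package.

```r
# Hypothetical helper: maps test results to the conclusions listed in Details.
# p_hawkins: p-value of Hawkins' test
# p_ad:      p-value of the Anderson-Darling test (NA if not yet run)
# normal:    TRUE if multivariate normality can be assumed
interpret_mcar <- function(p_hawkins, p_ad = NA, normal = FALSE, alpha = 0.05) {
  if (p_hawkins >= alpha) {
    return("No evidence against multivariate normality or MCAR (MNAR is not ruled out).")
  }
  if (isTRUE(normal)) {
    return("Evidence that missingness is MAR.")
  }
  if (is.na(p_ad)) {
    return("Normality cannot be assumed: run the Anderson-Darling test.")
  }
  if (p_ad >= alpha) {
    "Evidence against multivariate normality, but no evidence against MCAR (MNAR is not ruled out)."
  } else {
    "Evidence that missingness is MAR."
  }
}

interpret_mcar(0.73)               # Hawkins' test not significant
interpret_mcar(0.01, p_ad = 0.20)  # non-normal data, Anderson-Darling test not significant
```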
This is a re-implementation of the function TestMCARNormality, which was originally published in the R package MissMech (which has been removed from CRAN). This new implementation is faster, as its backend is written in C++. It also enhances the functionality of the original:
- Multiply imputed data can now be used; the median p-value and test statistic across imputed data sets are then reported, as suggested by Eekhout, van de Wiel, and Heymans (2017).
- The printing method for an mcar_object gives a warning when at least one p-value of either test was significant. In this case, it is recommended to inspect the range of p-values and consider potential violations of MCAR.
- A plotting method for an mcar_object is provided.
- A plotting method for the $md.pattern element of an mcar_object is provided.
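The median pooling and the significance warning described in this list can be sketched as follows. pool_median() is a hypothetical helper, and the per-imputation statistics and p-values in the usage line are made-up illustrative values, not output from this package.

```r
# Hypothetical helper: pool one test's results across imputed data sets
# by the median, warning if any single p-value falls below alpha.
pool_median <- function(stat, p, alpha = 0.05) {
  n_sig <- sum(p < alpha)
  if (n_sig > 0) {
    warning(sprintf(
      "%d of %d p-values < %.2f (range %.3f to %.3f); consider potential violations of MCAR.",
      n_sig, length(p), alpha, min(p), max(p)
    ))
  }
  c(statistic = median(stat), p = median(p))
}

# Made-up results from five imputed data sets
pool_median(stat = c(2.1, 1.8, 2.0, 2.5, 1.9),
            p    = c(0.72, 0.77, 0.74, 0.64, 0.75))
```

The median is less sensitive than the mean to a single unusual imputation, which is why the range of p-values is still worth inspecting when the warning fires.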
References
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592. doi:10.2307/2335739
Eekhout, I., van de Wiel, M. A., & Heymans, M. W. (2017). Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: Power and applicability analysis. BMC Medical Research Methodology, 17(1), 129.
Jamshidian, M., & Jalal, S. (2010). Tests of homoscedasticity, normality, and missing completely at random for incomplete multivariate data. Psychometrika, 75(4), 649–674. doi:10.1007/s11336-010-9175-3
Examples
res <- mcar(nhanes)
#>
#> iter imp variable
#> 1 1 bmi hyp chl
#> 1 2 bmi hyp chl
#> 1 3 bmi hyp chl
#> 1 4 bmi hyp chl
#> 1 5 bmi hyp chl
#> 2 1 bmi hyp chl
#> 2 2 bmi hyp chl
#> 2 3 bmi hyp chl
#> 2 4 bmi hyp chl
#> 2 5 bmi hyp chl
#> 3 1 bmi hyp chl
#> 3 2 bmi hyp chl
#> 3 3 bmi hyp chl
#> 3 4 bmi hyp chl
#> 3 5 bmi hyp chl
#> 4 1 bmi hyp chl
#> 4 2 bmi hyp chl
#> 4 3 bmi hyp chl
#> 4 4 bmi hyp chl
#> 4 5 bmi hyp chl
#> 5 1 bmi hyp chl
#> 5 2 bmi hyp chl
#> 5 3 bmi hyp chl
#> 5 4 bmi hyp chl
#> 5 5 bmi hyp chl
# Examine test results
res
#>
#> Missing data patterns: 2 used, 3 removed.
#> Cases used: 20
#>
#> Hawkins' test: median chi^2 (4) = 2.041792, median p = 0.7280723
#>
#>
#> Interpretation of results:
#> Hawkins' test is not significant; there is no evidence to reject the assumptions of multivariate normality and MCAR.
# Plot p-values across imputed data sets
plot(res)
# Plot md patterns used for the test
plot(res, type = "md.pattern")
# Note the difference from the raw missing data patterns:
md.pattern(nhanes)
#> age hyp bmi chl
#> 13 1 1 1 1 0
#> 3 1 1 1 0 1
#> 1 1 1 0 1 1
#> 1 1 0 0 1 2
#> 7 1 0 0 0 3
#> 0 8 9 10 27