This function converts imputed data stored in long format into an object of class mids. The original incomplete dataset must be available so that the function knows where the missing data are. The function is useful for converting the result of operations applied to the imputed data back into a mids object. It may also be used to store multiply imputed data sets from other software in the format used by mice.
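As an illustration of the round trip described above, the following minimal sketch stacks the imputed data, derives a new variable in every imputation block, and converts the result back to a mids object. The variable name `bmi_sq`, the seed, and the analysis model are hypothetical choices made for this example only.

```r
library(mice)

# Impute, then stack the original data (.imp == 0) and the m imputed copies
imp <- mice(nhanes, m = 5, print = FALSE, seed = 1)
long <- complete(imp, action = "long", include = TRUE)

# Derive a new variable in every block (hypothetical example variable).
# It stays NA in the original-data block wherever bmi is missing.
long$bmi_sq <- long$bmi^2

# Convert back to a mids object and analyse as usual
imp2 <- as.mids(long)
fit <- with(imp2, lm(chl ~ bmi_sq))
pooled <- pool(fit)
```

Because the derived variable is computed within each block, it is missing in the original-data block exactly where its source variable is missing, which is how `as.mids()` locates its imputed values.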

Usage

as.mids(long, where = NULL, .imp = ".imp", .id = ".id")

Arguments

long

A multiply imputed data set in long format, for example produced by a call to complete(..., action = 'long', include = TRUE), or by other software.

where

A data frame or matrix with logicals of the same dimensions as data indicating where in the data the imputations should be created. The default, where = is.na(data), specifies that the missing data should be imputed. The where argument may be used to overimpute observed data, or to skip imputations for selected missing values. Note: Imputation methods that generate imputations outside of mice, such as mice.impute.panImpute(), may depend on a complete predictor space. In that case, a custom where matrix cannot be specified.

.imp

An optional column number or column name in long, indicating the imputation index. The values are assumed to be consecutive integers between 0 and m. Values 1 through m correspond to the imputation index; value 0 indicates the original data (with missing values). By default, the procedure will search for a variable named ".imp".

.id

An optional column number or column name in long, indicating the subject identification. If not specified, then the function searches for a variable named ".id". If this variable is found, the values in the column will define the row names in the data element of the resulting mids object.
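When the long data come from other software, the index columns may carry different names. The sketch below mimics that situation by renaming the columns of a long dataset produced by mice; the names `imputation` and `subject` are hypothetical and stand in for whatever the external software uses.

```r
library(mice)

imp <- mice(nhanes, print = FALSE, seed = 1)
ext <- complete(imp, action = "long", include = TRUE)

# Mimic an external export with differently named index columns
# (hypothetical names, chosen for illustration)
names(ext)[names(ext) == ".imp"] <- "imputation"
names(ext)[names(ext) == ".id"] <- "subject"

# Point as.mids() at the renamed columns
imp_ext <- as.mids(ext, .imp = "imputation", .id = "subject")
```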

Value

An object of class mids

Note

The function expects the input data long to be sorted by imputation number (variable ".imp" by default), with the rows in the same order within each imputation block.
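If the row order of the long data is not guaranteed, for example after a merge or an export from other software, one possible way to restore it before calling `as.mids()` is sketched here. The shuffle only simulates unsorted input, and the sketch assumes that the `.id` values identify the same subjects in every block.

```r
library(mice)

imp <- mice(nhanes, print = FALSE, seed = 1)
X <- complete(imp, action = "long", include = TRUE)

# Simulate unsorted input (illustration only)
set.seed(2)
shuffled <- X[sample(nrow(X)), ]

# Reference order of the subjects, taken from the original-data block
ids <- X$.id[X$.imp == 0]

# Sort by imputation number, then by subject position within each block
ord <- order(shuffled$.imp, match(shuffled$.id, ids))
restored <- shuffled[ord, ]

imp_sorted <- as.mids(restored)

# Compare the first completed dataset with the one from the original object
same <- isTRUE(all.equal(complete(imp_sorted, 1), complete(imp, 1),
                         check.attributes = FALSE))
```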

Author

Gerko Vink

Examples

# impute the nhanes dataset
imp <- mice(nhanes, print = FALSE)
# extract the data in long format
X <- complete(imp, action = "long", include = TRUE)
# create dataset with .imp variable as numeric
X2 <- X

# nhanes example without .id
test1 <- as.mids(X)
is.mids(test1)
#> [1] TRUE
identical(complete(test1, action = "long", include = TRUE), X)
#> [1] TRUE

# nhanes example without .id where .imp is numeric
test2 <- as.mids(X2)
is.mids(test2)
#> [1] TRUE
identical(complete(test2, action = "long", include = TRUE), X)
#> [1] TRUE

# nhanes example, where we explicitly specify .id by name
test3 <- as.mids(X, .id = ".id")
is.mids(test3)
#> [1] TRUE
identical(complete(test3, action = "long", include = TRUE), X)
#> [1] TRUE

# nhanes example with .id where .imp is numeric
test4 <- as.mids(X2, .id = 6)
is.mids(test4)
#> [1] TRUE
identical(complete(test4, action = "long", include = TRUE), X)
#> [1] TRUE

# example without an .id variable
# variable .id not preserved
X3 <- X[, -6]
test5 <- as.mids(X3)
is.mids(test5)
#> [1] TRUE
identical(complete(test5, action = "long", include = TRUE)[, -6], X[, -6])
#> [1] TRUE

# as() syntax has fewer options
test7 <- as(X, "mids")
test8 <- as(X2, "mids")
test9 <- as(X2[, -6], "mids")
rev <- ncol(X):1
test10 <- as(X[, rev], "mids")

# the where argument also copies observed data into the $imp element
where <- matrix(TRUE, nrow = nrow(nhanes), ncol = ncol(nhanes))
colnames(where) <- colnames(nhanes)
test11 <- as.mids(X, where = where)
identical(complete(test11, action = "long", include = TRUE), X)
#> [1] TRUE