This is the fourth vignette in the Winnipeg series.

In this vignette we will walk you through the more advanced features of mice, such as post-processing of imputations and passive imputation.


1. Open R and load the packages mice and lattice.

require(mice)
require(lattice)
set.seed(123)

We choose seed value 123. This is an arbitrary value; any value would be an equally good seed value. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R instance to 123, you will get the exact same results and plots as we present in this document.


Post-proccessing of the imputations

Remember that we imputed the boys data in the previous tutorial with pmm and with norm. One of the problems with the imputed values of tv with norm is that there are negative values among the imputations. Somehow we should be able to lay a constraint on the imputed values of tv.

The mice() function has an argument called post that takes a vector of strings of R commands. These commands are parsed and evaluated after the univariate imputation function returns, and thus provides a way of post-processing the imputed values while using the processed version in the imputation algorithm. In other words; the post-processing allows us to manipulate the imputations for a particular variable that are generated within each iteration. Such manipulations directly affect the imputed values of that variable and the imputations for other variables. Naturally, such a procedure should be handled with care.


2. Post-process the values to constrain them between 1 and 25, use norm as the imputation method for tv.

ini <- mice(boys, maxit = 0)
meth <- ini$meth
meth["tv"] <- "norm"
post <- ini$post
post["tv"] <- "imp[[j]][, i] <- squeeze(imp[[j]][, i], c(1, 25))"
imp <- mice(boys, meth=meth, post=post, print=FALSE)

In this way the imputed values of tv are constrained (squeezed by function squeeze()) between 1 and 25.


3. Compare the imputed values and histograms of tv for the solution obtained by pmm to the constrained solution (created with norm, constrained between 1 and 25).

First, we recreate the default pmm solution

imp.pmm <- mice(boys, print=FALSE)

and we inspect the imputed values for the norm solution

table(complete(imp)$tv)
## 
##                1 1.14379178305323  1.1684155847437 1.17316076012133 
##              267                1                1                1 
## 1.17378509226147 1.24511360128507  1.2920947139319 1.39904960692056 
##                1                1                1                1 
## 1.43213168865065 1.45648027895582 1.51637584261004 1.52923556376078 
##                1                1                1                1 
## 1.60889679189235 1.62061152738467 1.69650498206727 1.74504463528787 
##                1                1                1                1 
## 1.74548045984735 1.77599158404481 1.85359421510797 1.88542493095854 
##                1                1                1                1 
##  1.9534452696723 1.96089101777596 1.96788784168658                2 
##                1                1                1               26 
##   2.052222203574 2.11766897506544 2.21106400975555 2.25217339254909 
##                1                1                1                1 
## 2.35733332474577 2.37716280918527 2.43162462020511 2.52099282518935 
##                1                1                1                1 
## 2.52872609822489 2.55949845508687 2.59101507541211 2.64929451929116 
##                1                1                1                1 
## 2.71437454039584 2.83557105064129 2.85952101388024 2.89700593915898 
##                1                1                1                1 
##   2.937318679713 2.93995117148368 2.96880128560378 2.97338143891255 
##                1                1                1                1 
## 2.99124265295524                3 3.15697391607161 3.22986613402941 
##                1               19                1                1 
## 3.26174697827408 3.33230965418743 3.38333930579673 3.43081708966723 
##                1                1                1                1 
## 3.48954179311803 3.49922162069612 3.51147019038741 3.51642250900974 
##                1                1                1                1 
## 3.66895668990112  3.7219120923471 3.72437204965384 3.74305509956764 
##                1                1                1                1 
## 3.76542820488391 3.81427377989944 3.85282982462055 3.99877653657402 
##                1                1                1                1 
##                4  4.0930290205318  4.1541989793975 4.37270950326876 
##               17                1                1                1 
## 4.51401889447084 4.63185224399455 4.66716979690801  4.7533368574383 
##                1                1                1                1 
## 4.81533239155909 4.89222156167534                5 5.07067713888799 
##                1                1                5                1 
## 5.08624557783435 5.36826293912086 5.37695867707583 5.41889166280967 
##                1                1                1                1 
## 5.47746942557712 5.50849291928038 5.59722237373242 5.92545621571141 
##                1                1                1                1 
##                6 6.20861172409472 6.20962017966782 6.22455693285517 
##               10                1                1                1 
## 6.23455712588063 6.34783540670059 6.36673814343557  6.4967736777462 
##                1                1                1                1 
## 6.57557231046333 6.68948381261379 6.74099100304983 6.79390292815759 
##                1                1                1                1 
## 6.79978232692585 6.86795570860815 6.92400822532086 6.99214173203389 
##                1                1                1                1 
## 7.04943370159885 7.19515565386227 7.33009067162911 7.38380746572593 
##                1                1                1                1 
## 7.85329526574817 7.99755289184496                8 8.16331964645683 
##                1                1               13                1 
## 8.18835906214847 8.20885340892607 8.44773070204883 8.90858501269007 
##                1                1                1                1 
##                9 9.00599490255543 9.21496349145335 9.48567729972039 
##                1                1                1                1 
## 9.63146502494763 9.70162750593184 9.79938857719224 9.82273979527481 
##                1                1                1                1 
## 9.86443060506073               10 10.1408769040131 10.4737102074094 
##                1               16                1                1 
## 10.8022406655138 11.1969155810269 11.6610314489435 11.9292355645848 
##                1                1                1                1 
##               12 12.0135149243355 12.0551708025149 12.0994720138005 
##               15                1                1                1 
##  12.396741486792 12.4854081835894 12.5235846750363  12.544645441477 
##                1                1                1                1 
## 12.5492705970762 12.6061805985707 12.7055173308725  12.710611942432 
##                1                1                1                1 
##               13 13.0337356336826 13.0652634243384   13.11518801144 
##                1                1                1                1 
## 13.2422602311256 13.6203049691005 13.8371129236818 13.8905632007034 
##                1                1                1                1 
## 13.9158421168451               14  14.008567604744 14.1405360672459 
##                1                1                1                1 
## 14.2539925805195 14.2898901021194 14.4128573988382 14.4544759797076 
##                1                1                1                1 
## 14.4669850215578 14.6690666881828 14.7079774429461 14.7347202119273 
##                1                1                1                1 
## 14.7965771280285 14.9124443558062               15 15.0993552289367 
##                1                1               27                1 
## 15.3822308886237 15.3873044893946 15.4419317037939 15.4490002908289 
##                1                1                1                1 
##  15.532505860574 15.6448382532494  15.654807191397 15.7373867428617 
##                1                1                1                1 
## 15.8217713725773 15.9357119733232 15.9969396473207               16 
##                1                1                1                1 
## 16.0003267872939 16.2392846673318 16.4782462552656 16.5573428922014 
##                1                1                1                1 
## 16.6332653480647 16.6539462557867 16.7648843282994 16.8224691363798 
##                1                1                1                1 
## 16.9063539890898 16.9090292557868               17 17.1110341129431 
##                1                1                1                1 
## 17.1341318570465 17.1704453324795 17.2596385176739 17.2620972362856 
##                1                1                1                1 
## 17.3110053859399 17.3845392853176 17.4709872745994 17.6282073124395 
##                1                1                1                1 
## 17.7703192589529 17.8424758254071               18 18.0447548449424 
##                1                1                1                1 
## 18.3114324839028 18.3219223447227 18.3687553531016 18.5200301737492 
##                1                1                1                1 
## 18.6906586361342  18.824369372339  18.888989569465 18.9454037558594 
##                1                1                1                1 
## 19.0224097672279 19.0268759433165 19.1411662417712 19.1917437875376 
##                1                1                1                1 
## 19.2895438925769  19.507552006093 19.5489186828569 19.5567082258108 
##                1                1                1                1 
## 19.5588587421077 19.5642002724842 19.5947213873423 19.6201088721128 
##                1                1                1                1 
## 19.6406895630419 19.6449273560213 19.6658773011161  19.712372217892 
##                1                1                1                1 
## 19.7333992526021  19.962218546138               20  20.009667645789 
##                1                1               38                1 
## 20.0681010346108 20.0822920415997 20.0931084341637   20.10576021965 
##                1                1                1                1 
## 20.1376167022507 20.1957135439128 20.2302792886254 20.2814810247995 
##                1                1                1                1 
## 20.4225773446851  20.605253564141 20.6739157261392 20.7017785583166 
##                1                1                1                1 
## 20.7101647730519 20.8747174412382 20.8836486807217 21.0350562495055 
##                1                1                1                1 
## 21.0858274842493 21.1079954405288 21.2101142307418 21.2278822056401 
##                1                1                1                1 
## 21.3038214119349  21.384464840931 21.4742406211183 21.8106141247688 
##                1                1                1                1 
## 21.8645302609172 21.9790488979912 22.0613546703795 22.1678861288083 
##                1                1                1                1 
## 22.3176312492075 22.5185746803201 22.6620881442435 22.7453872708055 
##                1                1                1                1 
## 23.0061780005677 23.3715498120438  23.463704351884 23.5253102746284 
##                1                1                1                1 
## 23.8199621820551 24.0451001933185 24.3998310806719 24.4780014016906 
##                1                1                1                1 
##               25 
##               38

and for the pmm solution

table(complete(imp.pmm)$tv)
## 
##   1   2   3   4   5   6   8   9  10  12  13  14  15  16  17  18  20  25 
##  68 216  89  33  10  25  22   2  28  32   1   5  58   3   2   2  89  63

It is clear that the norm solution does not give us integer data as imputations. Next, we inspect and compare the density of the incomplete and imputed data for the constrained solution.

densityplot(imp, ~tv)


A nice way of plotting the histograms of both datasets simultaneously is by creating first the dataframe (here we named it tvm) that contains the data in one column and the imputation method in another column.

tv <- c(complete(imp.pmm)$tv, complete(imp)$tv)
method <- rep(c("pmm", "norm"), each = nrow(boys))
tvm <- data.frame(tv = tv, method = method)

and then plotting a histogram of tv conditional on method.

histogram( ~tv | method, data = tvm, nint = 25)

Is there still a difference in distribution between the two different imputation methods? Which imputations are more plausible do you think?


4. Make a missing data indicator (name it miss) for bmi and check the relation of bmi, wgt and hgt for the boys in the imputed data. To do so, plot the imputed values against their respective calculated values.

miss <- is.na(imp$data$bmi)
xyplot(imp, bmi ~ I (wgt / (hgt / 100)^2),
       na.groups = miss, cex = c(0.8, 1.2), pch = c(1, 20),
       ylab = "BMI (kg/m2) Imputed", xlab = "BMI (kg/m2) Calculated")

With this plot we show that the relation between hgt, wgt and bmi is not preserved in the imputed values. In order to preserve this relation, we should use passive imputation.


Passive Imputation

There is often a need for transformed, combined or recoded versions of the data. In the case of incomplete data, one could impute the original, and transform the completed original afterwards, or transform the incomplete original and impute the transformed version. If, however, both the original and the transformed version are needed within the imputation algorithm, neither of these approaches work: One cannot be sure that the transformation holds between the imputed values of the original and transformed versions. mice has a built-in approach, called passive imputation, to deal with situations as described above.

The goal of passive imputation is to maintain the consistency among different transformations of the same data. As an example, consider the following deterministic function in the boys data \[\text{BMI} = \frac{\text{Weight (kg)}}{\text{Height}^2 \text{(m)}}\].

5. Use passive imputation to conserve the relation between imputed bmi, wgt and hgt by setting the imputation method for bmi to meth["bmi"]<- "~ I(wgt / (hgt / 100)^2)".

meth<- ini$meth
meth["bmi"]<- "~ I(wgt / (hgt / 100)^2)"
imp <- mice(boys, meth=meth, print=FALSE)

6. Again, plot the imputed values of bmi versus the calculated values and check whether there is convergence for bmi.

To inspect the relation:

xyplot(imp, bmi ~ I(wgt / (hgt / 100)^2), na.groups = miss,
       cex = c(1, 1), pch = c(1, 20),
       ylab = "BMI (kg/m2) Imputed", xlab = "BMI (kg/m2) Calculated")

To study convergence for bmi alone:

plot(imp, c("bmi"))

Although the relation of bmi is preserved now in the imputations we get absurd imputations and on top of that we clearly see there are some problems with the convergence of bmi. The problem is that we have circularity in the imputations. We used passive imputation for bmi but bmi is also automatically used as predictor for wgt and hgt. This can be solved by adjusting the predictor matrix.


7. Solve the problem of circularity (if you did not already do so) and plot once again the imputed values of bmi versus the calculated values.

First, we remove bmi as a predictor for hgt and wgt to remove circularity.

pred<-ini$pred
pred
##     age hgt wgt bmi hc gen phb tv reg
## age   0   0   0   0  0   0   0  0   0
## hgt   1   0   1   1  1   1   1  1   1
## wgt   1   1   0   1  1   1   1  1   1
## bmi   1   1   1   0  1   1   1  1   1
## hc    1   1   1   1  0   1   1  1   1
## gen   1   1   1   1  1   0   1  1   1
## phb   1   1   1   1  1   1   0  1   1
## tv    1   1   1   1  1   1   1  0   1
## reg   1   1   1   1  1   1   1  1   0
pred[c("hgt", "wgt"), "bmi"] <- 0
pred
##     age hgt wgt bmi hc gen phb tv reg
## age   0   0   0   0  0   0   0  0   0
## hgt   1   0   1   0  1   1   1  1   1
## wgt   1   1   0   0  1   1   1  1   1
## bmi   1   1   1   0  1   1   1  1   1
## hc    1   1   1   1  0   1   1  1   1
## gen   1   1   1   1  1   0   1  1   1
## phb   1   1   1   1  1   1   0  1   1
## tv    1   1   1   1  1   1   1  0   1
## reg   1   1   1   1  1   1   1  1   0

and we run the mice algorithm again with the new predictor matrix (we still ‘borrow’ the imputation methods object meth from before)

imp <-mice(boys, meth=meth, pred=pred, print=FALSE)

Second, we recreate the plots from Assignment 6. We start with the plot to inspect the relations in the observed and imputed data

xyplot(imp, bmi ~ I(wgt / (hgt / 100)^2), na.groups = miss,
       cex=c(1, 1), pch=c(1, 20),
       ylab="BMI (kg/m2) Imputed", xlab="BMI (kg/m2) Calculated")

and continue with the trace plot to study convergence

plot(imp, c("bmi"))

All is well now!


Conclusion

We have seen that post-processing of imputations and passive imputation can be used to preserve typical features of the data while maintaining the relations between the variables in both the observed and imputed data.

It is important to ‘gain’ this control as a user. After all, we are imputing values and taking their uncertainty properly into account.


For fun: what you shouldn’t do with passive imputation

Never set all relations fixed. You will remain with the starting values and waste your computer’s energy (and your own).

ini <- mice(boys, maxit = 0)
meth<- ini$meth
pred <- ini$pred
pred
##     age hgt wgt bmi hc gen phb tv reg
## age   0   0   0   0  0   0   0  0   0
## hgt   1   0   1   1  1   1   1  1   1
## wgt   1   1   0   1  1   1   1  1   1
## bmi   1   1   1   0  1   1   1  1   1
## hc    1   1   1   1  0   1   1  1   1
## gen   1   1   1   1  1   0   1  1   1
## phb   1   1   1   1  1   1   0  1   1
## tv    1   1   1   1  1   1   1  0   1
## reg   1   1   1   1  1   1   1  1   0
meth["bmi"]<- "~ I(wgt/(hgt/100)^2)"
meth["wgt"]<- "~ I(bmi*(hgt/100)^2)"
meth["hgt"]<- "~ I(sqrt(wgt/bmi)*100)"
imp.path <- mice(boys, meth = meth, pred = pred, seed = 123)
## 
##  iter imp variable
##   1   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
plot(imp.path, c("hgt", "wgt", "bmi"))

We named the mids- object imp.path, because the nonconvergence is pathological in this example!


- End of Vignette