This is the fourth vignette in the Winnipeg series.
In this vignette we will walk you through the more advanced features of mice
, such as post-processing of imputations and passive imputation.
1. Open R
and load the packages mice
and lattice
.
require(mice)
require(lattice)
set.seed(123)
We choose seed value 123
. This is an arbitrary value; any value would be an equally good seed value. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R
instance to 123
, you will get the exact same results and plots as we present in this document.
Post-proccessing of the imputations
Remember that we imputed the boys
data in the previous tutorial with pmm
and with norm
. One of the problems with the imputed values of tv
with norm
is that there are negative values among the imputations. Somehow we should be able to lay a constraint on the imputed values of tv
.
The mice()
function has an argument called post
that takes a vector of strings of R
commands. These commands are parsed and evaluated after the univariate imputation function returns, and thus provides a way of post-processing the imputed values while using the processed version in the imputation algorithm. In other words; the post-processing allows us to manipulate the imputations for a particular variable that are generated within each iteration. Such manipulations directly affect the imputed values of that variable and the imputations for other variables. Naturally, such a procedure should be handled with care.
2. Post-process the values to constrain them between 1 and 25, use norm
as the imputation method for tv
.
ini <- mice(boys, maxit = 0)
meth <- ini$meth
meth["tv"] <- "norm"
post <- ini$post
post["tv"] <- "imp[[j]][, i] <- squeeze(imp[[j]][, i], c(1, 25))"
imp <- mice(boys, meth=meth, post=post, print=FALSE)
In this way the imputed values of tv
are constrained (squeezed by function squeeze()
) between 1 and 25.
3. Compare the imputed values and histograms of tv
for the solution obtained by pmm
to the constrained solution (created with norm
, constrained between 1 and 25).
First, we recreate the default pmm
solution
imp.pmm <- mice(boys, print=FALSE)
and we inspect the imputed values for the norm
solution
table(complete(imp)$tv)
##
## 1 1.14379178305323 1.1684155847437 1.17316076012133
## 267 1 1 1
## 1.17378509226147 1.24511360128507 1.2920947139319 1.39904960692056
## 1 1 1 1
## 1.43213168865065 1.45648027895582 1.51637584261004 1.52923556376078
## 1 1 1 1
## 1.60889679189235 1.62061152738467 1.69650498206727 1.74504463528787
## 1 1 1 1
## 1.74548045984735 1.77599158404481 1.85359421510797 1.88542493095854
## 1 1 1 1
## 1.9534452696723 1.96089101777596 1.96788784168658 2
## 1 1 1 26
## 2.052222203574 2.11766897506544 2.21106400975555 2.25217339254909
## 1 1 1 1
## 2.35733332474577 2.37716280918527 2.43162462020511 2.52099282518935
## 1 1 1 1
## 2.52872609822489 2.55949845508687 2.59101507541211 2.64929451929116
## 1 1 1 1
## 2.71437454039584 2.83557105064129 2.85952101388024 2.89700593915898
## 1 1 1 1
## 2.937318679713 2.93995117148368 2.96880128560378 2.97338143891255
## 1 1 1 1
## 2.99124265295524 3 3.15697391607161 3.22986613402941
## 1 19 1 1
## 3.26174697827408 3.33230965418743 3.38333930579673 3.43081708966723
## 1 1 1 1
## 3.48954179311803 3.49922162069612 3.51147019038741 3.51642250900974
## 1 1 1 1
## 3.66895668990112 3.7219120923471 3.72437204965384 3.74305509956764
## 1 1 1 1
## 3.76542820488391 3.81427377989944 3.85282982462055 3.99877653657402
## 1 1 1 1
## 4 4.0930290205318 4.1541989793975 4.37270950326876
## 17 1 1 1
## 4.51401889447084 4.63185224399455 4.66716979690801 4.7533368574383
## 1 1 1 1
## 4.81533239155909 4.89222156167534 5 5.07067713888799
## 1 1 5 1
## 5.08624557783435 5.36826293912086 5.37695867707583 5.41889166280967
## 1 1 1 1
## 5.47746942557712 5.50849291928038 5.59722237373242 5.92545621571141
## 1 1 1 1
## 6 6.20861172409472 6.20962017966782 6.22455693285517
## 10 1 1 1
## 6.23455712588063 6.34783540670059 6.36673814343557 6.4967736777462
## 1 1 1 1
## 6.57557231046333 6.68948381261379 6.74099100304983 6.79390292815759
## 1 1 1 1
## 6.79978232692585 6.86795570860815 6.92400822532086 6.99214173203389
## 1 1 1 1
## 7.04943370159885 7.19515565386227 7.33009067162911 7.38380746572593
## 1 1 1 1
## 7.85329526574817 7.99755289184496 8 8.16331964645683
## 1 1 13 1
## 8.18835906214847 8.20885340892607 8.44773070204883 8.90858501269007
## 1 1 1 1
## 9 9.00599490255543 9.21496349145335 9.48567729972039
## 1 1 1 1
## 9.63146502494763 9.70162750593184 9.79938857719224 9.82273979527481
## 1 1 1 1
## 9.86443060506073 10 10.1408769040131 10.4737102074094
## 1 16 1 1
## 10.8022406655138 11.1969155810269 11.6610314489435 11.9292355645848
## 1 1 1 1
## 12 12.0135149243355 12.0551708025149 12.0994720138005
## 15 1 1 1
## 12.396741486792 12.4854081835894 12.5235846750363 12.544645441477
## 1 1 1 1
## 12.5492705970762 12.6061805985707 12.7055173308725 12.710611942432
## 1 1 1 1
## 13 13.0337356336826 13.0652634243384 13.11518801144
## 1 1 1 1
## 13.2422602311256 13.6203049691005 13.8371129236818 13.8905632007034
## 1 1 1 1
## 13.9158421168451 14 14.008567604744 14.1405360672459
## 1 1 1 1
## 14.2539925805195 14.2898901021194 14.4128573988382 14.4544759797076
## 1 1 1 1
## 14.4669850215578 14.6690666881828 14.7079774429461 14.7347202119273
## 1 1 1 1
## 14.7965771280285 14.9124443558062 15 15.0993552289367
## 1 1 27 1
## 15.3822308886237 15.3873044893946 15.4419317037939 15.4490002908289
## 1 1 1 1
## 15.532505860574 15.6448382532494 15.654807191397 15.7373867428617
## 1 1 1 1
## 15.8217713725773 15.9357119733232 15.9969396473207 16
## 1 1 1 1
## 16.0003267872939 16.2392846673318 16.4782462552656 16.5573428922014
## 1 1 1 1
## 16.6332653480647 16.6539462557867 16.7648843282994 16.8224691363798
## 1 1 1 1
## 16.9063539890898 16.9090292557868 17 17.1110341129431
## 1 1 1 1
## 17.1341318570465 17.1704453324795 17.2596385176739 17.2620972362856
## 1 1 1 1
## 17.3110053859399 17.3845392853176 17.4709872745994 17.6282073124395
## 1 1 1 1
## 17.7703192589529 17.8424758254071 18 18.0447548449424
## 1 1 1 1
## 18.3114324839028 18.3219223447227 18.3687553531016 18.5200301737492
## 1 1 1 1
## 18.6906586361342 18.824369372339 18.888989569465 18.9454037558594
## 1 1 1 1
## 19.0224097672279 19.0268759433165 19.1411662417712 19.1917437875376
## 1 1 1 1
## 19.2895438925769 19.507552006093 19.5489186828569 19.5567082258108
## 1 1 1 1
## 19.5588587421077 19.5642002724842 19.5947213873423 19.6201088721128
## 1 1 1 1
## 19.6406895630419 19.6449273560213 19.6658773011161 19.712372217892
## 1 1 1 1
## 19.7333992526021 19.962218546138 20 20.009667645789
## 1 1 38 1
## 20.0681010346108 20.0822920415997 20.0931084341637 20.10576021965
## 1 1 1 1
## 20.1376167022507 20.1957135439128 20.2302792886254 20.2814810247995
## 1 1 1 1
## 20.4225773446851 20.605253564141 20.6739157261392 20.7017785583166
## 1 1 1 1
## 20.7101647730519 20.8747174412382 20.8836486807217 21.0350562495055
## 1 1 1 1
## 21.0858274842493 21.1079954405288 21.2101142307418 21.2278822056401
## 1 1 1 1
## 21.3038214119349 21.384464840931 21.4742406211183 21.8106141247688
## 1 1 1 1
## 21.8645302609172 21.9790488979912 22.0613546703795 22.1678861288083
## 1 1 1 1
## 22.3176312492075 22.5185746803201 22.6620881442435 22.7453872708055
## 1 1 1 1
## 23.0061780005677 23.3715498120438 23.463704351884 23.5253102746284
## 1 1 1 1
## 23.8199621820551 24.0451001933185 24.3998310806719 24.4780014016906
## 1 1 1 1
## 25
## 38
and for the pmm
solution
table(complete(imp.pmm)$tv)
##
## 1 2 3 4 5 6 8 9 10 12 13 14 15 16 17 18 20 25
## 68 216 89 33 10 25 22 2 28 32 1 5 58 3 2 2 89 63
It is clear that the norm solution does not give us integer data as imputations. Next, we inspect and compare the density of the incomplete and imputed data for the constrained solution.
densityplot(imp, ~tv)
A nice way of plotting the histograms of both datasets simultaneously is by creating first the dataframe (here we named it tvm
) that contains the data in one column and the imputation method in another column.
tv <- c(complete(imp.pmm)$tv, complete(imp)$tv)
method <- rep(c("pmm", "norm"), each = nrow(boys))
tvm <- data.frame(tv = tv, method = method)
and then plotting a histogram of tv
conditional on method.
histogram( ~tv | method, data = tvm, nint = 25)
Is there still a difference in distribution between the two different imputation methods? Which imputations are more plausible do you think?
4. Make a missing data indicator (name it miss
) for bmi
and check the relation of bmi
, wgt
and hgt
for the boys in the imputed data. To do so, plot the imputed values against their respective calculated values.
miss <- is.na(imp$data$bmi)
xyplot(imp, bmi ~ I (wgt / (hgt / 100)^2),
na.groups = miss, cex = c(0.8, 1.2), pch = c(1, 20),
ylab = "BMI (kg/m2) Imputed", xlab = "BMI (kg/m2) Calculated")