Leveraging Random Assignment in Multiple Imputation of Missing Covariates in Causal Studies
Baseline covariates in randomized experiments are often used in the estimation of treatment effects, for example, when estimating treatment effects within covariate-defined subgroups. In practice, however, covariate values may be missing for some study subjects. To handle missing values, analysts can use multiple imputation to create completed datasets, from which they can estimate the treatment effects. We investigate the performance of multiple imputation routines that utilize randomized treatment assignment, that is, routines that make use of the fact that the true covariate distributions are the same across treatment arms. We do so for both ignorable and non-ignorable missing data, using simulation studies to compare the quality of inferences when the imputation routine respects or disregards randomization. We consider this question both for imputation routines estimated with covariates only and for imputation routines estimated conditional on the outcome variable. In either case, accounting for randomization does not noticeably improve inferences when sample sizes are in the hundreds of units.
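The contrast between respecting and disregarding randomization can be illustrated with a minimal sketch. It is not the routine used in the study: it assumes a single normally distributed covariate, values missing completely at random, a covariates-only normal imputation model with a noninformative prior, and Rubin's combining rules. All function names, the simulated data, and the subgroup definition (`x > 0`) are hypothetical choices made for illustration.

```python
import numpy as np

def draw_normal_imputations(x_obs, n_missing, rng):
    """Draw values from the posterior predictive distribution of a normal
    model fit to x_obs (assumed model, standard noninformative prior)."""
    n = x_obs.size
    xbar, s2 = x_obs.mean(), x_obs.var(ddof=1)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)   # draw the variance
    mu = rng.normal(xbar, np.sqrt(sigma2 / n))     # draw the mean given the variance
    return rng.normal(mu, np.sqrt(sigma2), size=n_missing)

def impute(x, t, rng, pooled):
    """Return a completed copy of x.
    pooled=True  : one model for both arms (uses randomization).
    pooled=False : a separate model per arm (disregards randomization)."""
    x_imp = x.copy()
    miss = np.isnan(x)
    if pooled:
        x_imp[miss] = draw_normal_imputations(x[~miss], miss.sum(), rng)
    else:
        for arm in (0, 1):
            in_arm = t == arm
            target = miss & in_arm
            x_imp[target] = draw_normal_imputations(
                x[~miss & in_arm], target.sum(), rng)
    return x_imp

def subgroup_effect(x, t, y):
    """Difference in mean outcomes within the hypothetical subgroup x > 0."""
    g = x > 0
    y1, y0 = y[g & (t == 1)], y[g & (t == 0)]
    est = y1.mean() - y0.mean()
    var = y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size
    return est, var

def mi_estimate(x, t, y, pooled, m=20, seed=0):
    """Combine m completed-data analyses with Rubin's rules."""
    rng = np.random.default_rng(seed)
    results = [subgroup_effect(impute(x, t, rng, pooled), t, y)
               for _ in range(m)]
    ests, variances = np.array(results).T
    between = ests.var(ddof=1)                      # between-imputation variance
    total_var = variances.mean() + (1 + 1 / m) * between
    return ests.mean(), total_var

# Simulated randomized experiment (illustrative values only).
rng = np.random.default_rng(1)
n = 400
x_true = rng.normal(size=n)
t = rng.integers(0, 2, size=n)                      # randomized assignment
y = 1.0 * t + 0.5 * x_true + rng.normal(size=n)
x = x_true.copy()
x[rng.random(n) < 0.3] = np.nan                     # about 30% missing at random

pooled_est, pooled_var = mi_estimate(x, t, y, pooled=True)
by_arm_est, by_arm_var = mi_estimate(x, t, y, pooled=False)
print(f"pooled model:       {pooled_est:.3f} (SE {np.sqrt(pooled_var):.3f})")
print(f"arm-specific model: {by_arm_est:.3f} (SE {np.sqrt(by_arm_var):.3f})")
```

Repeating this over many simulated datasets, and swapping in a non-ignorable missingness mechanism or an imputation model that conditions on the outcome, would approximate the kind of comparison described above.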