Combining multiple imputation with raking of weights in the setting of nearly-true models
Raking of weights is one approach to using data from the full cohort in a regression model where some variables of interest are measured only on a subsample. This approach relies on defining an auxiliary variable from the data observed on the whole cohort, which is then used to adjust the weights for the usual Horvitz-Thompson estimator. Computing the optimal raking estimator requires evaluating the expectation of the efficient score given the whole cohort data, which is generally infeasible. We demonstrate the use of multiple imputation as a practical method to compute a raking estimator that will be optimal when the imputation model is correctly specified. We compare this estimator to the common parametric and semi-parametric estimators, including standard multiple imputation. We show that while estimators such as the semi-parametric maximum likelihood and multiple imputation estimators attain optimal performance under the true model, the raking estimator maintains a better robustness-efficiency trade-off even under mild model misspecification. We demonstrate this property of the proposed raking estimator through several numerical examples and provide a theoretical discussion of the conditions on the misspecification under which the raking estimator has superior asymptotic relative efficiency.
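As a rough illustration of the weight adjustment described above, the following Python sketch rakes Horvitz-Thompson design weights so that the weighted subsample totals of an auxiliary variable match its known whole-cohort totals. It is a minimal sketch under stated assumptions, not the estimator studied in the paper: the function name `rake_weights`, the simulated cohort, and the choice of auxiliary variable are all hypothetical, and the auxiliary variable here is an arbitrary cohort-wide covariate rather than an imputation-based estimate of the expected efficient score.

```python
import numpy as np

def rake_weights(design_w, aux_sub, aux_totals, tol=1e-10, max_iter=50):
    """Hypothetical helper: raking (exponential calibration) of weights.

    Finds w_i = d_i * exp(a_i' lam) such that sum_i w_i a_i equals the
    whole-cohort totals of the auxiliary variables, via Newton's method.
    """
    d = np.asarray(design_w, dtype=float)
    A = np.asarray(aux_sub, dtype=float)
    t = np.asarray(aux_totals, dtype=float)
    lam = np.zeros(A.shape[1])
    for _ in range(max_iter):
        w = d * np.exp(A @ lam)
        resid = A.T @ w - t                      # calibration equations
        if np.max(np.abs(resid)) < tol * max(1.0, np.max(np.abs(t))):
            break
        jac = A.T @ (A * w[:, None])             # derivative of resid w.r.t. lam
        lam -= np.linalg.solve(jac, resid)       # Newton step
    return d * np.exp(A @ lam)

# Simulated cohort (assumption, for illustration only).
rng = np.random.default_rng(0)
N = 2000
a = rng.normal(size=N)                           # auxiliary, observed on whole cohort
sampled = rng.random(N) < 0.2                    # Bernoulli subsample, pi = 0.2
A_full = np.column_stack([np.ones(N), a])        # intercept + auxiliary
totals = A_full.sum(axis=0)                      # known whole-cohort totals
d = np.full(sampled.sum(), 1 / 0.2)              # Horvitz-Thompson weights 1/pi
w = rake_weights(d, A_full[sampled], totals)     # raked weights
```

The raked weights `w` would then replace the design weights `d` in the weighted estimating equations for the regression model fitted on the subsample. How much efficiency is gained depends on how well the auxiliary variable tracks the quantity of interest.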