Treatment Effect Estimation with Efficient Data Aggregation
Data aggregation, also known as meta-analysis, is widely used to synthesize knowledge about parameters shared across multiple studies (e.g., the average treatment effect). In this paper we introduce a data aggregation protocol that uses summary statistics from existing studies to inform the design of a new validation study and to estimate the shared parameters by combining all the studies. In particular, we focus on the scenario where each existing study relies on an ℓ_1-regularized regression analysis to select a parsimonious model from a set of high-dimensional covariates. We derive an estimator for the shared parameter by aggregating summary statistics from each study through a novel technique called data carving. Our estimator (a) aims to make the fullest possible use of data from the existing studies without the bias from data re-use in model selection and parameter estimation, and (b) provides the added benefit of individual data privacy, because full data from these studies need not be shared for efficient estimation.
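To make the idea of aggregating summary statistics concrete, the sketch below pools per-study estimates of a shared parameter using classical inverse-variance (fixed-effect) weighting. This is not the data carving estimator described above: it is a simpler, standard baseline that applies no correction for the bias from model selection. The function name `aggregate_fixed_effect` and the input numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt


def aggregate_fixed_effect(estimates, std_errors):
    """Pool per-study estimates of a shared parameter by inverse-variance weighting.

    Only summary statistics (point estimate and standard error per study)
    are needed; no individual-level data is shared.
    NOTE: classical fixed-effect pooling, shown as a baseline. It does not
    adjust for model selection, unlike the data carving estimator.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate, and at least one study")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the precision (1 / variance) of its estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical summary statistics from two studies (illustrative values only).
study_estimates = [1.0, 2.0]
study_std_errors = [1.0, 1.0]

pooled_estimate, pooled_std_error = aggregate_fixed_effect(
    study_estimates, study_std_errors
)
```

With equal standard errors the pooled estimate reduces to the simple average of the study estimates, and its standard error shrinks as more studies are combined.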