Combining Observational and Experimental Data Using First-stage Covariates
Randomized controlled trials generate experimental variation that can credibly identify causal effects, but often suffer from limited scale, while observational datasets are large but often violate the desired identification assumptions. To improve estimation efficiency, I propose a method that combines experimental and observational datasets when (1) units from these two datasets are sampled from the same population and (2) some characteristics of these units are observed. I show that if these characteristics can partially explain treatment assignment in the observational data, they can be used to derive moment restrictions that, in combination with the experimental data, improve estimation efficiency. I outline three estimators (weighting, shrinkage, or GMM) for implementing this strategy, and show that my methods can reduce variance by up to 50%, so that only half of the experimental sample is required to attain the same statistical precision. If researchers are allowed to design experiments differently, I show that they can further improve precision by directly leveraging this correlation between characteristics and assignment. I apply the method to a search listing dataset from Expedia that studies the causal effect of search rankings, and show that my method can substantially improve precision.