Generalizing Trimming Bounds for Endogenously Missing Outcome Data Using Random Forests
In many experimental or quasi-experimental studies, outcomes of interest are observed only for subjects who choose (or are chosen) to engage in the activity that generates the outcome. Outcome data are thus endogenously missing for units that do not engage, in which case random or conditionally random treatment assignment prior to such choices is insufficient to point-identify treatment effects. Non-parametric partial identification bounds address endogenous missingness without requiring disputable parametric assumptions. Basic bounding approaches, however, often yield bounds that are very wide and therefore minimally informative. We present methods for narrowing non-parametric bounds on treatment effects by adjusting for potentially large numbers of covariates, using generalized random forests. Our approach allows for agnosticism about the data-generating process and for honest inference. We use a simulation study and two replication exercises to demonstrate the benefits of our approach.
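To make the starting point concrete, the following is a minimal sketch of basic trimming bounds without covariate adjustment, in the style of Lee (2009). It is not the forest-based method presented in the paper. The function name `trimming_bounds`, its arguments, and the toy data are hypothetical, and the sketch assumes random assignment and that treatment never reduces the probability of observing the outcome (monotonicity).

```python
from statistics import mean


def trimming_bounds(y, d, s):
    """Basic trimming bounds on the effect for always-observed units.

    y: outcomes (ignored where s == 0)
    d: 0/1 treatment indicators
    s: 0/1 indicators that the outcome is observed
    Assumes (for this sketch) random assignment and monotonicity:
    treatment never makes an outcome less likely to be observed.
    """
    treated = sorted(yi for yi, di, si in zip(y, d, s) if di == 1 and si == 1)
    control = [yi for yi, di, si in zip(y, d, s) if di == 0 and si == 1]
    n1 = sum(1 for di in d if di == 1)
    n0 = sum(1 for di in d if di == 0)
    if n1 == 0 or n0 == 0 or not treated or not control:
        raise ValueError("need observed outcomes in both arms")

    p1 = len(treated) / n1  # share of treated units with observed outcome
    p0 = len(control) / n0  # share of control units with observed outcome
    if p1 < p0:
        raise ValueError("sketch assumes treatment weakly increases selection")

    # Number of observed treated outcomes retained after trimming
    # the excess share (p1 - p0) / p1.
    keep = round(len(treated) * p0 / p1)
    if keep == 0:
        raise ValueError("too few observations to trim")

    control_mean = mean(control)
    lower = mean(treated[:keep]) - control_mean   # keep smallest outcomes
    upper = mean(treated[-keep:]) - control_mean  # keep largest outcomes
    return lower, upper


# Toy data: all 4 treated outcomes observed, only 2 of 4 control outcomes.
d = [1, 1, 1, 1, 0, 0, 0, 0]
s = [1, 1, 1, 1, 1, 1, 0, 0]
y = [1, 2, 3, 4, 2, 2, 0, 0]
bounds = trimming_bounds(y, d, s)
print(bounds)
```

With half of the treated outcomes trimmed in this toy example, the interval spans both negative and positive effects, which illustrates how wide unadjusted bounds can be.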