Cross-Validated Decision Trees with Targeted Maximum Likelihood Estimation for Nonparametric Causal Mixtures Analysis
Exposure to mixtures of chemicals, such as drugs, pollutants, and nutrients, is common in real-world exposure and treatment scenarios. To understand the impact of these exposures on health outcomes, an interpretable approach is to estimate the causal effect of the exposure regions that are most associated with a health outcome. This requires a statistical estimator that can identify these exposure regions and provide an unbiased estimate of a causal target parameter given the region. In this work, we present a methodology that uses decision trees to data-adaptively determine exposure regions and employs cross-validated targeted maximum likelihood estimation to obtain an unbiased estimate of the average regional-exposure effect (ARE). This results in a plug-in estimator with an asymptotically normal distribution and minimum variance, from which confidence intervals can be derived. The methodology is implemented in the open-source R package CVtreeMLE. Analysts supply a vector of exposures, covariates, and an outcome, and the package returns tables of exposure regions, such as lead > 2.1 and arsenic > 1.4, each with an associated ARE. The ARE represents the mean outcome difference if all individuals were exposed to the region compared with if none were exposed to it. CVtreeMLE enables researchers to discover interpretable exposure regions in mixed exposure scenarios and provides robust statistical inference for the impact of these regions. The resulting quantities offer interpretable thresholds that can inform public health policies, such as pollutant regulations, or aid in medical decision-making, such as identifying the most effective drug combinations.
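To make the ARE concrete, here is a minimal Python sketch under strong simplifying assumptions. It is not the CVtreeMLE API and not cross-validated TMLE: the region (lead > 2.1 and arsenic > 1.4) is fixed by hand rather than learned by a decision tree, the data are simulated, there is a single binary covariate, and the estimate is a simple stratified plug-in (g-computation). All function and variable names are hypothetical.

```python
import random


def in_region(lead, arsenic):
    """Hand-picked exposure region (assumption: in practice a tree would find it)."""
    return lead > 2.1 and arsenic > 1.4


def simulate(n, seed=0):
    """Simulated data: binary covariate w shifts both exposures and the outcome.

    The true effect of being inside the region is set to 2.0 by construction.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        lead = rng.gauss(2.0 + 0.5 * w, 1.0)
        arsenic = rng.gauss(1.5 + 0.3 * w, 1.0)
        a = 1 if in_region(lead, arsenic) else 0
        y = 1.0 + 2.0 * a + 0.5 * w + rng.gauss(0.0, 1.0)
        rows.append((w, a, y))
    return rows


def plug_in_are(rows):
    """Stratified plug-in estimate of the ARE.

    Within each covariate stratum, take the mean outcome inside the region
    minus the mean outcome outside it, then average those differences over
    the empirical distribution of the covariate.
    """
    n = len(rows)
    are = 0.0
    for w in (0, 1):
        stratum = [(a, y) for (w_i, a, y) in rows if w_i == w]
        y_in = [y for a, y in stratum if a == 1]
        y_out = [y for a, y in stratum if a == 0]
        diff = sum(y_in) / len(y_in) - sum(y_out) / len(y_out)
        are += (len(stratum) / n) * diff
    return are


rows = simulate(20000)
are_hat = plug_in_are(rows)
print(round(are_hat, 2))
```

The printed value estimates the mean outcome difference between "everyone inside the region" and "no one inside the region", adjusted for the covariate. The method described above would additionally learn the region on training folds and apply a targeting step on validation folds to obtain valid confidence intervals.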