Stratified Pilot Matching in R: The stratamatch Package
In a block-randomized controlled trial, individuals are subdivided by prognostically important baseline characteristics (e.g., age group, sex, or smoking status) prior to randomization. This step reduces the heterogeneity between the treatment groups with respect to the baseline factors most important in determining the outcome, thus enabling more precise estimation of the treatment effect. The stratamatch package extends this approach to the observational setting by implementing functions to separate an observational data set into strata and interrogate the quality of different stratification schemes. Once an acceptable stratification is found, treated and control individuals can be matched by propensity score within strata, thereby recapitulating the block-randomized trial design for the observational study. The stratification scheme implemented by stratamatch applies a "pilot design" approach (Aikens, Greaves, and Baiocchi 2019) to estimate a quantity called the prognostic score (Hansen 2008), which is used to divide individuals into strata. The potential benefits of such an approach are twofold. First, stratifying the data enables more computationally efficient matching of large data sets. Second, methodological studies suggest that using a prognostic score to inform the matching process increases the precision of the effect estimate and reduces sensitivity to bias from unmeasured confounding factors (Aikens et al. 2019; Leacy and Stuart 2014; Antonelli, Cefalu, Palmer, and Agniel 2018). A common mistake is to believe that reserving more data for the analysis phase of a study is always better. Instead, the stratamatch approach suggests how clever use of data in the design phase of large studies can lead to major benefits in the robustness of the study conclusions.
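The workflow described above (set aside a pilot set, estimate a prognostic score, stratify, then match on propensity score within strata) can be illustrated with a minimal base R sketch. This is not the stratamatch implementation and does not use the package's API. The simulated data, the column names (`x1`, `x2`, `treat`, `outcome`), the 10% pilot fraction, the four quantile-based strata, and the greedy nearest-neighbor matcher `match_stratum` are all assumptions chosen for illustration.

```r
set.seed(1)

# Simulated observational data (hypothetical columns): two covariates,
# a treatment whose probability depends on x1, and a binary outcome.
n <- 2000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(-1 + 0.5 * dat$x1))
dat$outcome <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x2 + 0.3 * dat$treat))

# 1. Pilot design: set aside a random 10% of the controls (assumed fraction).
control_rows <- which(dat$treat == 0)
pilot_rows <- sample(control_rows, size = round(0.1 * length(control_rows)))
pilot <- dat[pilot_rows, ]
analysis <- dat[-pilot_rows, ]

# 2. Fit a prognostic model for the outcome on the pilot controls only.
prog_model <- glm(outcome ~ x1 + x2, data = pilot, family = binomial())

# 3. Score the analysis set and cut it into strata at prognostic score quantiles.
analysis$prog_score <- predict(prog_model, newdata = analysis, type = "response")
n_strata <- 4
breaks <- quantile(analysis$prog_score,
                   probs = seq(0, 1, length.out = n_strata + 1))
analysis$stratum <- cut(analysis$prog_score, breaks = breaks,
                        include.lowest = TRUE, labels = FALSE)

# 4. Estimate propensity scores on the analysis set.
prop_model <- glm(treat ~ x1 + x2, data = analysis, family = binomial())
analysis$prop_score <- fitted(prop_model)

# 5. Greedy 1:1 nearest-neighbor matching on propensity score within a stratum.
#    (A simplification; optimal matching would generally be preferable.)
match_stratum <- function(d) {
  treated <- which(d$treat == 1)
  controls <- which(d$treat == 0)
  pairs <- list()
  for (t in treated) {
    if (length(controls) == 0) break
    gaps <- abs(d$prop_score[controls] - d$prop_score[t])
    j <- controls[which.min(gaps)]
    pairs[[length(pairs) + 1]] <- data.frame(
      treated = rownames(d)[t],
      control = rownames(d)[j],
      stratum = d$stratum[t]
    )
    controls <- setdiff(controls, j)  # each control is used at most once
  }
  do.call(rbind, pairs)
}

matches <- do.call(rbind, lapply(split(analysis, analysis$stratum), match_stratum))

# Treated/control counts per stratum, and the first few matched pairs.
print(table(stratum = analysis$stratum, treat = analysis$treat))
print(head(matches))
```

Because each stratum is matched independently, the matching problem is broken into several smaller ones, which is the source of the computational savings mentioned above. Fitting the prognostic model only on the held-out pilot controls keeps outcome information from the analysis set out of the design phase.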