Identifying optimally cost-effective dynamic treatment regimes with a Q-learning approach
Health policy decisions regarding patient treatment strategies require consideration of both treatment effectiveness and cost. Optimizing treatment rules with respect to effectiveness may result in prohibitively expensive strategies; on the other hand, optimizing with respect to costs may result in poor patient outcomes. We propose a two-step approach for identifying an optimally cost-effective and interpretable dynamic treatment regime. First, we develop a combined Q-learning and policy-search approach to estimate an optimal list-based regime under a constraint on expected treatment costs. Second, we propose an iterative procedure to select an optimally cost-effective regime from a set of candidate regimes corresponding to different cost constraints. Our approach can estimate optimal regimes in the presence of commonly encountered challenges including time-varying confounding and correlated outcomes. Through simulation studies, we illustrate the validity of estimated optimal treatment regimes and examine operating characteristics under flexible modeling approaches.
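To make the first step concrete, the following is a minimal single-stage sketch of a cost-constrained policy search built on fitted Q-functions. It is not the authors' estimator. It assumes a single decision point, a randomized binary treatment, linear working models for both outcome and cost, and a one-covariate threshold rule ("treat if x > t") standing in for a list-based regime. All names (`fit_q`, `constrained_threshold_rule`, `kappa`), the simulated data, and the grid of thresholds are hypothetical choices made for illustration.

```python
import numpy as np

def fit_q(x, a, target):
    """Least-squares fit of target ~ 1 + x + a + a*x; returns a predictor q(x, a)."""
    design = np.column_stack([np.ones_like(x), x, a, a * x])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)

    def predict(x_new, a_new):
        a_new = np.broadcast_to(a_new, x_new.shape).astype(float)
        return beta[0] + beta[1] * x_new + beta[2] * a_new + beta[3] * a_new * x_new

    return predict

def constrained_threshold_rule(x, q_outcome, q_cost, kappa, grid):
    """Search rules 'treat if x > t'; keep the highest-value rule with mean cost <= kappa."""
    best = None
    for t in grid:
        treat = (x > t).astype(float)
        value = q_outcome(x, treat).mean()  # estimated mean outcome under the rule
        cost = q_cost(x, treat).mean()      # estimated mean cost under the rule
        if cost <= kappa and (best is None or value > best["value"]):
            best = {"threshold": float(t), "value": float(value), "cost": float(cost)}
    return best

# Simulated data (assumption): treatment helps more at larger x and adds cost.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
a = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + 0.5 * x + a * (0.2 + 1.0 * x) + rng.normal(scale=0.5, size=n)
c = 1.0 + 2.0 * a + rng.normal(scale=0.1, size=n)

q_y = fit_q(x, a, y)  # Q-function for the outcome
q_c = fit_q(x, a, c)  # Q-function for the cost

# Candidate thresholds; np.inf encodes the "treat nobody" rule.
grid = np.append(np.linspace(-2.0, 2.0, 41), np.inf)

# One candidate regime per cost constraint.
candidates = {
    kappa: constrained_threshold_rule(x, q_y, q_c, kappa, grid)
    for kappa in (1.5, 2.0, 3.0)
}
```

Each entry of `candidates` is the best threshold rule found under one cost constraint, which mirrors the set of candidate regimes that the second step selects from. The selection criterion itself is not stated here, so the sketch stops at producing the candidates.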