Recalibration of Predictive Models as Approximate Probabilistic Updates
The output of predictive models is routinely recalibrated by reconciling low-level predictions with known derived quantities defined at higher levels of aggregation. For example, models predicting turnout probabilities at the individual level in U.S. elections can be adjusted so that their aggregation matches the observed vote totals in each state, thus producing better-calibrated predictions. In this research note, we provide theoretical grounding for one of the most commonly used recalibration strategies, known colloquially as the "logit shift." Although typically cast as a heuristic optimization problem (in which an adjustment is chosen to minimize the difference between the aggregated predictions and the target totals), the logit shift, we show, in fact offers a fast and accurate approximation to a principled but often computationally impractical adjustment strategy: computing the posterior prediction probabilities conditional on the target totals. After deriving analytical bounds on the quality of the approximation, we illustrate the accuracy of the approach using Monte Carlo simulations. The simulations also confirm our analytical results regarding the scenarios in which users of the simple logit shift can expect it to perform best: namely, when the aggregated targets comprise many individual predictions, and when the distribution of the true probabilities is symmetric and tight around 0.5.
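As a concrete illustration (not code from the note itself), the sketch below shows the logit shift in its common additive formulation: a single constant delta is added to every prediction on the logit scale and chosen so that the adjusted probabilities sum to the observed target total. The function name `logit_shift` and the simulated inputs are ours, for illustration only.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit, logit  # sigmoid and its inverse

def logit_shift(p, target_total):
    """Shift predictions by a common constant in logit space so that
    the adjusted probabilities sum to the observed target total."""
    z = logit(p)  # individual predictions on the logit scale
    # The shifted sum is strictly increasing in delta, so a unique root
    # exists whenever 0 < target_total < len(p); bracket and solve.
    f = lambda delta: expit(z + delta).sum() - target_total
    delta = brentq(f, -30.0, 30.0)
    return expit(z + delta)

# Example: adjust simulated turnout probabilities to a known vote total.
rng = np.random.default_rng(0)
p = rng.beta(2, 2, size=1000)   # raw model predictions, tight around 0.5
observed = 620                  # hypothetical observed aggregate total
p_adj = logit_shift(p, observed)
print(p_adj.sum())              # ~620 after recalibration
```

Because the shifted sum is monotone in delta, the adjustment reduces to a one-dimensional root-finding problem, which is what makes this heuristic fast compared with computing the full posterior conditional on the target totals.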