Cross Validation for Rare Events
We derive sanity-check bounds for the cross-validation (CV) estimate of the generalization risk for learning algorithms dedicated to extreme or rare events. We consider classification on extreme regions of the covariate space, a problem analyzed in Jalalzai et al. 2018. The risk is then a probability of error conditional on the norm of the covariate vector exceeding a high quantile. Establishing sanity-check bounds consists in recovering bounds on the CV estimate that are of the same nature as those on the empirical risk. We achieve this goal both for K-fold CV, with an exponential bound, and for leave-p-out CV, with a polynomial bound, thus extending state-of-the-art results to the modified version of the risk that is adapted to extreme value analysis.
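To make the estimated quantity concrete, the following is a minimal sketch of a K-fold CV estimate of the error rate restricted to extreme points, that is, points whose norm exceeds a high empirical quantile. It is an illustration under stated assumptions and not the paper's procedure: the function names (`extreme_cv_risk`, `fit_centroids`, `predict_centroids`), the choice to compute the threshold on each training fold, the use of a nearest-centroid classifier on the angles X/||X||, and the toy Pareto data are all hypothetical choices made for this example.

```python
import numpy as np


def fit_centroids(X, y):
    """Hypothetical toy classifier: one centroid per class, computed on angles X/||X||."""
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    classes = np.unique(y)
    centroids = np.stack([angles[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_centroids(model, X):
    """Assign each point to the class whose centroid is closest to its angle."""
    classes, centroids = model
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    dists = np.linalg.norm(angles[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def extreme_cv_risk(X, y, fit, predict, n_folds=5, quantile=0.9, seed=0):
    """K-fold CV estimate of the error rate conditional on ||X|| being large.

    Assumption of this sketch: the threshold is the empirical `quantile` of the
    norms in the training part of each fold, and both training and validation
    are restricted to points above that threshold.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    norms = np.linalg.norm(X, axis=1)
    fold_risks = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        threshold = np.quantile(norms[train], quantile)
        train_ext = train[norms[train] >= threshold]
        val_ext = val[norms[val] >= threshold]
        if len(train_ext) == 0 or len(val_ext) == 0:
            continue  # no extreme points in this fold, skip it
        model = fit(X[train_ext], y[train_ext])
        errors = predict(model, X[val_ext]) != y[val_ext]
        fold_risks.append(errors.mean())
    if not fold_risks:
        raise ValueError("no fold contained extreme points; lower `quantile`")
    # Average of the per-fold error rates on extreme validation points
    return float(np.mean(fold_risks))


# Toy usage on synthetic heavy-tailed data (hypothetical example)
rng = np.random.default_rng(1)
X = rng.pareto(2.0, size=(2000, 2)) + 1e-6
y = (X[:, 0] > X[:, 1]).astype(int)
risk = extreme_cv_risk(X, y, fit_centroids, predict_centroids)
print(f"K-fold CV estimate of the risk on extremes: {risk:.3f}")
```

Because only a fraction of roughly `1 - quantile` of the sample is used in each fold, the effective sample size is much smaller than the full sample, which is why concentration bounds for this estimate need a dedicated analysis.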