Cross Validation for Rare Events

02/01/2022
by Anass Aghbalou, et al.

We derive sanity-check bounds for the cross-validation (CV) estimate of the generalization risk for learning algorithms dedicated to extreme or rare events. We consider classification on extreme regions of the covariate space, a problem analyzed in Jalalzai et al. (2018). The risk is then the probability of error conditional on the norm of the covariate vector exceeding a high quantile. Establishing sanity-check bounds consists in recovering bounds on the CV estimate that are of the same nature as those known for the empirical risk. We achieve this goal both for K-fold CV, with an exponential bound, and for leave-p-out CV, with a polynomial bound, thus extending state-of-the-art results to the modified version of the risk that is adapted to extreme value analysis.
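As an illustration of the quantity being bounded, the following is a minimal sketch of a K-fold CV estimate of the error rate restricted to the extreme region, i.e. to points whose norm exceeds a high empirical quantile. It is not the authors' implementation: the function names (`extreme_cv_risk`, `fit_centroid`, `predict_centroid`), the choice of computing the threshold from the training norms of each fold, the plain average over folds, and the toy angular classifier are all assumptions made for this example.

```python
import numpy as np


def extreme_cv_risk(X, y, fit, predict, n_folds=5, quantile=0.9, seed=0):
    """K-fold CV estimate of the error rate on the extreme region.

    Hypothetical helper: `fit(X, y)` returns a model and
    `predict(model, X)` returns predicted labels.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    norms = np.linalg.norm(X, axis=1)
    folds = np.array_split(rng.permutation(n), n_folds)

    fold_risks = []
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False
        # Assumption: the threshold is the empirical quantile of the
        # training norms, so the validation fold plays no role in it.
        threshold = np.quantile(norms[train_mask], quantile)
        train_ext = train_mask & (norms > threshold)
        val_ext = val_idx[norms[val_idx] > threshold]
        if not train_ext.any() or len(val_ext) == 0:
            continue  # no extreme points available in this fold
        model = fit(X[train_ext], y[train_ext])
        errors = predict(model, X[val_ext]) != y[val_ext]
        fold_risks.append(errors.mean())

    # Assumption: folds are combined by a plain average.
    return float(np.mean(fold_risks)) if fold_risks else float("nan")


def fit_centroid(X, y):
    """Toy classifier on the angular component X / ||X|| (illustrative only)."""
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    return angles[y == 1].mean(axis=0) - angles[y == 0].mean(axis=0)


def predict_centroid(w, X):
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    return (angles @ w > 0).astype(int)


# Usage on synthetic heavy-tailed data (made up for this example).
rng = np.random.default_rng(1)
X = rng.pareto(2.0, size=(2000, 2)) + 1.0
y = (X[:, 0] > X[:, 1]).astype(int)
risk = extreme_cv_risk(X, y, fit_centroid, predict_centroid)
print(f"CV estimate of the extreme-region error rate: {risk:.3f}")
```

Only points beyond the threshold enter either training or validation, so the effective sample size per fold is roughly a fraction `1 - quantile` of the data. This scarcity of extreme observations is what makes bounds on the CV estimate non-trivial in this setting.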
