P-values for classification

01/18/2008
by Lutz Duembgen, et al.

Let (X,Y) be a random variable consisting of an observed feature vector X ∈ 𝒳 and an unobserved class label Y ∈ {1,2,...,L} with unknown joint distribution. In addition, let D be a training data set consisting of n completely observed independent copies of (X,Y). Usual classification procedures provide point predictors (classifiers) Ŷ(X,D) of Y or estimate the conditional distribution of Y given X. To quantify the certainty of classifying X, we propose to construct for each θ = 1,2,...,L a p-value π_θ(X,D) for the null hypothesis that Y = θ, treating Y temporarily as a fixed parameter. In other words, the point predictor Ŷ(X,D) is replaced with a prediction region for Y with a certain confidence. We argue that (i) this approach has advantages over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single-use and multiple-use validity, as well as computational and graphical aspects.
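
To make the idea of a p-value π_θ(X,D) and the resulting prediction region concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not necessarily the construction used in the paper: the new observation is temporarily pooled with the training points of class θ, each pooled point is scored by its Euclidean distance to the pooled class mean (an assumed, deliberately simple atypicality score), and the p-value is the fraction of pooled points whose score is at least as large as that of the new observation. Under the null hypothesis Y = θ the pooled points are exchangeable, which is what makes a rank-based p-value of this kind valid. The names `class_p_values` and `prediction_region` and the toy data are hypothetical.

```python
import math


def class_p_values(x_new, train_X, train_y, labels):
    """Return {theta: p-value} for the null hypothesis 'label of x_new is theta'.

    Assumption: the atypicality score is the Euclidean distance to the class
    mean, computed with x_new pooled into the class so that the score treats
    all pooled points symmetrically.
    """
    p_values = {}
    for theta in labels:
        # Pool the new observation with the training points of class theta.
        group = [x for x, y in zip(train_X, train_y) if y == theta] + [x_new]
        dim = len(x_new)
        centre = [sum(p[j] for p in group) / len(group) for j in range(dim)]
        scores = [math.dist(p, centre) for p in group]
        t_new = scores[-1]
        # Rank-based p-value: share of pooled points at least as atypical
        # as the new observation (the new observation counts itself).
        p_values[theta] = sum(s >= t_new for s in scores) / len(group)
    return p_values


def prediction_region(p_values, alpha):
    """Keep every label whose null hypothesis is not rejected at level alpha."""
    return {theta for theta, p in p_values.items() if p > alpha}


# Toy data (hypothetical): two well-separated classes in the plane.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = [1, 1, 1, 1, 2, 2, 2, 2]

p = class_p_values((0.5, 0.5), train_X, train_y, labels=[1, 2])
print(p)                           # {1: 1.0, 2: 0.2}
print(prediction_region(p, 0.25))  # {1}
```

With only four training points per class, the smallest attainable p-value in this sketch is 1/5, so a class can be excluded from the prediction region only at levels α ≥ 0.2. A prediction region may also contain several labels or none, which is how this approach expresses uncertainty that a single point prediction hides.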
