Fast classification rates without standard margin assumptions
We consider the classical problem of learning rates for classes with finite VC dimension. It is well known that fast learning rates are achievable by the empirical risk minimization algorithm (ERM) if one of the low-noise/margin assumptions, such as Tsybakov's or Massart's condition, is satisfied. In this paper, we consider an alternative way of obtaining fast learning rates in classification when none of these conditions is met. We first consider Chow's reject option model and show that, by lowering the impact of a small fraction of hard instances, a fast learning rate is achievable in the agnostic model by a specific learning algorithm. Similar results were previously known only under special versions of the margin assumptions. We also show that the learning algorithm achieving these rates is adaptive to the standard margin assumptions and always satisfies the risk bounds achieved by ERM. Based on our results for Chow's model, we then analyze a particular family of VC classes, namely classes with finite combinatorial diameter. Using their special structure, we show that there is an improper learning algorithm that provides fast rates of convergence even in the (poorly understood) situations where ERM is suboptimal. This provides the first setup in which an improper learning algorithm may significantly improve the learning rates for non-convex losses. Finally, we discuss some implications of our techniques for the analysis of ERM.
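For readers unfamiliar with these notions, the following LaTeX sketch records commonly used forms of Massart's and Tsybakov's conditions and of the risk in Chow's reject option model. The notation (regression function $\eta$, margin level $h_0$, noise exponent $\beta$, rejection cost $c$, reject symbol $\ast$) is assumed for illustration and is not taken from the abstract.

% Requires amsmath and amssymb.
% eta(x) = P(Y = 1 | X = x) is the regression function; c in (0, 1/2) is the rejection cost;
% f is a classifier that may output the reject symbol *.
\begin{align*}
  &\text{Massart's condition:}  && \bigl|\eta(X) - \tfrac{1}{2}\bigr| \ge h_0 \quad \text{almost surely, for some } h_0 > 0, \\
  &\text{Tsybakov's condition:} && \mathbb{P}\bigl(\bigl|\eta(X) - \tfrac{1}{2}\bigr| \le t\bigr) \le C\, t^{\beta} \quad \text{for all } t > 0 \text{ and some } C, \beta > 0, \\
  &\text{Chow's risk:}          && R_c(f) = \mathbb{P}\bigl(f(X) \ne Y,\ f(X) \ne \ast\bigr) + c\, \mathbb{P}\bigl(f(X) = \ast\bigr).
\end{align*}

Under this formulation, a classifier in Chow's model can abstain on hard instances at a fixed cost $c$ instead of incurring a full misclassification error, which is the mechanism the abstract refers to as lowering the impact of a small fraction of hard instances.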