On the Precise Error Analysis of Support Vector Machines
This paper investigates the asymptotic behavior of the soft-margin and hard-margin support vector machine (SVM) classifiers for simultaneously high-dimensional and numerous data (large n and large p with n/p → δ) drawn from a Gaussian mixture distribution. Sharp predictions of the classification error rate of the hard-margin and soft-margin SVM are provided, as well as asymptotic limits of important parameters such as the margin and the bias. As a further outcome, the analysis allows for the identification of the maximum number of training samples that the hard-margin SVM is able to separate. The precise nature of our results allows for an accurate performance comparison of the hard-margin and soft-margin SVM, as well as a better understanding of the impact of the involved parameters (such as the number of measurements and the margin parameter) on the classification performance. Our analysis, confirmed by a set of numerical experiments, builds upon the convex Gaussian min-max theorem and extends its scope to new problems not previously studied within this framework.
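The setting described above can be illustrated with a small simulation. The sketch below is not the paper's experimental setup: it simply draws binary Gaussian mixture data in high dimension with the sample-to-dimension ratio n/p = δ held fixed, fits a soft-margin and an approximately hard-margin linear SVM, and reports their empirical test errors. The choice of mean vector, the values of δ and C, and the use of scikit-learn's SVC are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's): empirical test error of
# soft-margin vs. approximately hard-margin linear SVMs on Gaussian mixture data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def gaussian_mixture(n, p, mu):
    """Labels y = +/-1 with equal probability; features x = y*mu + standard Gaussian noise."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu[None, :] + rng.standard_normal((n, p))
    return x, y

p = 200
delta = 2.0                         # ratio n/p (illustrative value)
n_train = int(delta * p)
n_test = 10_000
mu = np.full(p, 1.5 / np.sqrt(p))   # mean vector with fixed Euclidean norm as p grows

X_tr, y_tr = gaussian_mixture(n_train, p, mu)
X_te, y_te = gaussian_mixture(n_test, p, mu)

# A very large C approximates the hard-margin SVM when the training data are separable.
for name, C in [("soft-margin (C=1)", 1.0), ("approx. hard-margin (C=1e6)", 1e6)]:
    clf = SVC(kernel="linear", C=C).fit(X_tr, y_tr)
    err = np.mean(clf.predict(X_te) != y_te)
    print(f"{name}: empirical test error = {err:.4f}")
```

Averaging such empirical errors over many draws, for increasing p at fixed δ, is one way to check sharp asymptotic predictions of the kind the paper derives.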