Learning from Small Samples: Transformation-Invariant SVMs with Composition and Locality at Multiple Scales
Motivated by the problem of learning when the number of training samples is small, this paper shows how to incorporate into support-vector machines (SVMs) those properties that have made convolutional neural networks (CNNs) successful. Particularly important is the ability to incorporate domain knowledge of invariances, e.g., translational invariance of images. Kernels based on the minimum distance over a group of transformations, which corresponds to defining similarity as the best over the possible transformations, are not generally positive definite. Perhaps it is for this reason that they have neither previously been experimentally tested for their performance nor studied theoretically. Instead, previous attempts have employed kernels based on the average distance over a group of transformations, which are trivially positive definite, but which generally yield both poor margins and poor performance, as we show. We address this lacuna and show that positive definiteness indeed holds with high probability for kernels based on the minimum distance in the small-training-sample regime of interest, and that they do yield the best results in that regime. Another important property of CNNs is their ability to incorporate local features at multiple spatial scales, e.g., through max pooling. A third important property is their ability to provide the benefits of composition through the architecture of multiple layers. We show how these additional properties can also be embedded into SVMs. We verify through experiments on widely available image sets that the resulting SVMs provide superior accuracy compared to well-established deep neural network (DNN) benchmarks for small sample sizes.
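To make the kernel constructions discussed above concrete, the sketch below illustrates the two contrasted ideas on a toy group of image translations: a kernel that takes the minimum distance over the transformation group (similarity at the best alignment, not guaranteed positive definite) and a trivially positive-definite baseline obtained by averaging a base RBF kernel over transformed copies of both inputs. This is only an illustrative sketch, not the paper's implementation; the function names (`translations`, `min_distance_kernel`, `avg_group_kernel`) and the parameters `gamma` and `max_shift` are assumptions chosen for the example.

```python
import numpy as np

def translations(img, max_shift=2):
    """Enumerate a small group of cyclic 2D translations of an image (illustrative only)."""
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            yield np.roll(img, (dy, dx), axis=(0, 1))

def min_distance_kernel(x, y, gamma=0.01, max_shift=2):
    """RBF-style kernel on the *minimum* distance over the group:
    similarity is evaluated at the best alignment of y to x.
    Not guaranteed to be positive definite in general."""
    d_min = min(np.linalg.norm(x - t) for t in translations(y, max_shift))
    return np.exp(-gamma * d_min ** 2)

def avg_group_kernel(x, y, gamma=0.01, max_shift=2):
    """A trivially positive-definite baseline: average a base RBF kernel over
    transformed copies of both arguments (group-averaged feature map)."""
    vals = [np.exp(-gamma * np.linalg.norm(tx - ty) ** 2)
            for tx in translations(x, max_shift)
            for ty in translations(y, max_shift)]
    return float(np.mean(vals))

def gram_matrix(images, kernel):
    """Build the (possibly indefinite) Gram matrix for a list of images."""
    n = len(images)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(images[i], images[j])
    return K

# Example: inspect the spectrum of the min-distance Gram matrix on a few random "images".
# The paper's claim is that, with high probability, such matrices are positive definite
# when the number of samples is small.
rng = np.random.default_rng(0)
imgs = [rng.standard_normal((8, 8)) for _ in range(10)]
K = gram_matrix(imgs, min_distance_kernel)
print("smallest eigenvalue of min-distance Gram matrix:", np.linalg.eigvalsh(K).min())
```

Such a Gram matrix can be passed to an SVM solver that accepts precomputed kernels (e.g., `sklearn.svm.SVC(kernel="precomputed")`); checking its smallest eigenvalue on the training set is one simple way to observe the positive-definiteness behavior described in the abstract.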