Classification from Pairwise Similarity and Unlabeled Data
One of the biggest bottlenecks in supervised learning is its high labeling cost. To overcome this problem, we propose a new weakly supervised learning setting called SU classification, in which only similar (S) data pairs (pairs of examples known to belong to the same class) and unlabeled (U) data are needed, instead of fully supervised data. We show that an unbiased estimator of the classification risk can be obtained from SU data alone, and that its empirical risk minimizer achieves the optimal parametric convergence rate. Finally, we demonstrate the effectiveness of the proposed method through experiments.
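To make the idea of a risk estimate built from SU data concrete, here is a minimal sketch of one way such an estimator can be written for binary classification. It is an illustration under stated assumptions, not necessarily the exact form used in the full text: the positive class prior `prior` is assumed known and different from 0.5, the points of each similar pair are flattened into a single sample, and the names `su_risk` and `logistic_loss` are hypothetical. The sketch follows from writing the pointwise density of similar data and the unlabeled density as mixtures of the two class-conditional densities and solving for the class-conditional terms of the ordinary risk.

```python
import numpy as np


def logistic_loss(z, t):
    """Logistic loss log(1 + exp(-t * z)), computed in a numerically stable way."""
    return np.logaddexp(0.0, -t * z)


def su_risk(f_s, f_u, prior, loss=logistic_loss):
    """Empirical classification risk estimated from SU data only.

    f_s   : classifier outputs on all points that appear in similar pairs
            (both members of every pair, flattened into one array)
    f_u   : classifier outputs on unlabeled points
    prior : assumed positive class prior (hypothetical input; must not be 0.5)
    """
    f_s = np.asarray(f_s, dtype=float)
    f_u = np.asarray(f_u, dtype=float)
    if np.isclose(prior, 0.5):
        raise ValueError("prior must differ from 0.5, otherwise the estimator is undefined")

    # Probability that a random pair is similar: both positive or both negative.
    pi_s = prior ** 2 + (1.0 - prior) ** 2
    denom = 2.0 * prior - 1.0

    # Contribution of the similar data.
    s_term = pi_s * np.mean(loss(f_s, 1.0) - loss(f_s, -1.0)) / denom
    # Contribution of the unlabeled data.
    u_term = np.mean(-(1.0 - prior) * loss(f_u, 1.0) + prior * loss(f_u, -1.0)) / denom
    return s_term + u_term
```

A quick sanity check: a classifier that outputs 0 everywhere has logistic loss log 2 on every point regardless of the label, and `su_risk(np.zeros(4), np.zeros(6), prior=0.7)` returns that same value. Note that the division by `2 * prior - 1` makes the estimate unstable when the two classes are nearly balanced.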