Biased Bagging for Unsupervised Domain Adaptation

06/16/2017
by   Twan van Laarhoven, et al.

Unsupervised domain adaptation (DA) is an active research area whose development and applications have been boosted by the explosion of unannotated data. Manual labeling is tedious, error-prone, and expensive, so unsupervised DA is increasingly needed to automate this task: an unlabeled dataset (target) is annotated using a labeled dataset (source) from a related domain. Most DA methods try to directly match the distributions of the source and target data. Nevertheless, recent DA methods still suffer from issues such as an inability to scale to high-dimensional data or sensitivity to the (hyper-)parameters of the adaptation procedure. We propose to overcome these issues by using bagging. The main idea is to directly embed a given source hypothesis inside a bagging procedure in order to generate a sequence of good target hypotheses related to the source, which are then used in a voting method for predicting labels of target data. Our method is extremely easy to implement and apply. Its effectiveness is supported by a recent theoretical study that characterizes the generalization performance of voting methods in terms of a large mean and a low variance of the margin distribution. A qualitative analysis of our method shows that it tends to increase the mean and reduce the variance of the target margin distribution, which is beneficial for generalization. We report state-of-the-art performance on benchmark adaptation tasks using text and image datasets with diverse characteristics, such as a high number of features, a large number of classes, and deep input features.
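
The abstract does not spell out how the source hypothesis is embedded in the bagging procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the source hypothesis is used to pseudo-label the target data, that each bagging round fits a simple nearest-centroid classifier on a bootstrap sample of those pseudo-labeled points, and that the margin of a target point is the vote share of the winning label minus that of the runner-up. All function names, the toy data, and the choice of base classifier are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def fit_centroids(points, labels):
    """Fit a nearest-centroid classifier; returns {label: centroid}."""
    sums = {}
    counts = Counter(labels)
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )


def biased_bagging_sketch(source_hypothesis, target_points, n_rounds=25, seed=0):
    """Assumed procedure: pseudo-label the target with the source hypothesis,
    then fit one target hypothesis per bootstrap sample of the target data."""
    rng = random.Random(seed)
    pseudo = [source_hypothesis(x) for x in target_points]
    n = len(target_points)
    ensemble = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ensemble.append(
            fit_centroids([target_points[i] for i in idx], [pseudo[i] for i in idx])
        )
    return ensemble


def vote(ensemble, x):
    """Majority vote over the ensemble; also returns the (unsupervised) margin."""
    votes = Counter(predict_centroid(c, x) for c in ensemble)
    ranked = votes.most_common(2)
    top_label, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return top_label, (top_count - runner_up) / len(ensemble)


def demo():
    """Toy target domain: two well-separated groups along the first feature."""
    points = [(-1.5 + 0.03 * i, 0.0) for i in range(100)]  # true class 0
    points += [(2.5 + 0.03 * i, 0.0) for i in range(100)]  # true class 1
    truth = [0] * 100 + [1] * 100

    def source_hypothesis(x):
        # Hypothetical source classifier whose threshold is misplaced on the target.
        return int(x[0] > 1.0)

    ensemble = biased_bagging_sketch(source_hypothesis, points)
    results = [vote(ensemble, x) for x in points]
    source_acc = sum(source_hypothesis(x) == t for x, t in zip(points, truth)) / len(truth)
    vote_acc = sum(label == t for (label, _), t in zip(results, truth)) / len(truth)
    margins = [m for _, m in results]
    return source_acc, vote_acc, mean(margins), pvariance(margins)


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns the accuracy of the source hypothesis on the toy target, the accuracy of the voted ensemble, and the mean and variance of the margins, the two quantities the abstract ties to generalization. The nearest-centroid base learner was chosen only to keep the sketch dependency-free; any classifier could stand in for it.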
