Meta-Transfer Networks for Zero-Shot Learning
Zero-Shot Learning (ZSL) aims to recognize unseen categories using class semantics of those categories. Most existing studies use the seen categories to learn a visual-semantic interaction model, which is then used to infer the unseen categories. However, because the seen and unseen categories are disjoint, there is no guarantee that a model trained on the seen categories generalizes well to the unseen ones. In this work, we propose an episode-based approach that accumulates experience in addressing the disjointness issue by mimicking many classification scenarios in which the training classes and test classes are disjoint. In each episode, a visual-semantic interaction model is first trained on a subset of the seen categories as a learner that provides an initial prediction for the remaining, disjoint seen categories. A meta-learner then fine-tunes the learner by minimizing the differences between the prediction and the ground-truth labels in a pre-defined space. By training over many such episodes on the seen categories, the model becomes an expert at predicting the mimetic unseen categories, which lets it generalize well to the real unseen categories. Extensive experiments on four datasets under both the traditional ZSL and generalized ZSL tasks show that our framework outperforms the state-of-the-art approaches by large margins.
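The episode structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain linear map `W` from visual features to the semantic space, a mean-squared-error objective, and a single gradient step for both the learner and the meta-learner. The names `sample_episode`, `mse_grad`, and `run_episodes`, and all hyperparameters, are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_episode(seen_classes, n_mimetic_unseen, rng):
    """Split the seen classes into disjoint 'train' and 'mimetic unseen' subsets."""
    seen_classes = np.asarray(seen_classes)
    if not 0 < n_mimetic_unseen < len(seen_classes):
        raise ValueError("n_mimetic_unseen must be between 1 and len(seen_classes) - 1")
    perm = rng.permutation(seen_classes)
    return perm[n_mimetic_unseen:], perm[:n_mimetic_unseen]

def mse_grad(W, X, S):
    """Gradient of mean(||X W - S||^2) over samples with respect to W."""
    return 2.0 * X.T @ (X @ W - S) / X.shape[0]

def run_episodes(X, y, class_semantics, n_episodes=100,
                 n_mimetic_unseen=2, lr=0.01, seed=0):
    """Episode-based training of a linear visual-to-semantic map (illustrative only).

    Assumes y holds integer labels 0..C-1 that index rows of class_semantics.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    S = class_semantics[y]  # per-sample semantic target
    W = np.zeros((X.shape[1], class_semantics.shape[1]))
    for _ in range(n_episodes):
        train_cls, mimic_cls = sample_episode(classes, n_mimetic_unseen, rng)
        train_mask = np.isin(y, train_cls)
        mimic_mask = np.isin(y, mimic_cls)
        # Learner step (assumed): fit on the episode's training classes.
        W = W - lr * mse_grad(W, X[train_mask], S[train_mask])
        # Meta step (assumed): reduce the prediction error on the held-out,
        # disjoint seen classes that play the role of unseen classes.
        W = W - lr * mse_grad(W, X[mimic_mask], S[mimic_mask])
    return W
```

Each call to `sample_episode` produces a fresh disjoint split, so over many episodes every seen class takes a turn as a mimetic unseen class. The two-step update here is only one simple way to realize the learner and meta-learner roles; the actual model and the pre-defined space in which errors are measured may differ.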