Extrapolating from a Single Image to a Thousand Classes using Distillation
What can neural networks learn about the visual world from a single image? While it obviously cannot contain the multitudes of possible objects, scenes and lighting conditions that exist - within the space of all possible 256^(3x224x224) 224-sized square images, it might still provide a strong prior for natural images. To analyze this hypothesis, we develop a framework for training neural networks from scratch using a single image by means of knowledge distillation from a supervisedly pretrained teacher. With this, we find that the answer to the above question is: 'surprisingly, a lot'. In quantitative terms, we find top-1 accuracies of 94 ImageNet and, by extending this method to audio, 84 extensive analyses we disentangle the effect of augmentations, choice of source image and network architectures and also discover "panda neurons" in networks that have never seen a panda. This work shows that one image can be used to extrapolate to thousands of object classes and motivates a renewed research agenda on the fundamental interplay of augmentations and image.
READ FULL TEXT