MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks
As neural networks have developed, increasingly deep networks have been adopted for various tasks, such as image classification. However, because of their huge computational overhead, these networks cannot be deployed on mobile devices or in other low-latency scenarios. To address this dilemma, multi-exit convolutional networks have been proposed, which allow faster inference via early exits with corresponding classifiers. These networks rely on sophisticated designs to increase early-exit accuracy. However, naively training a multi-exit network can hurt the accuracy of deep neural networks, because early-exit classifiers interfere with the feature generation process. In this paper, we propose a general training framework named multi-self-distillation learning (MSD), which mines knowledge from the different classifiers within the same network and boosts the accuracy of every classifier. Our approach can be applied not only to multi-exit networks, but also to modern CNNs (e.g., the ResNet series) augmented with additional side-branch classifiers. We use a sampling-based branch augmentation technique to transform a single-exit network into a multi-exit network. This reduces the capacity gap between different classifiers and improves the effectiveness of applying MSD. Our experiments show that MSD improves the accuracy of various networks: it significantly enhances the accuracy of every classifier in an existing multi-exit network (MSDNet), and it equips vanilla single-exit networks with high-accuracy internal classifiers while also improving their final accuracy.
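To make the idea of distilling knowledge between classifiers of the same network concrete, the following is a minimal sketch of a generic self-distillation loss for a multi-exit model. The abstract does not specify the actual MSD objective, so everything here is an assumption for illustration: the function names, the choice of the final exit as the teacher, the weighting `alpha`, and the softmax `temperature` are hypothetical and not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Cross-entropy of one example against its ground-truth class index."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_exit_distillation_loss(exit_logits, label, alpha=0.5, temperature=3.0):
    """Illustrative loss for one example in a multi-exit network.

    exit_logits: list of logit vectors, one per exit, shallowest first.
    Assumption (not from the paper): the last exit acts as the teacher,
    and each earlier exit mixes a label loss with a distillation term.
    """
    teacher = softmax(exit_logits[-1], temperature)
    total = 0.0
    for logits in exit_logits[:-1]:
        student = softmax(logits, temperature)
        label_term = cross_entropy(logits, label)
        # The temperature**2 factor is the usual gradient-scale correction.
        distill_term = temperature ** 2 * kl_divergence(teacher, student)
        total += (1.0 - alpha) * label_term + alpha * distill_term
    # The final exit is trained on the label only in this sketch.
    total += cross_entropy(exit_logits[-1], label)
    return total

# Usage: three exits, three classes, true class 0.
example = [[0.2, 0.1, -0.3], [1.0, 0.2, -0.5], [2.0, 0.5, -1.0]]
loss = multi_exit_distillation_loss(example, label=0)
```

In this sketch the distillation term vanishes when an early exit already matches the final exit's softened prediction, so the extra signal only acts where the classifiers disagree.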