Vehicle classification using ResNets, localisation and spatially-weighted pooling
We investigate whether ResNet architectures can outperform more traditional Convolutional Neural Networks on the task of fine-grained vehicle classification. We train and test ResNet-18, ResNet-34 and ResNet-50 on the Comprehensive Cars dataset without pre-training on other datasets. We then modify the networks to use Spatially Weighted Pooling. Finally, we add a localisation step before the classification process, using a network based on ResNet-50. We find that using Spatially Weighted Pooling and localisation both improve classification accuracy of ResNet50. Spatially Weighted Pooling increases accuracy by 1.5 percent points and localisation increases accuracy by 3.4 percent points. Using both increases accuracy by 3.7 percent points giving a top-1 accuracy of 96.351% on the Comprehensive Cars dataset. Our method achieves higher accuracy than a range of methods including those that use traditional CNNs. However, our method does not perform quite as well as pre-trained networks that use Spatially Weighted Pooling.
READ FULL TEXT