Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness
Ensemble approaches for uncertainty estimation have recently been applied to the tasks of misclassification detection, out-of-distribution input detection and adversarial attack detection. Prior Networks have been proposed as an approach to efficiently emulating an ensemble of models by parameterising a Dirichlet prior distribution over output distributions. These models have been shown to outperform ensemble approaches, such as Monte-Carlo Dropout, on the task of out-of-distribution input detection. However, scaling Prior Networks to complex datasets with many classes is difficult using the training criteria originally proposed. This paper makes two contributions. Firstly, we show that the appropriate training criterion for Prior Networks is the reverse KL-divergence between Dirichlet distributions. Using this loss we successfully train Prior Networks on image classification datasets with up to 200 classes and improve out-of-distribution detection performance. Secondly, taking advantage of the new training criterion, this paper investigates using Prior Networks to detect adversarial attacks. It is shown that the construction of successful adaptive whitebox attacks, which affect the prediction and evade detection, against Prior Networks trained on CIFAR-10 and CIFAR-100 takes a greater amount of computational effort than against standard neural networks, adversarially trained neural networks and dropout-defended networks.
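As an illustration of the quantity behind the first contribution, the following is a minimal sketch of the closed-form KL-divergence between two Dirichlet distributions, written in plain Python. It is not the authors' implementation. The names `digamma`, `dirichlet_kl`, `model_alpha` and `target_beta` are hypothetical, and the sketch assumes that "reverse" means the model's Dirichlet is the first argument of the divergence and the target Dirichlet the second. The concentration values are made up for illustration.

```python
import math


def digamma(x):
    """Approximate the digamma function for x > 0.

    Uses the recurrence psi(x) = psi(x + 1) - 1/x to shift x to at least 6,
    then a truncated asymptotic series. This helper is illustrative only.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def dirichlet_kl(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) ) for positive concentrations."""
    alpha0 = sum(alpha)
    beta0 = sum(beta)
    kl = math.lgamma(alpha0) - math.lgamma(beta0)
    for a, b in zip(alpha, beta):
        kl += math.lgamma(b) - math.lgamma(a)
        kl += (a - b) * (digamma(a) - digamma(alpha0))
    return kl


# Hypothetical concentration parameters for a 2-class problem.
model_alpha = [1.0, 1.0]   # assumed output of the network for one input
target_beta = [2.0, 2.0]   # assumed target Dirichlet for that input

# Assumption: "reverse" places the model's Dirichlet first.
reverse_kl = dirichlet_kl(model_alpha, target_beta)
forward_kl = dirichlet_kl(target_beta, model_alpha)
```

The two argument orders give different values for the same pair of distributions, so the choice of direction changes the training criterion itself rather than merely rescaling it.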