BlurNet: Defense by Filtering the Feature Maps

08/06/2019
by Ravi Raju, et al.

Recently, the field of adversarial machine learning has garnered attention by showing that state-of-the-art deep neural networks are vulnerable to adversarial examples, which stem from small perturbations added to the input image. A malicious adversary generates adversarial examples either by obtaining access to the model parameters, such as gradient information, and using them to alter the input, or by attacking a substitute model and transferring those malicious examples to the victim model. Specifically, one of these attack algorithms, Robust Physical Perturbations (RP_2), generates adversarial images of stop signs with black and white stickers to achieve high targeted misclassification rates against standard-architecture traffic sign classifiers. In this paper, we propose BlurNet, a defense against the RP_2 attack. First, we motivate the defense with a frequency analysis of the first-layer feature maps of the network on the LISA dataset, demonstrating that high-frequency noise is introduced into the input image by the RP_2 algorithm. To attenuate this high-frequency noise, we introduce a depthwise convolution layer of standard blur kernels after the first layer. Finally, we present a regularization scheme to incorporate this low-pass filtering behavior into the training regime of the network.
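To illustrate the core idea, here is a minimal sketch (not the authors' released code) of a fixed depthwise blur layer applied to the first-layer feature maps. The kernel choice (a 3x3 box filter), channel counts, and module names are illustrative assumptions; the paper's method uses standard blur kernels in a depthwise convolution after the first layer.

```python
# Sketch of a depthwise blur layer inserted after the first convolution.
# Assumptions (not from the paper): 3x3 box kernel, 64 output channels, PyTorch.
import torch
import torch.nn as nn


def make_depthwise_blur(channels: int, kernel_size: int = 3) -> nn.Conv2d:
    """Depthwise convolution whose weights are a fixed averaging (box) kernel."""
    blur = nn.Conv2d(channels, channels, kernel_size,
                     padding=kernel_size // 2, groups=channels, bias=False)
    with torch.no_grad():
        # Each channel is filtered independently with the same normalized kernel.
        blur.weight.fill_(1.0 / (kernel_size * kernel_size))
    blur.weight.requires_grad_(False)  # keep the filter fixed during training
    return blur


class BlurredStem(nn.Module):
    """First conv layer followed by a depthwise blur of its feature maps."""

    def __init__(self, in_channels: int = 3, out_channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.blur = make_depthwise_blur(out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-pass filter the first-layer activations to suppress the
        # high-frequency components introduced by the perturbation.
        return self.blur(torch.relu(self.conv1(x)))
```

Because the blur weights are frozen, the layer acts purely as a low-pass filter on the feature maps; the paper's regularization scheme instead encourages the network itself to learn this filtering behavior during training.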
