Using Random Perturbations to Mitigate Adversarial Attacks on Sentiment Analysis Models

02/11/2022
by Abigail Swenor, et al.

Attacks on deep learning models are often difficult to identify and therefore difficult to protect against. This problem is exacerbated by the use of public datasets, which typically are not manually inspected before use. In this paper, we offer a solution to this vulnerability by applying, at inference time, random perturbations to random words in random sentences: spelling correction where needed, substitution with a random synonym, or simply dropping the word. These perturbations defend NLP models against adversarial attacks. Our Random Perturbations Defense and Increased Randomness Defense methods succeed in returning attacked models to accuracy similar to that of the models before the attacks. The original accuracy of the model used in this work is 80%; adversarial attacks reduce accuracy to values as low as 0%, and after applying our defenses the model's accuracy is returned to its original accuracy within statistical significance.
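The defense described above can be sketched in a few lines. The snippet below is an illustrative approximation only, not the authors' implementation: the synonym table is a hypothetical stand-in for a real lexical resource (the abstract does not specify one), the perturbation rate and the set of perturbations are assumptions, and the spelling-correction step is omitted since it would require an external checker.

```python
import random

# Hypothetical toy synonym table; a real defense would draw on a
# lexical resource such as a thesaurus. Illustration only.
SYNONYMS = {
    "good": ["great", "fine"],
    "bad": ["poor", "awful"],
    "movie": ["film", "picture"],
}


def perturb_word(word, rng):
    """Apply one randomly chosen perturbation to a single word.

    Returns the (possibly replaced) word, or None if the word is dropped.
    Spelling correction is omitted here; it would need an external checker.
    """
    choice = rng.choice(["synonym", "drop", "keep"])
    if choice == "synonym" and word.lower() in SYNONYMS:
        return rng.choice(SYNONYMS[word.lower()])
    if choice == "drop":
        return None  # drop the word entirely
    return word  # keep the word unchanged


def random_perturbation_defense(text, word_rate=0.3, seed=None):
    """Perturb a random subset of words before the text reaches the model.

    word_rate is the assumed probability that any given word is selected
    for perturbation; the paper's actual rate may differ.
    """
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if rng.random() < word_rate:
            word = perturb_word(word, rng)
        if word is not None:
            out.append(word)
    return " ".join(out)
```

Because the perturbations are sampled independently at each inference call, an attacker's carefully crafted token sequence is unlikely to survive intact, which is the intuition behind the defense.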
