Defending against Adversarial Audio via Diffusion Model
Deep learning models have been widely used in commercial acoustic systems in recent years. However, adversarial audio examples can cause these systems to behave abnormally while remaining hard for humans to perceive. Various methods, such as transformation-based defenses and adversarial training, have been proposed to protect acoustic systems from adversarial attacks, but they are less effective against adaptive attacks. Furthermore, directly applying methods from the image domain can lead to suboptimal results because of the unique properties of audio data. In this paper, we propose AudioPure, an adversarial purification-based defense pipeline for acoustic systems built on off-the-shelf diffusion models. Taking advantage of the strong generation ability of diffusion models, AudioPure first adds a small amount of noise to the adversarial audio and then runs the reverse sampling steps to purify the noisy audio and recover clean audio. AudioPure is a plug-and-play method that can be directly applied to any pretrained classifier without fine-tuning or re-training. We conduct extensive experiments on the speech command recognition task to evaluate the robustness of AudioPure. Our method is effective against diverse adversarial attacks and outperforms existing methods under both strong adaptive white-box and black-box attacks bounded by the ℒ_2 or ℒ_∞ norm (by up to +20% in robust accuracy). We also evaluate certified robustness against perturbations bounded by the ℒ_2 norm via randomized smoothing, where our pipeline achieves higher certified accuracy than the baselines.
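The two-stage purification described above (add a small amount of noise, then run reverse sampling) can be sketched in NumPy as follows. This is a minimal illustration assuming a standard DDPM formulation with a linear noise schedule; the names `purify`, `linear_beta_schedule`, `eps_model` and `zero_eps_model`, the schedule values and the step count `n_star` are hypothetical placeholders, not AudioPure's released code. A real pipeline would replace the placeholder denoiser with a pretrained audio diffusion model.

```python
import numpy as np


def linear_beta_schedule(num_steps=200, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule beta_1..beta_T (values are illustrative)."""
    return np.linspace(beta_start, beta_end, num_steps)


def purify(x_adv, eps_model, n_star, betas, rng=None):
    """Diffuse x_adv forward to step n_star, then denoise back to step 0.

    eps_model(x, t) is a hypothetical pretrained noise predictor (0-based t).
    """
    if not 0 <= n_star <= len(betas):
        raise ValueError("n_star must lie in [0, len(betas)]")
    if n_star == 0:
        return x_adv.copy()
    rng = np.random.default_rng() if rng is None else rng
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Forward diffusion in closed form:
    # x_n = sqrt(abar_n) * x_0 + sqrt(1 - abar_n) * noise
    t = n_star - 1  # 0-based index of diffusion step n_star
    noise = rng.standard_normal(x_adv.shape)
    x = np.sqrt(alpha_bars[t]) * x_adv + np.sqrt(1.0 - alpha_bars[t]) * noise

    # Reverse (ancestral) DDPM sampling from step n_star down to step 0.
    for t in range(n_star - 1, -1, -1):
        eps_hat = eps_model(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Posterior standard deviation sqrt(tilde beta_t).
            sigma = np.sqrt(betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]))
            x = mean + sigma * rng.standard_normal(x.shape)
        else:
            x = mean  # the final step adds no noise
    return x


def zero_eps_model(x, t):
    # Placeholder denoiser for illustration only; it predicts zero noise.
    return np.zeros_like(x)


rng = np.random.default_rng(0)
x_adv = rng.uniform(-1.0, 1.0, size=16000)  # 1 s of synthetic 16 kHz audio
betas = linear_beta_schedule()
x_pure = purify(x_adv, zero_eps_model, n_star=2, betas=betas, rng=rng)
print(x_pure.shape)  # (16000,)
```

Keeping `n_star` small matches the "small amount of noise" in the description: the injected Gaussian noise is meant to drown out the adversarial perturbation while leaving enough of the original signal for the reverse process to recover audio that the unchanged classifier labels correctly. Because the purifier only transforms the input waveform, the classifier itself needs no fine-tuning.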