Adversarial Attack on Skeleton-based Human Action Recognition
Deep learning models achieve impressive performance for skeleton-based human action recognition. However, the robustness of these models to adversarial attacks remains largely unexplored, owing to their complex spatio-temporal nature, which must represent sparse and discrete skeleton joints. This work presents the first adversarial attack on skeleton-based action recognition with graph convolutional networks. The proposed targeted attack, termed Constrained Iterative Attack for Skeleton Actions (CIASA), perturbs joint locations in an action sequence such that the resulting adversarial sequence preserves the temporal coherence, spatial integrity, and anthropomorphic plausibility of the skeletons. CIASA achieves this by satisfying multiple physical constraints, applying spatial realignment to the perturbed skeletons, and regularizing the adversarial skeletons with generative networks. We also explore semantically imperceptible localized attacks with CIASA, and succeed in fooling state-of-the-art skeleton action recognition models with high confidence. CIASA perturbations show high transferability in black-box attacks. We also show that the perturbed skeleton sequences can induce adversarial behavior in RGB videos created with computer graphics. A comprehensive evaluation on the NTU and Kinetics datasets confirms the effectiveness of CIASA against graph-based skeleton action recognition and reveals an imminent threat to spatio-temporal deep learning tasks in general.
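To make the idea of a constrained, iterative, targeted attack concrete, the following is a minimal sketch and not the CIASA implementation. It assumes a toy linear softmax classifier in place of a graph convolutional network, and a simple per-coordinate bound (`eps`) in place of the physical constraints, skeleton realignment, and generative regularization described above. The function name `targeted_iterative_attack` and all parameters are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def targeted_iterative_attack(x, W, b, target, eps=0.05, step=0.01, iters=50):
    """Perturb a skeleton sequence toward a target class (illustrative only).

    x      : (T, J, 3) array of T frames, J joints, 3-D joint coordinates.
    W, b   : weights (C, T*J*3) and bias (C,) of a toy linear classifier
             (an assumption; the attacked models in the text are graph CNNs).
    target : index of the class the attacker wants predicted.
    eps    : per-coordinate perturbation bound, a stand-in constraint.
    """
    x_adv = x.copy()
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(iters):
        p = softmax(W @ x_adv.ravel() + b)
        # Gradient of the cross-entropy loss toward the target class
        # with respect to the input coordinates.
        grad = (W.T @ (p - onehot)).reshape(x.shape)
        # Signed gradient step that lowers the targeted loss.
        x_adv = x_adv - step * np.sign(grad)
        # Project back so every joint coordinate stays within eps of the original.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


# Toy usage with random data (shapes and values are arbitrary assumptions).
rng = np.random.default_rng(0)
x_clean = rng.normal(size=(4, 5, 3))            # 4 frames, 5 joints
W_toy = rng.normal(size=(3, x_clean.size))      # 3 action classes
b_toy = np.zeros(3)
x_attacked = targeted_iterative_attack(x_clean, W_toy, b_toy, target=2)
print("max joint displacement:", np.abs(x_attacked - x_clean).max())
```

The projection step is where a real attack of this kind would enforce its constraints; here it only bounds how far each coordinate may move, which does not by itself guarantee bone-length preservation or temporal smoothness.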