Enhancing ML Robustness Using Physical-World Constraints
Recent advances in Machine Learning (ML) have demonstrated that neural networks can exceed human performance in many tasks. While generalizing well over natural inputs, neural networks are vulnerable to adversarial inputs -an input that is "similar" to the original input, but misclassified by the model. Existing defenses focus on Lp-norm bounded adversaries that perturb ML inputs in the digital space. In the real world, however, attackers can generate adversarial perturbations that have a large Lp-norm in the digital space. Additionally, these defenses also come at a cost to accuracy, making their applicability questionable in the real world. To defend models against such a powerful adversary, we leverage one constraint on its power: the perturbation should not change the human's perception of the physical information; the physical world places some constraints on the space of possible attacks. Two questions follow: how to extract and model these constraints? and how to design a classification paradigm that leverages these constraints to improve robustness accuracy trade-off? We observe that an ML model is typically a part of a larger system with access to different input modalities. Utilizing these modalities, we introduce invariants that limit the attacker's action space. We design a hierarchical classification paradigm that enforces these invariants at inference time. As a case study, we implement and evaluate our proposal in the context of the real-world application of road sign classification because of its applicability to autonomous driving. With access to different input modalities, such as LiDAR, camera, and location we show how to extract invariants and develop a hierarchical classifier. Our results on the KITTI and GTSRB datasets show that we can improve the robustness against physical attacks at minimal harm to accuracy.
READ FULL TEXT