Headless Horseman: Adversarial Attacks on Transfer Learning Models
Transfer learning facilitates the training of task-specific classifiers using pre-trained models as feature extractors. We present a family of transferable adversarial attacks against such classifiers, generated without access to the classification head; we call these headless attacks. We first demonstrate successful transfer attacks against a victim network using only its feature extractor. This motivates the introduction of a label-blind adversarial attack, a transfer attack that requires no information about the class-label space of the victim. Our attack lowers the accuracy of a ResNet18 trained on CIFAR10 by over 40%.
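To make the idea of attacking without a classification head concrete, the following is a minimal sketch and not necessarily the formulation used in the paper. It assumes a toy feature extractor tanh(Wx) standing in for a pre-trained backbone, and an attacker who searches for a perturbation inside an L-infinity ball that pushes the features of the perturbed input away from the features of the clean input. No labels and no classifier weights are used. The names `features`, `headless_pgd`, `eps`, `step`, and `iters` are hypothetical and chosen for illustration only.

```python
import math
import random


def features(W, x):
    """Toy feature extractor tanh(W x); an assumed stand-in for a real backbone."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def feature_distance_grad(W, x_adv, clean_feats):
    """Gradient of ||f(x_adv) - f(x)||^2 with respect to x_adv."""
    h = features(W, x_adv)
    # Chain rule through tanh: d tanh(z) / dz = 1 - tanh(z)^2.
    upstream = [2.0 * (hj - cj) * (1.0 - hj * hj) for hj, cj in zip(h, clean_feats)]
    return [
        sum(upstream[j] * W[j][i] for j in range(len(W)))
        for i in range(len(x_adv))
    ]


def sign(g):
    """Return -1, 0, or 1 according to the sign of g."""
    return (g > 0) - (g < 0)


def headless_pgd(W, x, eps=0.1, step=0.02, iters=20, seed=0):
    """Projected gradient ascent on feature distance; uses no labels and no head."""
    rng = random.Random(seed)
    clean_feats = features(W, x)
    # Random start, because the gradient is exactly zero at delta = 0.
    delta = [rng.uniform(-eps, eps) for _ in x]
    for _ in range(iters):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        grad = feature_distance_grad(W, x_adv, clean_feats)
        # Signed ascent step, then clip back into the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, d + step * sign(g))) for d, g in zip(delta, grad)]
    return [xi + di for xi, di in zip(x, delta)]


# Example usage on random toy data (4 features, 8 input dimensions).
rng = random.Random(1)
W = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
x = [rng.uniform(0, 1) for _ in range(8)]
x_adv = headless_pgd(W, x)
```

In a realistic setting the toy extractor would be replaced by the pre-trained network shared between attacker and victim, with gradients obtained by automatic differentiation, and the perturbed input would also be clipped to the valid pixel range. The sketch only shows that the objective depends on the feature extractor alone.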