Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models

by Jiawei Liu, et al.

Neural text ranking models have witnessed significant advancement and are increasingly being deployed in practice. Unfortunately, they also inherit adversarial vulnerabilities of general neural models, which have been detected but remain underexplored by prior studies. Moreover, these inherited adversarial vulnerabilities might be leveraged by blackhat SEO to defeat better-protected search engines. In this study, we propose an imitation adversarial attack on black-box neural passage ranking models. We first show that the target passage ranking model can be transparentized and imitated by enumerating critical queries/candidates and then training a ranking imitation model. Leveraging the ranking imitation model, we can elaborately manipulate the ranking results and transfer the manipulation attack to the target ranking model. For this purpose, we propose an innovative gradient-based attack method, empowered by the pairwise objective function, to generate adversarial triggers, which cause premeditated disorderliness with very few tokens. To camouflage the triggers, we add the next sentence prediction loss and the language model fluency constraint to the objective function. Experimental results on passage ranking demonstrate the effectiveness of the ranking imitation attack model and adversarial triggers against various SOTA neural ranking models. Furthermore, various mitigation analyses and human evaluation show the effectiveness of the camouflages when facing potential mitigation approaches. To motivate other scholars to further investigate this novel and important problem, we make the experiment data and code publicly available.
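The imitation step described above relies on turning the ranked lists returned by a black-box ranker into pairwise training signal. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a generic pairwise hinge (margin) loss, and the names `ranked_list_to_pairs`, `pairwise_hinge_loss`, and the `score` callable are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def ranked_list_to_pairs(ranked: Sequence[str]) -> List[Tuple[str, str]]:
    """Turn a ranked list (best first) into (higher, lower) preference pairs.

    Assumption: the black-box ranker only exposes an ordering, so every
    passage is treated as preferred over every passage ranked below it.
    """
    return list(combinations(ranked, 2))


def pairwise_hinge_loss(
    score: Callable[[str, str], float],
    query: str,
    pairs: Sequence[Tuple[str, str]],
    margin: float = 1.0,
) -> float:
    """Mean hinge loss over preference pairs for a candidate imitation scorer.

    A pair contributes zero loss once the higher-ranked passage outscores
    the lower-ranked one by at least `margin`.
    """
    if not pairs:
        return 0.0
    total = 0.0
    for higher, lower in pairs:
        total += max(0.0, margin - (score(query, higher) - score(query, lower)))
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical ordering observed from the target ranker for one query.
    observed = ["passage_a", "passage_b", "passage_c"]
    pairs = ranked_list_to_pairs(observed)

    # Toy stand-in for an imitation model: a fixed score table.
    table = {"passage_a": 3.0, "passage_b": 2.0, "passage_c": 0.0}
    loss = pairwise_hinge_loss(lambda q, p: table[p], "example query", pairs)
    print(f"pairs={len(pairs)} loss={loss:.2f}")
```

In a real setting the `score` callable would be a trainable neural ranker and the loss would be minimized by gradient descent; the same pairwise form is one plausible way to express the trigger objective, with the sign of the preference flipped so that the targeted passage moves up the ranking.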




Topic-oriented Adversarial Attacks against Black-box Neural Ranking Models

Neural ranking models (NRMs) have attracted considerable attention in in...

Towards Imperceptible Document Manipulations against Neural Ranking Models

Adversarial attacks have gained traction in order to identify potential ...

Practical Relative Order Attack in Deep Ranking

Recent studies unveil the vulnerabilities of deep ranking models, where ...

Imitation Attacks and Defenses for Black-box Machine Translation Systems

We consider an adversary looking to steal or attack a black-box machine ...

Improved and Efficient Text Adversarial Attacks using Target Information

There has been recently a growing interest in studying adversarial examp...

On the Feasibility of Specialized Ability Extracting for Large Language Code Models

Recent progress in large language code models (LLCMs) has led to a drama...

Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles

In this paper, we focus on a novel optimization problem in which the obj...
