Transformer-Based Language Models for Software Vulnerability Detection: Performance, Model's Security and Platforms
The large transformer-based language models demonstrate excellent performance in natural language processing. By considering the closeness of natural languages to the high-level programming language such as C/C++, this work studies how good are the large transformer-based language models detecting software vulnerabilities. Our results demonstrate the well performance of these models on software vulnerability detection. The answer enables extending transformer-based language models to vulnerability detection and leveraging superior performance beyond the natural language processing domain. Besides, we perform the model's security check using Microsoft's Counterfit, a command-line tool to assess the model's security. Our results find that these models are vulnerable to adversarial examples. In this regard, we present a simple countermeasure and its result. Experimenting with large models is always a challenge due to the requirement of computing resources and platforms/libraries dependencies. Based on the experiences and difficulties we faced during this work, we present our recommendation while choosing the platforms to run these large models. Moreover, the popular platforms are surveyed thoroughly in this paper.
READ FULL TEXT