Neuroevolution-Based Generation of Tests and Oracles for Games
Game-like programs have become increasingly popular in many software engineering domains such as mobile apps, web applications, or programming education. However, creating tests for programs that have the purpose of challenging human players is a daunting task for automatic test generators. Even if test generation succeeds in finding a relevant sequence of events to exercise a program, the randomized nature of games means that it may neither be possible to reproduce the exact program behavior underlying this sequence, nor to create test assertions checking if observed randomized game behavior is correct. To overcome these problems, we propose Neatest, a novel test generator based on the NeuroEvolution of Augmenting Topologies (NEAT) algorithm. Neatest systematically explores a program's statements, and creates neural networks that operate the program in order to reliably reach each statement – that is, Neatest learns to play the game in a way to reliably cover different parts of the code. As the networks learn the actual game behavior, they can also serve as test oracles by evaluating how surprising the observed behavior of a program under test is compared to a supposedly correct version of the program. We evaluate this approach in the context of Scratch, an educational programming environment. Our empirical study on 25 non-trivial Scratch games demonstrates that our approach can successfully train neural networks that are not only far more resilient to random influences than traditional test suites consisting of static input sequences, but are also highly effective with an average mutation score of more than 65
READ FULL TEXT