Test cases as a measurement instrument in experimentation
Background: Test suites are frequently used to quantify relevant software attributes, such as quality or productivity. Problem: We have detected that the same response variable, measured using different test suites, yields different experiment results. Aims: Assess to which extent differences in test case construction influence measurement accuracy and experimental outcomes. Method: Two industry experiments have been measured using two different test suites, one generated using an ad-hoc method and another using equivalence partitioning. The accuracy of the measures has been studied using standard procedures, such as ISO 5725, Bland-Altman and Interclass Correlation Coefficients. Results: There are differences in the values of the response variables up to +-60 partitioning) used. Conclusions: The disclosure of datasets and analysis code is insufficient to ensure the reproducibility of SE experiments. Experimenters should disclose all experimental materials needed to perform independent measurement and re-analysis.
READ FULL TEXT