A Procedure and Guidelines for Analyzing Groups of Software Engineering Replications
Context: Researchers from different groups and institutions are collaborating on building groups of experiments by means of replication (i.e., conducting groups of replications). Disparate aggregation techniques are being applied to analyze groups of replications. The application of unsuitable techniques to aggregate replication results may undermine the potential of groups of replications to provide in-depth insights from experiment results. Objectives: Provide an analysis procedure with a set of embedded guidelines to aggregate software engineering (SE) replication results. Method: We compare the characteristics of groups of replications for SE and other mature experimental disciplines such as medicine and pharmacology. In view of their differences, the limitations with regard to the joint data analysis of groups of SE replications and the guidelines provided in mature experimental disciplines to analyze groups of replications, we build an analysis procedure with a set of embedded guidelines specifically tailored to the analysis of groups of SE replications. We apply the proposed analysis procedure to a representative group of SE replications to illustrate its use. Results: All the information contained within the raw data should be leveraged during the aggregation of replication results. The analysis procedure that we propose encourages the use of stratified individual participant data and aggregated data in tandem to analyze groups of SE replications. Conclusion: The aggregation techniques used to analyze groups of replications should be justified in research articles. This will increase the reliability and transparency of joint results. The proposed guidelines should ease this endeavor.
READ FULL TEXT