Replication studies considered harmful
CONTEXT: There is growing interest in establishing software engineering as an evidence-based discipline. To that end, replication is often used to gain confidence in empirical findings, as opposed to reproduction, where the goal is to show the correctness or validity of the published results. OBJECTIVE: To consider what is required for a replication study to confirm the original experiment, and to apply this understanding to software engineering. METHOD: Simulation is used to demonstrate why the prediction interval for confirmation can be surprisingly wide. This analysis is then applied to three recent replications. RESULTS: Because the prediction intervals are wide, almost all replications are confirmatory, so in that sense there is no 'replication crisis'; however, their contributions to knowledge are negligible. CONCLUSIONS: Replicating empirical software engineering experiments, particularly if they are under-powered or under-reported, is a waste of scientific resources. By contrast, meta-analysis is strongly advocated so that all relevant experiments are combined to estimate the population effect.
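The simulation argument in METHOD can be illustrated with a minimal sketch. It is not the paper's own code: the two-group design, the sample size `n=20`, the population effect `delta=0.5`, the unit standard deviation, the normal-approximation multiplier `z=1.96`, and all function names are assumptions chosen for illustration. Each simulated original study yields a mean difference and its standard error; the 95% prediction interval for a same-sized replication is then the original estimate plus or minus `z` times the square root of the sum of the two squared standard errors.

```python
import math
import random
import statistics


def simulate_study(n, delta, rng):
    """Simulate one two-group experiment; return (mean difference, standard error)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(delta, 1.0) for _ in range(n)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(control) / n + statistics.variance(treated) / n)
    return diff, se


def prediction_interval(diff, se_orig, n_orig, n_rep, z=1.96):
    """Approximate 95% prediction interval for the replication's mean difference.

    Assumption: the replication's standard error is projected from the
    original's, scaled by the ratio of sample sizes.
    """
    se_rep = se_orig * math.sqrt(n_orig / n_rep)
    half_width = z * math.sqrt(se_orig ** 2 + se_rep ** 2)
    return diff - half_width, diff + half_width


def run(n=20, delta=0.5, pairs=2000, seed=1):
    """Return (share of replications inside the interval, mean interval width)."""
    rng = random.Random(seed)
    hits = 0
    widths = []
    for _ in range(pairs):
        d_orig, se_orig = simulate_study(n, delta, rng)
        d_rep, _ = simulate_study(n, delta, rng)  # same population, new sample
        low, high = prediction_interval(d_orig, se_orig, n, n)
        widths.append(high - low)
        hits += low <= d_rep <= high
    return hits / pairs, statistics.fmean(widths)


coverage, mean_width = run()
print(f"replications inside the interval: {coverage:.3f}")
print(f"mean interval width: {mean_width:.2f} (population effect: 0.5)")
```

Under these assumptions, the interval spans a range several times larger than the population effect itself, so most replications count as confirmatory while saying little about the size of the effect. Raising `n` narrows the interval, which is one way to see why under-powered studies are the problematic case.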