Large-Scale Manual Validation of Bug Fixing Commits: A Fine-grained Analysis of Tangling

11/12/2020
by Steffen Herbold, et al.

Context: Tangled commits are changes to software that address multiple concerns at once. For researchers studying bugs, tangled commits mean that their data contains not only bug fixes but also other concerns that are irrelevant to the study of bugs.

Objective: We want to improve our understanding of the prevalence of tangling and of the types of changes that are tangled within bug fixing commits.

Methods: We use a crowdsourcing approach for manual labeling to determine, for each line in a bug fixing commit, whether that change contributes to the bug fix. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus.

Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we consider only changes to production code files, this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label, leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case.

Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptical and assume that unvalidated data is likely very noisy until proven otherwise.
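The consensus rule from the Methods paragraph (four labels per line, consensus when at least three agree) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function names `consensus_label` and `consensus_rate` and the label strings (`"bugfix"`, `"test"`, `"refactoring"`, `"documentation"`) are hypothetical, since the abstract does not name the actual label set.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def consensus_label(labels: Sequence[str], min_agree: int = 3) -> Optional[str]:
    """Return the label given by at least `min_agree` participants, else None.

    With four participants and a threshold of three, at most one label can
    reach the threshold, so the most common label is the only candidate.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


def consensus_rate(votes: Dict[str, List[str]], min_agree: int = 3) -> float:
    """Fraction of lines for which a consensus label exists."""
    if not votes:
        return 0.0
    agreed = sum(
        1 for labels in votes.values() if consensus_label(labels, min_agree) is not None
    )
    return agreed / len(votes)


if __name__ == "__main__":
    # Hypothetical votes: file:line -> labels from four participants.
    votes = {
        "Foo.java:42": ["bugfix", "bugfix", "bugfix", "refactoring"],
        "Foo.java:43": ["bugfix", "test", "documentation", "bugfix"],
    }
    for line, labels in votes.items():
        print(line, consensus_label(labels))
    print("consensus rate:", consensus_rate(votes))
```

In this toy input, the first line reaches consensus on `bugfix`. The second line has only two matching votes, so it has no consensus, which is the kind of disagreement the Results paragraph reports for about 11% of lines.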


research
04/11/2023

An exploratory study of bug-introducing changes: what happens when bugs are introduced in open source software?

Context: Many studies consider the relation between individual aspects a...
research
12/08/2022

Explaining Software Bugs Leveraging Code Structures in Neural Machine Translation

Software bugs claim approximately 50 economy billions of dollars. Once a...
research
03/28/2021

Watch out for Extrinsic Bugs! A Case Study of their Impact in Just-In-Time Bug Prediction Models on the OpenStack project

Intrinsic bugs are bugs for which a bug introducing change can be identi...
research
03/06/2021

We'll Fix It in Post: What Do Bug Fixes in Video Game Update Notes Tell Us?

Bugs that persist into releases of video games can have negative impacts...
research
11/20/2019

Issues with SZZ: An empirical assessment of the state of practice of defect prediction data collection

Defect prediction research has a strong reliance on published data sets ...
research
10/07/2022

Understanding and Supporting Debugging Workflows in Multiverse Analysis

Multiverse analysis-a paradigm for statistical analysis that considers a...
research
10/13/2019

A multi-label, dual-output deep neural network for automated bug triaging

Bug tracking enables the monitoring and resolution of issues and bugs wi...
