ﻻ يوجد ملخص باللغة العربية
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.
Modern software development is increasingly dependent on components, libraries and frameworks coming from third-party vendors or open-source suppliers and made available through a number of platforms (or forges). This way of writing software puts an
Millions of open-source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an
Fault localization is to identify faulty source code. It could be done on various granularities, e.g., classes, methods, and statements. Most of the automated fault localization (AFL) approaches are coarse-grained because it is challenging to accurat
Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into so-called cells
We propose a novel approach to enhance the discriminability of Convolutional Neural Networks (CNN). The key idea is to build a tree structure that could progressively learn fine-grained features to distinguish a subset of classes, by learning feature