
A Systematic Review of Unsupervised Learning Techniques for Software Defect Prediction

Added by Ning Li
Publication date: 2019
Language: English





Background: Unsupervised machine learners have been increasingly applied to software defect prediction. This approach may be valuable for software practitioners because it reduces the need for labeled training data. Objective: To investigate the use and performance of unsupervised learning techniques in software defect prediction. Method: We conducted a systematic literature review that identified 49 studies, published between January 2000 and March 2018, containing 2456 individual experimental results that satisfied our inclusion criteria. In order to compare prediction performance across these studies in a consistent way, we (re-)computed the confusion matrices and employed the Matthews Correlation Coefficient (MCC) as our main performance measure. Results: Our meta-analysis shows that unsupervised models are comparable with supervised models for both within-project and cross-project prediction. Among the 14 families of unsupervised models, Fuzzy C-Means (FCM) and fuzzy self-organizing maps (FSOMs) perform best. In addition, where we were able to check, we found that almost 11% (262/2456) of the published results (contained in 16 papers) were internally inconsistent, and a further 33% (823/2456) provided insufficient detail for us to check. Conclusion: Although many factors impact the performance of a classifier (e.g., dataset characteristics), broadly speaking, unsupervised classifiers do not seem to perform worse than supervised classifiers in our review. However, we note a worrying prevalence of (i) demonstrably erroneous experimental results, (ii) undemanding benchmarks and (iii) incomplete reporting. We therefore encourage researchers to be comprehensive in their reporting.
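For readers unfamiliar with the metric, the minimal Python sketch below (not the authors' actual analysis scripts; the counts are hypothetical) illustrates how MCC can be recomputed from the four cells of a confusion matrix for a single defect-prediction experiment, which is the kind of harmonisation the review applies to results from the primary studies. MCC uses all four cells, which makes it more robust to the class imbalance typical of defect datasets than accuracy alone.

import math

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    # Matthews Correlation Coefficient from confusion-matrix counts.
    # Ranges from -1 to 1; conventionally 0 is returned when any
    # marginal sum is zero and the coefficient is undefined.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Hypothetical counts: 40 defective modules correctly flagged, 25 false
# alarms, 400 clean modules correctly passed, 35 defects missed.
print(round(mcc(tp=40, fp=25, tn=400, fn=35), 3))  # approximately 0.504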




Related research

Federated learning is an emerging machine learning paradigm where clients train models locally and formulate a global model based on the local model updates. To identify the state-of-the-art in federated learning and explore how to develop federated learning systems, we perform a systematic literature review from a software engineering perspective, based on 231 primary studies. Our data synthesis covers the lifecycle of federated learning system development that includes background understanding, requirement analysis, architecture design, implementation, and evaluation. We highlight and summarise the findings from the results, and identify future trends to encourage researchers to advance their current work.
An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks is rooted in the concept of Deep Learning (DL). The popularity of such techniques largely stems from their automated feature engineering capabilities, which aid in modeling software artifacts. However, due to the rapid pace at which DL techniques have been adopted, it is difficult to distill the successes, failures, and opportunities of the current research landscape. In an effort to bring clarity to this cross-cutting area of work, from its modern inception to the present, this paper presents a systematic literature review of research at the intersection of SE & DL. The review canvases work appearing in the most prominent SE and DL conferences and journals and spans 84 papers across 22 unique SE tasks. We center our analysis around the components of learning, a set of principles that govern the application of machine learning (ML) techniques to a given problem domain, discussing several aspects of the surveyed work at a granular level. The end result of our analysis is a research roadmap that both delineates the foundations of DL techniques applied to SE research and identifies likely areas of fertile exploration for the future.
Context: Software Development Analytics is a research area concerned with providing insights to improve product deliveries and processes. Many types of studies, data sources and mining methods have been used for that purpose. Objective: This systematic literature review aims at providing an aggregate view of the relevant studies on Software Development Analytics in the past decade (2010-2019), with an emphasis on its application in practical settings. Method: Definition and execution of a search string upon several digital libraries, followed by quality assessment criteria to identify the most relevant papers. On those, we extracted a set of characteristics (study type, data source, study perspective, development life-cycle activities covered, stakeholders, mining methods, and analytics scope) and classified their impact against a taxonomy. Results: Source code repositories, experimental case studies, and developers are the most common data sources, study types, and stakeholders, respectively. Product and project managers are also often present, but less than expected. Mining methods are evolving rapidly, and that is reflected in the long list identified. Descriptive statistics are the most usual method, followed by correlation analysis. Since software development is an important process in every organization, it was unexpected to find that process mining was present in only one study. Most contributions to the software development life cycle were made in the quality dimension. Time management and cost control were only lightly discussed. The analysis of security aspects suggests it is a topic of increasing concern for practitioners. Risk management contributions are scarce. Conclusions: There is a wide margin for improvement for software development analytics in practice, for instance, by mining and analyzing the activities performed by software developers in their actual workbench, the IDE.
Context: Software testing plays an essential role in product quality improvement. For this reason, several software testing models have been developed to support organizations. However, adoption of testing process models inside organizations is still sporadic, with a need for more evidence about reported experiences. Aim: Our goal is to identify results gathered from the application of software testing models in organizational contexts. We focus on characteristics such as the context of use, practices applied in different testing process phases, and reported benefits & drawbacks. Method: We performed a Systematic Literature Review (SLR) focused on studies about the application of software testing processes, complemented by results from previous reviews. Results: From 35 primary studies and survey-based articles, we collected 17 testing models. Although most of the existing models are described as applicable to general contexts, the evidence obtained from the studies shows that some models are not suitable for all enterprise sizes, and inadequate for specific domains. Conclusion: The SLR evidence can serve to compare different software testing models for applicability inside organizations. Both benefits and drawbacks, as reported in the surveyed cases, allow getting a better view of the strengths and weaknesses of each model.
Context: Technical Debt (TD) requirements are related to the distance between the ideal value of the specification and the system's actual implementation, which is a consequence of strategic decisions for immediate gains or of unintended changes in context. To ensure the evolution of the software, this debt must be managed. Identification and measurement are the first two stages of the management process; however, they are little explored in academic research in requirements engineering. Objective: We aimed at investigating which evidence helps to strengthen the process of TD requirements management, including identification and measurement. Method: We conducted a Systematic Literature Review through manual and automatic searches, considering 7499 studies from 2010 to 2020 and including 61 primary studies. Results: We identified some causes related to Technical Debt requirements, existing strategies to help in identification and measurement, and metrics to support the measurement stage. Conclusion: Studies on TD requirements are still preliminary, especially regarding management tools. Yet, not enough attention is given to interpersonal issues, which are difficulties encountered when performing such activities and therefore also require research. Finally, the provision of metrics to help measure TD is part of this work's contribution, providing insights into their application in the requirements context.