
On the Time-Based Conclusion Stability of Cross-Project Defect Prediction Models

Published by: Abdul Ali Bangash
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





Researchers in empirical software engineering often make claims based on observable data such as defect reports. Unfortunately, in many cases, these claims are generalized beyond the data sets that have been evaluated. Will the researchers' conclusions hold a year from now for the same software projects? Perhaps not. Recent studies show that, in the area of software analytics, conclusions over different data sets are usually inconsistent. In this article, we empirically investigate whether conclusions in the area of defect prediction truly remain stable over time. Our investigation applies a time-aware evaluation approach in which models are trained only on the past and evaluated only on the future. Through this time-aware evaluation, we show that, depending on the time period in which defect predictors are evaluated, their performance, in terms of F-Score, area under the curve (AUC), and Matthews Correlation Coefficient (MCC), varies and their results are not consistent. The next release of a product, if it differs significantly from its prior release, may drastically change defect prediction performance. Therefore, without knowledge of conclusion stability, empirical software engineering researchers should limit their claims of performance to the contexts in which they were evaluated, because broad claims about defect prediction performance may be contradicted by the next release of the product under analysis.
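As a concrete illustration of the time-aware evaluation described above, here is a minimal Python sketch that trains a classifier only on data dated before a cut-off and scores it only on later data, reporting F-Score, AUC, and MCC. The column names, classifier, and scikit-learn pipeline are illustrative assumptions, not the authors' experimental setup.

```python
# Minimal sketch of a time-aware split (illustrative only; the column
# names and the classifier are assumptions, not the authors' setup).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score, matthews_corrcoef

def time_aware_evaluation(df, split_date, feature_cols, label_col="defective"):
    """Train only on instances dated before split_date, test only on later ones."""
    past = df[df["date"] < split_date]
    future = df[df["date"] >= split_date]

    clf = RandomForestClassifier(random_state=0)
    clf.fit(past[feature_cols], past[label_col])

    pred = clf.predict(future[feature_cols])
    prob = clf.predict_proba(future[feature_cols])[:, 1]
    return {
        "f1": f1_score(future[label_col], pred),
        "auc": roc_auc_score(future[label_col], prob),
        "mcc": matthews_corrcoef(future[label_col], pred),
    }

# Repeating the evaluation for successive split dates (e.g. every six
# months, or at each release) shows how much the scores drift over time:
# for split in pd.date_range("2015-01-01", "2019-01-01", freq="6MS"):
#     print(split, time_aware_evaluation(df, split, feature_cols))
```

Sweeping the cut-off over successive periods or releases is what exposes the instability the abstract describes: the same model family can score well in one period and poorly in the next.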




Read also

Developer discussions range from in-person hallway chats to comment chains on bug reports. Being able to identify discussions that touch on software design would be helpful in documenting and refactoring software. Design mining is the application of machine learning techniques to correctly label a given discussion artifact, such as a pull request, as pertaining (or not) to design. In this paper we demonstrate a simple example of how design mining works. We then show that conclusion stability is poor across different artifact types and different projects. We show two techniques -- augmentation and context specificity -- that greatly improve the conclusion stability and cross-project relevance of design mining. Our new approach achieves an AUC of 0.88 on within-dataset classification and 0.80 on the cross-dataset classification task.
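To make the within- versus cross-dataset distinction concrete, here is a hedged Python sketch of a simple text classifier for design mining, using TF-IDF features and logistic regression. The feature set, model, and fold count are assumptions for illustration, not the pipeline evaluated in the paper.

```python
# Illustrative design-mining classifier: label discussion artifacts as
# design-related or not, and evaluate within vs. across datasets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def within_dataset_auc(texts, labels):
    """Cross-validated AUC on a single artifact type / project."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    return cross_val_score(model, texts, labels, cv=10, scoring="roc_auc").mean()

def cross_dataset_auc(train_texts, train_labels, test_texts, test_labels):
    """Train on one artifact type / project, evaluate on a different one."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    model.fit(train_texts, train_labels)
    scores = model.predict_proba(test_texts)[:, 1]
    return roc_auc_score(test_labels, scores)
```

Training on one artifact type or project and testing on another is the cross-dataset setting in which conclusion stability typically degrades and where the augmentation and context-specificity techniques mentioned in the abstract are aimed.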
Background: Unsupervised machine learners have been increasingly applied to software defect prediction. This approach may be valuable for software practitioners because it reduces the need for labeled training data. Objective: Investigate the use and performance of unsupervised learning techniques in software defect prediction. Method: We conducted a systematic literature review that identified 49 studies, published between January 2000 and March 2018, containing 2456 individual experimental results that satisfied our inclusion criteria. In order to compare prediction performance across these studies in a consistent way, we (re-)computed the confusion matrices and employed the Matthews Correlation Coefficient (MCC) as our main performance measure. Results: Our meta-analysis shows that unsupervised models are comparable with supervised models for both within-project and cross-project prediction. Among the 14 families of unsupervised models, Fuzzy C-Means (FCM) and Fuzzy SOMs (FSOMs) perform best. In addition, where we were able to check, we found that almost 11% (262/2456) of published results (contained in 16 papers) were internally inconsistent, and a further 33% (823/2456) provided insufficient detail for us to check. Conclusion: Although many factors impact the performance of a classifier, e.g., dataset characteristics, broadly speaking, unsupervised classifiers do not seem to perform worse than supervised classifiers in our review. However, we note a worrying prevalence of (i) demonstrably erroneous experimental results, (ii) undemanding benchmarks, and (iii) incomplete reporting. We therefore encourage researchers to be comprehensive in their reporting.
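Because MCC is computed directly from confusion-matrix counts, published results can be re-derived and compared across studies, as in the following minimal sketch (the example counts are invented for illustration, not taken from the review):

```python
# Re-compute MCC from confusion-matrix counts reported in a paper.
from math import sqrt

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when any marginal sum is zero
    return (tp * tn - fp * fn) / denom

# Example with invented counts: accuracy is 97% on this imbalanced data,
# but MCC is only about 0.66, which is why MCC is the fairer yardstick.
print(mcc(tp=30, fp=20, fn=10, tn=940))
```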
Fotios Gioulekas, 2018
Existing model-based processes for embedded real-time systems support the analysis of various non-functional properties, most notably schedulability, through model checking, simulation, or other means. The analysis results are then used to modify the system's design so that the expected properties are satisfied. A rigorous model-based design flow differs in that it aims at a system implementation derived from high-level models by applying a sequence of semantics-preserving transformations. Properties established at any design step are preserved throughout the subsequent steps, including the executable implementation. We introduce such a design flow using a process network model of computation for application design at a high level, which combines streaming and reactive control processing with task parallelism. The schedulability of the so-called FPPNs (Fixed Priority Process Networks) is well studied, and various solutions have been presented. This article focuses on the design flow's steps for deriving executable implementations on the BIP (Behavior - Interaction - Priority) runtime environment. FPPNs are designed using the TASTE toolset, a convenient architecture description interface. In this way, developers do not explicitly program low-level real-time OS services, and the schedulability properties are guaranteed throughout the design steps by construction. The approach has been validated on the design of a real spacecraft on-board application that has been scheduled for execution on an industrial multicore platform.
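As rough intuition for the fixed-priority dispatching that underlies FPPN scheduling, the following Python sketch selects the highest-priority ready process. It is only an illustration of the general idea; it does not reflect the FPPN semantics, the TASTE toolset, or the BIP runtime described in the paper.

```python
# Illustrative fixed-priority dispatch among ready processes (assumption:
# a smaller priority number means higher priority).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Process:
    name: str
    priority: int
    ready: bool = False

def dispatch(processes: List[Process]) -> Optional[Process]:
    """Pick the highest-priority ready process, mark it as run, return it."""
    ready = [p for p in processes if p.ready]
    if not ready:
        return None
    chosen = min(ready, key=lambda p: p.priority)
    chosen.ready = False
    return chosen

# Example: the control process is dispatched before the streaming one
# when both are ready, because it carries the higher (smaller) priority.
procs = [Process("streaming", priority=2, ready=True),
         Process("control", priority=1, ready=True)]
print(dispatch(procs).name)  # -> "control"
```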
Kyle E. Niemeyer, 2019
This paper describes the motivation and design of a 10-week graduate course that teaches practices for developing research software; although offered by an engineering program, the content applies broadly to any field of scientific research where software may be developed. Topics taught in the course include local and remote version control, licensing and copyright, structuring Python modules, testing and test coverage, continuous integration, packaging and distribution, open science, software citation, and reproducibility basics, among others. Lectures are supplemented by in-class activities and discussions, and all course material is shared openly via GitHub. Coursework is heavily based on a single, term-long project where students individually develop a software package targeted at their own research topic; all contributions must be submitted as pull requests and reviewed/merged by other students. The course was initially offered in Spring 2018 with 17 students enrolled, and will be taught again in Spring 2019.
Mikhail Chupilko, 2013
Runtime verification is checking whether a system execution satisfies or violates a given correctness property. A procedure that automatically, and typically on the fly, verifies conformance of the system's behavior to the specified property is called a monitor. Nowadays, a variety of formalisms are used to express properties of the observed behavior of computer systems, and many methods have been proposed to construct monitors. However, advanced formalisms and methods are frequently unnecessary, because an executable model of the system is available. The original purpose and structure of the model do not matter; what is required is that the system and its model have similar sets of interfaces. In this case, monitoring is carried out as follows. Two black boxes, the system and its reference model, are executed in parallel and stimulated with the same input sequences; the monitor dynamically captures their output traces and tries to match them. The main problem is that a model is usually more abstract than the real system, both in terms of functionality and timing. Therefore, trace-to-trace matching is not straightforward and must allow the system to produce events in a different order or even miss some of them. This paper studies on-the-fly conformance relations for timed systems (i.e., systems whose inputs and outputs are distributed along the time axis). It also suggests a practice-oriented methodology for creating and configuring monitors for timed systems based on executable models. The methodology has been successfully applied to a number of industrial simulation-based hardware verification projects.
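The following Python sketch illustrates the general idea of matching a system trace against a more abstract model trace while tolerating reordering and timing slack. The event format, the tolerance policy, and the greedy matching strategy are assumptions made for illustration, not the conformance relations defined in the paper.

```python
# Illustrative trace matching: every system event must be explained by an
# unused model event with the same label within a time tolerance; the
# model's event ordering is not enforced, and leftover model events are
# treated as outputs the abstract model may omit at the system level.
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    label: str   # e.g. interface name plus a payload digest
    time: float  # timestamp observed on the trace

def traces_match(system: List[Event], model: List[Event], tol: float) -> bool:
    """Greedy check that the system trace conforms to the model trace."""
    unused = list(model)
    for ev in system:
        candidates = [m for m in unused
                      if m.label == ev.label and abs(m.time - ev.time) <= tol]
        if not candidates:
            return False  # unexplained system output -> report a violation
        unused.remove(min(candidates, key=lambda m: abs(m.time - ev.time)))
    return True
```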