EnHMM: On the Use of Ensemble HMMs and Stack Traces to Predict the Reassignment of Bug Report Fields

79 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Wahab Hamou-Lhadj PhD

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Md Shariful Islam - Abdelwahab Hamou-Lhadj - Korosh K. Sabor

هندسة البرمجيات الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Bug reports (BR) contain vital information that can help triaging teams prioritize and assign bugs to developers who will provide the fixes. However, studies have shown that BR fields often contain incorrect information that need to be reassigned, which delays the bug fixing process. There exist approaches for predicting whether a BR field should be reassigned or not. These studies use mainly BR descriptions and traditional machine learning algorithms (SVM, KNN, etc.). As such, they do not fully benefit from the sequential order of information in BR data, such as function call sequences in BR stack traces, which may be valuable for improving the prediction accuracy. In this paper, we propose a novel approach, called EnHMM, for predicting the reassignment of BR fields using ensemble Hidden Markov Models (HMMs), trained on stack traces. EnHMM leverages the natural ability of HMMs to represent sequential data to model the temporal order of function calls in BR stack traces. When applied to Eclipse and Gnome BR repositories, EnHMM achieves an average precision, recall, and F-measure of 54%, 76%, and 60% on Eclipse dataset and 41%, 69%, and 51% on Gnome dataset. We also found that EnHMM improves over the best single HMM by 36% for Eclipse and 76% for Gnome. Finally, when comparing EnHMM to Im.ML.KNN, a recent approach in the field, we found that the average F-measure score of EnHMM improves the average F-measure of Im.ML.KNN by 6.80% and improves the average recall of Im.ML.KNN by 36.09%. However, the average precision of EnHMM is lower than that of Im.ML.KNN (53.93% as opposed to 56.71%).

قيم البحث

158 - Dawn Drain , Colin B. Clement , Guillermo Serrato 2021

The joint task of bug localization and program repair is an integral part of the software development process. In this work we present DeepDebug, an approach to automated debugging using large, pretrained transformers. We begin by training a bug-crea tion model on reversed commit data for the purpose of generating synthetic bugs. We apply these synthetic bugs toward two ends. First, we directly train a backtranslation model on all functions from 200K repositories. Next, we focus on 10K repositories for which we can execute tests, and create bug

هندسة البرمجيات التعلم الآلي

A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research

147 - Cody Watson , Nathan Cooper , David Nader Palacio 2020

An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks are those rooted in the concept of Deep Learning (DL). The popularity of such techniques largely stems from their automated featu re engineering capabilities, which aid in modeling software artifacts. However, due to the rapid pace at which DL techniques have been adopted, it is difficult to distill the current successes, failures, and opportunities of the current research landscape. In an effort to bring clarity to this cross-cutting area of work, from its modern inception to the present, this paper presents a systematic literature review of research at the intersection of SE & DL. The review canvases work appearing in the most prominent SE and DL conferences and journals and spans 84 papers across 22 unique SE tasks. We center our analysis around the components of learning, a set of principles that govern the application of machine learning techniques (ML) to a given problem domain, discussing several aspects of the surveyed work at a granular level. The end result of our analysis is a research roadmap that both delineates the foundations of DL techniques applied to SE research, and likely areas of fertile exploration for the future.

هندسة البرمجيات الذكاء الاصطناعي التعلم الآلي

Runtime Verification Based on Executable Models: On-the-Fly Matching of Timed Traces

807 - Mikhail Chupilko 2013

Runtime verification is checking whether a system execution satisfies or violates a given correctness property. A procedure that automatically, and typically on the fly, verifies conformance of the systems behavior to the specified property is called a monitor. Nowadays, a variety of formalisms are used to express properties on observed behavior of computer systems, and a lot of methods have been proposed to construct monitors. However, it is a frequent situation when advanced formalisms and methods are not needed, because an executable model of the system is available. The original purpose and structure of the model are out of importance; rather what is required is that the system and its model have similar sets of interfaces. In this case, monitoring is carried out as follows. Two black boxes, the system and its reference model, are executed in parallel and stimulated with the same input sequences; the monitor dynamically captures their output traces and tries to match them. The main problem is that a model is usually more abstract than the real system, both in terms of functionality and timing. Therefore, trace-to-trace matching is not straightforward and allows the system to produce events in different order or even miss some of them. The paper studies on-the-fly conformance relations for timed systems (i.e., systems whose inputs and outputs are distributed along the time axis). It also suggests a practice-oriented methodology for creating and configuring monitors for timed systems based on executable models. The methodology has been successfully applied to a number of industrial projects of simulation-based hardware verification.

هندسة البرمجيات

Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow

148 - Kaibo Cao 2021

As a popular Q&A site for programming, Stack Overflow is a treasure for developers. However, the amount of questions and answers on Stack Overflow make it difficult for developers to efficiently locate the information they are looking for. There are two gaps leading to poor search results: the gap between the users intention and the textual query, and the semantic gap between the query and the post content. Therefore, developers have to constantly reformulate their queries by correcting misspelled words, adding limitations to certain programming languages or platforms, etc. As query reformulation is tedious for developers, especially for novices, we propose an automated software-specific query reformulation approach based on deep learning. With query logs provided by Stack Overflow, we construct a large-scale query reformulation corpus, including the original queries and corresponding reformulated ones. Our approach trains a Transformer model that can automatically generate candidate reformulated queries when given the users original query. The evaluation results show that our approach outperforms five state-of-the-art baselines, and achieves a 5.6% to 33.5% boost in terms of $mathit{ExactMatch}$ and a 4.8% to 14.4% boost in terms of $mathit{GLEU}$.

هندسة البرمجيات الذكاء الاصطناعي استرجاع المعلومات

On the use of test smells for prediction of flaky tests

264 - B. H. P. Camara , M. A. G. Silva , A. T. Endo 2021

Regression testing is an important phase to deliver software with quality. However, flaky tests hamper the evaluation of test results and can increase costs. This is because a flaky test may pass or fail non-deterministically and to identify properly the flakiness of a test requires rerunning the test suite multiple times. To cope with this challenge, approaches have been proposed based on prediction models and machine learning. Existing approaches based on the use of the test case vocabulary may be context-sensitive and prone to overfitting, presenting low performance when executed in a cross-project scenario. To overcome these limitations, we investigate the use of test smells as predictors of flaky tests. We conducted an empirical study to understand if test smells have good performance as a classifier to predict the flakiness in the cross-project context, and analyzed the information gain of each test smell. We also compared the test smell-based approach with the vocabulary-based one. As a result, we obtained a classifier that had a reasonable performance (Random Forest, 0.83) to predict the flakiness in the testing phase. This classifier presented better performance than vocabulary-based model for cross-project prediction. The Assertion Roulette and Sleepy Test test smell types are the ones associated with the best information gain values.

هندسة البرمجيات التعلم الآلي