Improving Test Distance for Failure Clustering with Hypergraph Modelling

69 0 0.0 ( 0 )

Download Cite

Added by Gabin An

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Gabin An - Juyeon Yoon - Joyce Jiyoung Whang

Software Engineering

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Automated debugging techniques, such as Fault Localisation (FL) or Automated Program Repair (APR), are typically designed under the Single Fault Assumption (SFA). However, in practice, an unknown number of faults can independently cause multiple test case failures, making it difficult to allocate resources for debugging and to use automated debugging techniques. Clustering algorithms have been applied to group the test failures according to their root causes, but their accuracy can often be lacking due to the inherent limits in the distance metrics for test cases. We introduce a new test distance metric based on hypergraphs and evaluate their accuracy using multi-fault benchmarks that we have built on top of Defects4J and SIR. Results show that our technique, Hybiscus, can automatically achieve perfect clustering (i.e., the same number of clusters as the ground truth number of root causes, with all failing tests with the same root cause grouped together) for 418 out of 605 test runs with multiple test failures. Better failure clustering also allows us to separate different root causes and apply FL techniques under SFA, resulting in saving up to 82% of the total wasted effort when compared to the state-of-the-art technique for multiple fault localisation.

rate research

Improving Test Case Generation for REST APIs Through Hierarchical Clustering

234 - Dimitri Stallenberg , Mitchell Olsthoorn , Annibale Panichella 2021

With the ever-increasing use of web APIs in modern-day applications, it is becoming more important to test the system as a whole. In the last decade, tools and approaches have been proposed to automate the creation of system-level test cases for these APIs using evolutionary algorithms (EAs). One of the limiting factors of EAs is that the genetic operators (crossover and mutation) are fully randomized, potentially breaking promising patterns in the sequences of API requests discovered during the search. Breaking these patterns has a negative impact on the effectiveness of the test case generation process. To address this limitation, this paper proposes a new approach that uses agglomerative hierarchical clustering (AHC) to infer a linkage tree model, which captures, replicates, and preserves these patterns in new test cases. We evaluate our approach, called LT-MOSA, by performing an empirical study on 7 real-world benchmark applications w.r.t. branch coverage and real-fault detection capability. We also compare LT-MOSA with the two existing state-of-the-art white-box techniques (MIO, MOSA) for REST API testing. Our results show that LT-MOSA achieves a statistically significant increase in test target coverage (i.e., lines and branches) compared to MIO and MOSA in 4 and 5 out of 7 applications, respectively. Furthermore, LT-MOSA discovers 27 and 18 unique real-faults that are left undetected by MIO and MOSA, respectively.

Software Engineering Artificial Intelligence

Identification of Failure Regions for Programs with Numeric Inputs

87 - Rubing Huang , Weifeng Sun , Tsong Yueh Chen 2020

Failure region, where failure-causing inputs reside, has provided many insights to enhance testing effectiveness of many testing methods. Failure region may also provide some important information to support other processes such as software debugging. When a testing method detects a software failure, indicating that a failure-causing input is identified, the next important question is about how to identify the failure region based on this failure-causing input, i.e., Identification of Failure Regions (IFR). In this paper, we introduce a new IFR strategy, namely Search for Boundary (SB), to identify an approximate failure region of a numeric input domain. SB attempts to identify additional failure-causing inputs that are as close to the boundary of the failure region as possible. To support SB, we provide a basic procedure, and then propose two methods, namely Fixed-orientation Search for Boundary (FSB) and Diverse-orientation Search for Boundary (DSB). In addition, we implemented an automated experimentation platform to integrate these methods. In the experiments, we evaluated the proposed SB methods using a series of simulation studies andempirical studies with different types of failure regions. The results show that our methods can effectively identify a failure region, within the limited testing resources.

Software Engineering Signal Processing

HighwayGraph: Modelling Long-distance Node Relations for Improving General Graph Neural Network

207 - Deli Chen , Xiaoqian Liu , Yankai Lin 2019

Graph Neural Networks (GNNs) are efficient approaches to process graph-structured data. Modelling long-distance node relations is essential for GNN training and applications. However, conventional GNNs suffer from bad performance in modelling long-distance node relations due to limited-layer information propagation. Existing studies focus on building deep GNN architectures, which face the over-smoothing issue and cannot model node relations in particularly long distance. To address this issue, we propose to model long-distance node relations by simply relying on shallow GNN architectures with two solutions: (1) Implicitly modelling by learning to predict node pair relations (2) Explicitly modelling by adding edges between nodes that potentially have the same label. To combine our two solutions, we propose a model-agnostic training framework named HighwayGraph, which overcomes the challenge of insufficient labeled nodes by sampling node pairs from the training set and adopting the self-training method. Extensive experimental results show that our HighwayGraph achieves consistent and significant improvements over four representative GNNs on three benchmark datasets.

Machine Learning Computation and Language Social and Information Networks

Improving Linux-Kernel Tests for LockDoc with Feedback-driven Fuzzing

74 - Alexander Lochmann , Robin Thunig , Horst Schirmeier 2020

LockDoc is an approach to extract locking rules for kernel data structures from a dynamic execution trace recorded while the system is under a benchmark load. These locking rules can e.g. be used to locate synchronization bugs. For high rule precision and thorough bug finding, the approach heavily depends on the choice of benchmarks: They must trigger the execution of as much code as possible in the kernel subsystem relevant for the targeted data structures. However, existing test suites such as those provided by the Linux Test Project (LTP) only achieve -- in the case of LTP -- about 35 percent basic-block coverage for the VFS subsystem, which is the relevant subsystem when extracting locking rules for filesystem-related data structures. In this article, we discuss how to complement the LTP suites to improve the code coverage for our LockDoc scenario. We repurpose syzkaller -- a coverage-guided fuzzer with the goal to validate the robustness of kernel APIs -- to 1) not aim for kernel crashes, and to 2) maximize code coverage for a specific kernel subsystem. Thereby, we generate new benchmark programs that can be run in addition to the LTP, and increase VFS basic-block coverage by 26.1 percent.

Software Engineering

Improving Runtime Overheads for detectEr

357 - Ian Cassar 2015

We design monitor optimisations for detectEr, a runtime-verification tool synthesising systems of concurrent monitors from correctness properties for Erlang programs. We implement these optimisations as part of the existing tool and show that they yield considerably lower runtime overheads when compared to the unoptimised monitor synthesis.

Software Engineering