Improving Test Case Generation for REST APIs Through Hierarchical Clustering

235 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Mitchell Olsthoorn

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Dimitri Stallenberg - Mitchell Olsthoorn - Annibale Panichella

هندسة البرمجيات الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

With the ever-increasing use of web APIs in modern-day applications, it is becoming more important to test the system as a whole. In the last decade, tools and approaches have been proposed to automate the creation of system-level test cases for these APIs using evolutionary algorithms (EAs). One of the limiting factors of EAs is that the genetic operators (crossover and mutation) are fully randomized, potentially breaking promising patterns in the sequences of API requests discovered during the search. Breaking these patterns has a negative impact on the effectiveness of the test case generation process. To address this limitation, this paper proposes a new approach that uses agglomerative hierarchical clustering (AHC) to infer a linkage tree model, which captures, replicates, and preserves these patterns in new test cases. We evaluate our approach, called LT-MOSA, by performing an empirical study on 7 real-world benchmark applications w.r.t. branch coverage and real-fault detection capability. We also compare LT-MOSA with the two existing state-of-the-art white-box techniques (MIO, MOSA) for REST API testing. Our results show that LT-MOSA achieves a statistically significant increase in test target coverage (i.e., lines and branches) compared to MIO and MOSA in 4 and 5 out of 7 applications, respectively. Furthermore, LT-MOSA discovers 27 and 18 unique real-faults that are left undetected by MIO and MOSA, respectively.

قيم البحث

89 - Erik Wittern , Alan Cha , Jim A. Laredo 2018

GraphQL is a query language and thereupon-based paradigm for implementing web Application Programming Interfaces (APIs) for client-server interactions. Using GraphQL, clients define precise, nested data-requirements in typed queries, which are resolv ed by servers against (possibly multiple) backend systems, like databases, object storages, or other APIs. Clients receive only the data they care about, in a single request. However, providers of existing REST(-like) APIs need to implement additional GraphQL interfaces to enable these advantages. We here assess the feasibility of automatically generating GraphQL wrappers for existing REST(-like) APIs. A wrapper, upon receiving GraphQL queries, translates them to requests against the target API. We discuss the challenges for creating such wrappers, including dealing with data sanitation, authentication, or handling nested queries. We furthermore present a prototypical implementation of OASGraph. OASGraph takes as input an OpenAPI Specification (OAS) describing an existing REST(-like) web API and generates a GraphQL wrapper for it. We evaluate OASGraph by running it, as well as an existing open source alternative, against 959 publicly available OAS. This experiment shows that OASGraph outperforms the existing alternative and is able to create a GraphQL wrapper for 89.5% of the APIs -- however, with limitations in many cases. A subsequent analysis of errors and warnings produced by OASGraph shows that missing or ambiguous information in the assessed OAS hinders creating complete wrappers. Finally, we present a use case of the IBM Watson Language Translator API that shows that small changes to an OAS allow OASGraph to generate more idiomatic and more expressive GraphQL wrappers.

هندسة البرمجيات

Improving Test Distance for Failure Clustering with Hypergraph Modelling

68 - Gabin An , Juyeon Yoon , Joyce Jiyoung Whang 2021

Automated debugging techniques, such as Fault Localisation (FL) or Automated Program Repair (APR), are typically designed under the Single Fault Assumption (SFA). However, in practice, an unknown number of faults can independently cause multiple test case failures, making it difficult to allocate resources for debugging and to use automated debugging techniques. Clustering algorithms have been applied to group the test failures according to their root causes, but their accuracy can often be lacking due to the inherent limits in the distance metrics for test cases. We introduce a new test distance metric based on hypergraphs and evaluate their accuracy using multi-fault benchmarks that we have built on top of Defects4J and SIR. Results show that our technique, Hybiscus, can automatically achieve perfect clustering (i.e., the same number of clusters as the ground truth number of root causes, with all failing tests with the same root cause grouped together) for 418 out of 605 test runs with multiple test failures. Better failure clustering also allows us to separate different root causes and apply FL techniques under SFA, resulting in saving up to 82% of the total wasted effort when compared to the state-of-the-art technique for multiple fault localisation.

هندسة البرمجيات

Requirements-Aided Automatic Test Case Generation for Industrial Cyber-physical Systems

121 - Roopak Sinha , Cheng Pang , Gerardo Santillan Martinez 2021

Industrial cyber-physical systems require complex distributed software to orchestrate many heterogeneous mechatronic components and control multiple physical processes. Industrial automation software is typically developed in a model-driven fashion w here abstractions of physical processes called plant models are co-developed and iteratively refined along with the control code. Testing such multi-dimensional systems is extremely difficult because often models might not be accurate, do not correspond accurately with subsequent refinements, and the software must eventually be tested on the real plant, especially in safety-critical systems like nuclear plants. This paper proposes a framework wherein high-level functional requirements are used to automatically generate test cases for designs at all abstraction levels in the model-driven engineering process. Requirements are initially specified in natural language and then analyzed and specified using a formalized ontology. The requirements ontology is then refined along with controller and plant models during design and development stages such that test cases can be generated automatically at any stage. A representative industrial water process system case study illustrates the strengths of the proposed formalism. The requirements meta-model proposed by the CESAR European project is used for requirements engineering while IEC 61131-3 and model-driven concepts are used in the design and development phases. A tool resulting from the proposed framework called REBATE (Requirements Based Automatic Testing Engine) is used to generate and execute test cases for increasingly concrete controller and plant models.

هندسة البرمجيات

Unit Test Case Generation with Transformers and Focal Context

199 - Michele Tufano , Dawn Drain , Alexey Svyatkovskiy 2020

Automated unit test case generation tools facilitate test-driven development and support developers by suggesting tests intended to identify flaws in their code. Existing approaches are usually guided by the test coverage criteria, generating synthet ic test cases that are often difficult for developers to read or understand. In this paper we propose AthenaTest, an approach that aims to generate unit test cases by learning from real-world focal methods and developer-written testcases. We formulate unit test case generation as a sequence-to-sequence learning task, adopting a two-step training procedure consisting of denoising pretraining on a large unsupervised Java corpus, and supervised finetuning for a downstream translation task of generating unit tests. We investigate the impact of natural language and source code pretraining, as well as the focal context information surrounding the focal method. Both techniques provide improvements in terms of validation loss, with pretraining yielding 25% relative improvement and focal context providing additional 11.1% improvement. We also introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java, which comprises 780K test cases mined from 91K open-source repositories from GitHub. We evaluate AthenaTest on five defects4j projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts. We execute the test cases, collect test coverage information, and compare them with test cases generated by EvoSuite and GPT-3, finding that our approach outperforms GPT-3 and has comparable coverage w.r.t. EvoSuite. Finally, we survey professional developers on their preference in terms of readability, understandability, and testing effectiveness of the generated tests, showing overwhelmingly preference towards AthenaTest.

هندسة البرمجيات الحساب واللغة التعلم الآلي

Combinatorial Modeling and Test Case Generation for Industrial Control Software using ACTS

119 - Sara Ericsson , Eduard Enoiu 2018

Combinatorial testing has been suggested as an effective method of creating test cases at a lower cost. However, industrially applicable tools for modeling and combinatorial test generation are still scarce. As a direct effect, combinatorial testing has only seen a limited uptake in industry that calls into question its practical usefulness. This lack of evidence is especially troublesome if we consider the use of combinatorial test generation for industrial safety-critical control software, such as are found in trains, airplanes, and power plants. To study the industrial application of combinatorial testing, we evaluated ACTS, a popular tool for combinatorial modeling and test generation, in terms of applicability and test efficiency on industrial-sized IEC 61131-3 industrial control software running on Programmable Logic Controllers (PLC). We assessed ACTS in terms of its direct applicability in combinatorial modeling of IEC 61131-3 industrial software and the efficiency of ACTS in terms of generation time and test suite size. We used 17 industrial control programs provided by Bombardier Transportation Sweden AB and used in a train control management system. Our results show that not all combinations of algorithms and interaction strengths could generate a test suite within a realistic cut-off time. The results of the modeling process and the efficiency evaluation of ACTS are useful for practitioners considering to use combinatorial testing for industrial control software as well as for researchers trying to improve the use of such combinatorial testing techniques.

هندسة البرمجيات