Combinatorial testing has been suggested as an effective method of creating test cases at a lower cost. However, industrially applicable tools for modeling and combinatorial test generation are still scarce. As a direct effect, combinatorial testing has seen only limited uptake in industry, which calls into question its practical usefulness. This lack of evidence is especially troublesome if we consider the use of combinatorial test generation for industrial safety-critical control software, such as that found in trains, airplanes, and power plants. To study the industrial application of combinatorial testing, we evaluated ACTS, a popular tool for combinatorial modeling and test generation, in terms of applicability and test efficiency on industrial-sized IEC 61131-3 control software running on Programmable Logic Controllers (PLCs). We assessed ACTS in terms of its direct applicability to combinatorial modeling of IEC 61131-3 industrial software and its efficiency in terms of generation time and test suite size. We used 17 industrial control programs provided by Bombardier Transportation Sweden AB and used in a train control management system. Our results show that not all combinations of algorithms and interaction strengths could generate a test suite within a realistic cut-off time. The results of the modeling process and the efficiency evaluation of ACTS are useful for practitioners considering combinatorial testing for industrial control software as well as for researchers trying to improve the use of such combinatorial testing techniques.
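To make the idea of an interaction strength concrete, the following is a minimal sketch of test generation at strength 2 (pairwise), where every pair of values from any two parameters must appear in at least one test. It is an illustration only: the greedy search below is not one of the algorithms evaluated in the study, and the function name `pairwise_suite` and the example parameter names are hypothetical.

```python
from itertools import combinations, product


def pairwise_suite(params):
    """Greedy pairwise suite for a dict mapping parameter name -> list of values.

    Hypothetical helper for illustration. It enumerates the full Cartesian
    product as the candidate pool, so it only suits small input models.
    """
    names = list(params)
    # Every pair of (parameter index, value) assignments that must be covered.
    uncovered = {
        ((i, a), (j, b))
        for i, j in combinations(range(len(names)), 2)
        for a in params[names[i]]
        for b in params[names[j]]
    }
    candidates = list(product(*(params[n] for n in names)))

    def pairs_of(row):
        # All value pairs that a single test row exercises.
        return {((i, row[i]), (j, row[j]))
                for i, j in combinations(range(len(row)), 2)}

    suite = []
    while uncovered:
        # Pick the candidate row that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda row: len(pairs_of(row) & uncovered))
        suite.append(dict(zip(names, best)))
        uncovered -= pairs_of(best)
    return suite


if __name__ == "__main__":
    # Assumed toy model of a control block with three inputs.
    model = {"door": ["open", "closed"],
             "speed": ["zero", "low", "high"],
             "brake": [True, False]}
    for test in pairwise_suite(model):
        print(test)
```

Raising the strength from pairs to triples or beyond enlarges both the set of combinations to cover and the search effort, which is consistent with the observation above that some configurations did not finish within the cut-off time.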
Industrial cyber-physical systems require complex distributed software to orchestrate many heterogeneous mechatronic components and control multiple physical processes. Industrial automation software is typically developed in a model-driven fashion where abstractions of physical processes called plant models are co-developed and iteratively refined along with the control code. Testing such multi-dimensional systems is extremely difficult because models are often inaccurate or do not correspond to subsequent refinements, and because the software must eventually be tested on the real plant, especially in safety-critical systems like nuclear plants. This paper proposes a framework wherein high-level functional requirements are used to automatically generate test cases for designs at all abstraction levels in the model-driven engineering process. Requirements are initially specified in natural language and then analyzed and specified using a formalized ontology. The requirements ontology is then refined along with controller and plant models during the design and development stages such that test cases can be generated automatically at any stage. A representative industrial water process system case study illustrates the strengths of the proposed formalism. The requirements meta-model proposed by the CESAR European project is used for requirements engineering, while IEC 61131-3 and model-driven concepts are used in the design and development phases. A tool resulting from the proposed framework, called REBATE (Requirements Based Automatic Testing Engine), is used to generate and execute test cases for increasingly concrete controller and plant models.
Context: Regression testing activities greatly reduce the risk of faulty software release. However, the size of the test suites grows throughout the development process, resulting in time-consuming execution of the test suite and delayed feedback to the software development team. This has created a need for approaches such as test case prioritization (TCP) and test-suite reduction to reach better results when resources are limited. In this regard, approaches that use auxiliary sources of data such as bug history are worth investigating. Objective: Our aim is to propose an approach for TCP that takes into account test case coverage data, bug history, and test case diversification. To evaluate this approach we study its performance on real-world open-source projects. Method: The bug history is used to estimate the fault-proneness of source code areas. The diversification of test cases is preserved by incorporating fault-proneness into a clustering-based scheme. Results: The proposed methods are evaluated on datasets collected from the development history of five real-world projects including 3
Theorem provers have been used extensively in software engineering for software testing and verification. However, software is now so large and complex that additional architecture is needed to guide theorem provers as they try to generate test suites. The SNAP test suite generator (introduced in this paper) combines the Z3 theorem prover with the following tactic: cluster some candidate tests, then search for valid tests by proposing small mutations to the cluster centroids. This technique effectively removes repeated structures in the tests since many repeated structures can be replaced with one centroid. In practice, SNAP is remarkably effective. For 27 real-world programs with up to half a million variables, SNAP found test suites that were 10 to 750 times smaller than those found by the prior state of the art. Also, SNAP ran orders of magnitude faster and (unlike prior work) generated 100% valid tests.
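The cluster-then-mutate tactic can be sketched as follows. This is a simplified illustration under stated assumptions, not SNAP itself: tests are assumed to be tuples of 0/1 values, a plain Python predicate `is_valid` stands in for the Z3 validity check, and all function names (`cluster_tests`, `repair`, `snap_like_suite`) are hypothetical.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def centroid(cluster):
    """Per-position majority vote over a non-empty list of bit tuples."""
    n = len(cluster)
    return tuple(int(2 * sum(column) >= n) for column in zip(*cluster))


def cluster_tests(tests, k, rounds=5, seed=0):
    """Small k-means-style clustering under Hamming distance; returns k centroids."""
    rng = random.Random(seed)
    centers = rng.sample(tests, k)
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for t in tests:
            nearest = min(range(k), key=lambda i: hamming(t, centers[i]))
            groups[nearest].append(t)
        # Keep the old center when a group ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers


def repair(center, is_valid, max_flips=2):
    """Look for a valid test within max_flips bit flips of a centroid."""
    for r in range(max_flips + 1):
        for positions in combinations(range(len(center)), r):
            candidate = list(center)
            for p in positions:
                candidate[p] ^= 1
            if is_valid(tuple(candidate)):
                return tuple(candidate)
    return None  # no valid test found near this centroid


def snap_like_suite(tests, is_valid, k):
    """One repaired centroid per cluster, with duplicates dropped."""
    suite = []
    for center in cluster_tests(tests, k):
        fixed = repair(center, is_valid)
        if fixed is not None and fixed not in suite:
            suite.append(fixed)
    return suite
```

Because each cluster contributes at most one test, the suite size is bounded by the number of clusters `k`, which is the mechanism by which repeated structures collapse into a single representative.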
Automated unit test case generation tools facilitate test-driven development and support developers by suggesting tests intended to identify flaws in their code. Existing approaches are usually guided by test coverage criteria, generating synthetic test cases that are often difficult for developers to read or understand. In this paper we propose AthenaTest, an approach that aims to generate unit test cases by learning from real-world focal methods and developer-written test cases. We formulate unit test case generation as a sequence-to-sequence learning task, adopting a two-step training procedure consisting of denoising pretraining on a large unsupervised Java corpus, and supervised finetuning for a downstream translation task of generating unit tests. We investigate the impact of natural language and source code pretraining, as well as the focal context information surrounding the focal method. Both techniques provide improvements in terms of validation loss, with pretraining yielding a 25% relative improvement and focal context providing an additional 11.1% improvement. We also introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java, which comprises 780K test cases mined from 91K open-source repositories from GitHub. We evaluate AthenaTest on five defects4j projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts. We execute the test cases, collect test coverage information, and compare them with test cases generated by EvoSuite and GPT-3, finding that our approach outperforms GPT-3 and has comparable coverage w.r.t. EvoSuite. Finally, we survey professional developers on their preference in terms of readability, understandability, and testing effectiveness of the generated tests, showing an overwhelming preference for AthenaTest.
With the ever-increasing use of web APIs in modern-day applications, it is becoming more important to test the system as a whole. In the last decade, tools and approaches have been proposed to automate the creation of system-level test cases for these APIs using evolutionary algorithms (EAs). One of the limiting factors of EAs is that the genetic operators (crossover and mutation) are fully randomized, potentially breaking promising patterns in the sequences of API requests discovered during the search. Breaking these patterns has a negative impact on the effectiveness of the test case generation process. To address this limitation, this paper proposes a new approach that uses agglomerative hierarchical clustering (AHC) to infer a linkage tree model, which captures, replicates, and preserves these patterns in new test cases. We evaluate our approach, called LT-MOSA, by performing an empirical study on 7 real-world benchmark applications w.r.t. branch coverage and real-fault detection capability. We also compare LT-MOSA with the two existing state-of-the-art white-box techniques (MIO, MOSA) for REST API testing. Our results show that LT-MOSA achieves a statistically significant increase in test target coverage (i.e., lines and branches) compared to MIO and MOSA in 4 and 5 out of 7 applications, respectively. Furthermore, LT-MOSA discovers 27 and 18 unique real faults that are left undetected by MIO and MOSA, respectively.
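As a rough illustration of how agglomerative hierarchical clustering can yield a linkage tree over API requests, consider the sketch below. It rests on assumptions that the abstract does not confirm: similarity between two requests is taken to be the number of test cases in which both occur, clusters are merged by average linkage, and the function name `linkage_tree` and the example request strings are hypothetical.

```python
from itertools import combinations


def linkage_tree(tests):
    """Return the nodes of a linkage tree over the requests seen in `tests`.

    `tests` is a list of test cases, each a list of request labels. The result
    lists the singleton leaves first, then every merged cluster in merge order,
    ending with the root that contains all requests.
    """
    items = sorted({request for test in tests for request in test})
    # Assumed similarity: how many test cases contain both requests.
    co_occurrence = {
        frozenset((a, b)): sum(1 for test in tests if a in test and b in test)
        for a, b in combinations(items, 2)
    }

    def similarity(c1, c2):
        # Average linkage: mean pairwise similarity between two clusters.
        total = sum(co_occurrence[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(item,) for item in items]
    tree = list(clusters)
    while len(clusters) > 1:
        x, y = max(combinations(clusters, 2), key=lambda pair: similarity(*pair))
        merged = tuple(sorted(x + y))
        clusters = [c for c in clusters if c not in (x, y)] + [merged]
        tree.append(merged)
    return tree


if __name__ == "__main__":
    population = [
        ["POST /users", "GET /users/1"],
        ["POST /users", "GET /users/1", "DELETE /users/1"],
        ["GET /health"],
    ]
    for node in linkage_tree(population):
        print(node)
```

Each non-leaf node groups requests that tend to appear together, so a crossover operator that copies a whole node from one parent to another would keep such a pattern intact rather than splitting it at a random cut point.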