No Arabic abstract
Ensuring correctness of cyber-physical systems (CPS) is an extremely challenging task that is in practice often addressed with simulation based testing. Formal specification languages, such as Signal Temporal Logic (STL), are used to mathematically express CPS requirements and thus render the simulation activity more systematic and principled. We propose a novel method for adaptive generation of tests with specification coverage for STL. To achieve this goal, we devise cooperative reachability games that we combine with numerical optimization to create tests that explore the system in a way that exercise various parts of the specification. To the best of our knowledge our approach is the first adaptive testing approach that can be applied directly to MATLABtexttrademark; Simulink/Stateflow models. We implemented our approach in a prototype tool and evaluated it on several illustrating examples and a case study from the avionics domain, demonstrating the effectiveness of adaptive testing to (1) incrementally build a test case that reaches a test objective, (2) generate a test suite that increases the specification coverage, and (3) infer what part of the specification is actually implemented.
The widespread recognition of the smart contracts has established their importance in the landscape of next generation blockchain technology. However, writing a correct smart contract is notoriously difficult. Moreover, once a state-changing transaction is confirmed by the network, the result is immutable. For this reason, it is crucial to perform a thorough testing of a smart contract application before its deployment. This papers focus is on the test coverage criteria for smart contracts, which are objective rules that measure test quality. We analyze the unique characteristics of the Ethereum smart contract program model as compared to the conventional program model. To capture essential control flow behaviors of smart contracts, we propose the notions of whole transaction basis path set and bounded transaction interaction. The former is a limited set of linearly independent inter-procedural paths from which the potentially infinite paths of Ethereum transactions can be constructed by linear combination, while the latter is the permutations of transactions within a certain bound. Based on these two notions, we define a family of path-based test coverage criteria. Algorithms are given to the generation of coverage requirements. A case study is conducted to compare the effectiveness of the proposed test coverage criteria with random testing and statement coverage testing.
Modern programming follows the continuous integration (CI) and continuous deployment (CD) approach rather than the traditional waterfall model. Even the development of modern programming languages uses the CI/CD approach to swiftly provide new language features and to adapt to new development environments. Unlike in the conventional approach, in the modern CI/CD approach, a language specification is no more the oracle of the language semantics because both the specification and its implementations can co-evolve. In this setting, both the specification and implementations may have bugs, and guaranteeing their correctness is non-trivial. In this paper, we propose a novel N+1-version differential testing to resolve the problem. Unlike the traditional differential testing, our approach consists of three steps: 1) to automatically synthesize programs guided by the syntax and semantics from a given language specification, 2) to generate conformance tests by injecting assertions to the synthesized programs to check their final program states, 3) to detect bugs in the specification and implementations via executing the conformance tests on multiple implementations, and 4) to localize bugs on the specification using statistical information. We actualize our approach for the JavaScript programming language via JEST, which performs N+1-version differential testing for modern JavaScript engines and ECMAScript, the language specification describing the syntax and semantics of JavaScript in a natural language. We evaluated JEST with four JavaScript engines that support all modern JavaScript language features and the latest version of ECMAScript (ES11, 2020). JEST automatically synthesized 1,700 programs that covered 97.78% of syntax and 87.70% of semantics from ES11. Using the assertion-injection, it detected 44 engine bugs in four engines and 27 specification bugs in ES11.
Random testing (RT) is a well-studied testing method that has been widely applied to the testing of many applications, including embedded software systems, SQL database systems, and Android applications. Adaptive random testing (ART) aims to enhance RTs failure-detection ability by more evenly spreading the test cases over the input domain. Since its introduction in 2001, there have been many contributions to the development of ART, including various approaches, implementations, assessment and evaluation methods, and applications. This paper provides a comprehensive survey on ART, classifying techniques, summarizing application areas, and analyzing experimental evaluations. This paper also addresses some misconceptions about ART, and identifies open research challenges to be further investigated in the future work.
Researchers in the humanities are among the many who are now exploring the world of big data. They have begun to use programming languages like Python or R and their corresponding libraries to manipulate large data sets and discover brand new insights. One of the major hurdles that still exists is incorporating visualizations of this data into their projects. Visualization libraries can be difficult to learn how to use, even for those with formal training. Yet these visualizations are crucial for recognizing themes and communicating results to not only other researchers, but also the general public. This paper focuses on producing meaningful visualizations of data using machine learning. We allow the user to visually specify their code requirements in order to lower the barrier for humanities researchers to learn how to program visualizations. We use a hybrid model, combining a neural network and optical character recognition to generate the code to create the visualization.
Towards predicting patch correctness in APR, we propose a simple, but novel hypothesis on how the link between the patch behaviour and failing test specifications can be drawn: similar failing test cases should require similar patches. We then propose BATS, an unsupervised learning-based system to predict patch correctness by checking patch Behaviour Against failing Test Specification. BATS exploits deep representation learning models for code and patches: for a given failing test case, the yielded embedding is used to compute similarity metrics in the search for historical similar test cases in order to identify the associated applied patches, which are then used as a proxy for assessing generated patch correctness. Experimentally, we first validate our hypothesis by assessing whether ground-truth developer patches cluster together in the same way that their associated failing test cases are clustered. Then, after collecting a large dataset of 1278 plausible patches (written by developers or generated by some 32 APR tools), we use BATS to predict correctness: BATS achieves an AUC between 0.557 to 0.718 and a recall between 0.562 and 0.854 in identifying correct patches. Compared against previous work, we demonstrate that our approach outperforms state-of-the-art performance in patch correctness prediction, without the need for large labeled patch datasets in contrast with prior machine learning-based approaches. While BATS is constrained by the availability of similar test cases, we show that it can still be complementary to existing approaches: used in conjunction with a recent approach implementing supervised learning, BATS improves the overall recall in detecting correct patches. We finally show that BATS can be complementary to the state-of-the-art PATCH-SIM dynamic approach of identifying the correct patches for APR tools.