Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, then writes the necessary production code, and finally refactors the code. Many empirical studies neglect unique process characteristics related to TDD's iterative nature. Aim: We formulate four process characteristics: sequencing, granularity, uniformity, and refactoring effort. We investigate how these characteristics impact quality and productivity in TDD and related variations. Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of process characteristics on quality and productivity. Quality was measured by functional correctness. Result: Quality and productivity improvements were primarily positively associated with granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by the possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may not be due to its distinctive test-first dynamic, but rather to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow.
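To make the test-first dynamic concrete, below is a minimal sketch of one TDD micro-cycle in Python; the parse_price function and its behaviour are purely hypothetical illustrations, not from the study. The unit test is written first, the production code follows, and a refactoring pass would close the cycle.

    import unittest

    def parse_price(text):
        # Production code, written after (and driven by) the test below.
        return round(float(text.replace("$", "").strip()), 2)

    class ParsePriceTest(unittest.TestCase):
        # Step 1: write a failing unit test for the desired functionality.
        def test_parses_dollar_amount(self):
            self.assertEqual(parse_price(" $12.50 "), 12.5)

    if __name__ == "__main__":
        unittest.main()  # Step 3 (refactoring) would follow once the test is green.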
Diversity has been used as an effective criterion to optimise test suites for cost-effective testing. In particular, diversity-based (alternatively referred to as similarity-based) techniques have the benefit of being generic and applicable across different Systems Under Test (SUT), and have been used to automatically select or prioritise large sets of test cases. However, it is a challenge to feed diversity information back to developers and testers, since the results are typically many-dimensional. Furthermore, the generality of diversity-based approaches makes it harder to choose when and where to apply them. In this paper we address these challenges by investigating: i) the trade-offs in using different sources of diversity (e.g., diversity of test requirements or test scripts) to optimise large test suites, and ii) how visualisation of test diversity data can assist testers in test optimisation and improvement. We perform a case study on three industrial projects and present quantitative results on the fault detection capabilities and redundancy levels of different sets of test cases. Our key result is that test similarity maps, based on pair-wise diversity calculations, helped industrial practitioners identify issues with their test repositories and decide on actions for improvement. We conclude that the visualisation of diversity information can assist testers in their maintenance and optimisation activities.
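As an illustration of what a pair-wise diversity calculation over test scripts can look like, here is a minimal sketch assuming Jaccard distance over tokenised scripts; the abstract does not fix a particular distance metric, so the metric and the toy suite are assumptions. A pair-wise matrix of this kind is the raw input from which a test similarity map can be projected.

    from itertools import combinations

    def jaccard_distance(a, b):
        a, b = set(a), set(b)
        if not (a | b):
            return 0.0
        return 1.0 - len(a & b) / len(a | b)

    def pairwise_diversity(test_scripts):
        # Map each pair of test cases to its distance (0 = identical, 1 = disjoint).
        tokens = {name: script.split() for name, script in test_scripts.items()}
        return {(x, y): jaccard_distance(tokens[x], tokens[y])
                for x, y in combinations(tokens, 2)}

    suite = {"tc1": "open login enter password submit",
             "tc2": "open login enter wrong password submit retry",
             "tc3": "open settings toggle dark mode save"}
    print(pairwise_diversity(suite))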
Testing processes and workflows in information and Internet of Things systems constitutes a major part of the typical software testing effort. Consistent and efficient path-based test cases are desired to support these tests. Because certain parts of software system workflows have a higher business priority than others, this fact has to be reflected in the generation of test cases. In this paper, we propose the Prioritized Process Test (PPT), a model-based test case generation algorithm that represents an alternative to currently established algorithms that use directed graphs and test requirements to model the system under test. The PPT accepts a directed multigraph as a model to express priorities, and edge weights are used instead of test requirements. To determine the test-coverage level of test cases, a test-depth-level concept is used. We compared the presented PPT with five alternatives (i.e., the Process Cycle Test, a naive reduction of the test set created by the Process Cycle Test, the Brute Force algorithm, the Set-covering Based Solution, and the Matching-based Prefix Graph Solution) for edge coverage and edge-pair coverage. To assess the optimality of the path-based test cases produced by these strategies, we used fourteen metrics based on the properties of these test cases and 59 models created for three real-world systems. For edge coverage, the PPT produced more optimal test cases than the alternatives in terms of the majority of the metrics. For edge-pair coverage, the PPT strategy yielded results similar to those of the alternatives. Thus, the PPT strategy is an applicable alternative, as it reflects both the required test-coverage level and the business priority in parallel.
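To give a flavour of the kind of model PPT operates on, the sketch below is an illustrative greedy approximation, not the PPT algorithm itself: the workflow is represented as a directed multigraph whose edges carry business priorities, and start-to-end paths are picked so that high-priority edges are covered first, bounded by a test-depth (maximum path length) level.

    from collections import defaultdict

    def all_paths(edges, start, end, depth):
        # Enumerate edge-index sequences from start to end with at most `depth` edges.
        out = defaultdict(list)
        for i, (src, dst, prio) in enumerate(edges):
            out[src].append(i)
        stack = [(start, [])]
        while stack:
            node, path = stack.pop()
            if node == end and path:
                yield tuple(path)
            if len(path) < depth:
                for i in out[node]:
                    if i not in path:              # do not repeat an edge within a path
                        stack.append((edges[i][1], path + [i]))

    def prioritized_paths(edges, start, end, depth):
        candidates = list(all_paths(edges, start, end, depth))
        covered, selected = set(), []
        while len(covered) < len(edges) and candidates:
            best = max(candidates,
                       key=lambda p: sum(edges[i][2] for i in p if i not in covered))
            selected.append([edges[i] for i in best])
            covered.update(best)
            candidates.remove(best)
        return selected

    # Edges of a toy workflow: (source state, target state, business priority).
    workflow = [("start", "a", 3), ("a", "end", 3), ("a", "a", 1), ("start", "end", 1)]
    print(prioritized_paths(workflow, "start", "end", depth=3))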
In this paper, we propose to use production executions to improve the quality of testing for certain methods of interest to developers. These methods can be methods that are not covered by the existing test suite, or methods that are poorly tested. We devise an approach called PANKTI, which monitors applications as they execute in production and then automatically generates differential unit tests, as well as derived oracles, from the collected data. PANKTI's monitoring and generation focus on a single programming language, Java. We evaluate it on three real-world, open-source projects: a videoconferencing system, a PDF manipulation library, and an e-commerce application. We show that PANKTI is able to generate differential unit tests by monitoring target methods in production, and that the generated tests improve the quality of the test suite of the application under consideration.
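PANKTI targets Java, but the idea of a differential unit test with a derived oracle can be sketched in any language. Below is a hypothetical, language-agnostic illustration (the Cart class and the captured values are made up): an invocation observed in production, i.e., receiver state, arguments, and return value, is serialised, and a generated test replays the call using the captured return value as the oracle.

    import json
    import unittest

    class Cart:                                   # hypothetical application class
        def __init__(self, items):
            self.items = items
        def total(self, tax):
            return round(sum(self.items) * (1 + tax), 2)

    # Data captured while monitoring production (normally serialised to disk).
    CAPTURED = json.loads('{"items": [10.0, 5.5], "tax": 0.2, "returned": 18.6}')

    class CartTotalDifferentialTest(unittest.TestCase):
        def test_total_matches_production_observation(self):
            cart = Cart(CAPTURED["items"])        # reconstruct the receiver object
            self.assertEqual(cart.total(CAPTURED["tax"]), CAPTURED["returned"])

    if __name__ == "__main__":
        unittest.main()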
The growth of software size, the lack of resources to perform regression testing, and the failure to detect bugs fast enough have led to increased reliance on continuous integration and test automation. Even with greater hardware and software resources dedicated to test automation, software testing faces enormous challenges, resulting in increased dependence on complex mechanisms for automated test case selection and prioritization as part of a continuous integration framework. These mechanisms currently use simple entities called test cases that are concretely realized as executable scripts. Our key idea is to provide test cases with more reasoning, adaptive behavior, and learning capabilities by using the concepts of intelligent software agents. We refer to such test cases as test agents. The model that underlies a test agent is capable of flexible and autonomous actions in order to meet overall testing objectives. Our goal is to increase the decentralization of regression testing by letting test agents know for themselves when they should be executing, how they should update their purpose, and when they should interact with each other. In this paper, we envision software test agents that display such adaptive autonomous behavior. Emerging developments and challenges regarding the use of test agents are explored, in particular new research that seeks to use adaptive autonomous agents in software testing.
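As a purely conceptual illustration of the vision, the sketch below wraps an executable test in an object that decides for itself whether it should run in the current CI cycle; all fields and heuristics here are hypothetical assumptions, not part of the proposal itself.

    class TestAgent:
        def __init__(self, name, run_fn, linked_files):
            self.name = name
            self.run_fn = run_fn                   # the executable test script
            self.linked_files = set(linked_files)  # code the test is known to exercise
            self.recent_failures = 0

        def should_execute(self, changed_files):
            # Run if the change touches covered code or the test failed recently.
            return bool(self.linked_files & set(changed_files)) or self.recent_failures > 0

        def execute(self):
            passed = self.run_fn()
            self.recent_failures = 0 if passed else self.recent_failures + 1
            return passed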
Testing the API functions of widely used deep learning (DL) libraries is essential. The effectiveness of such testing requires DL-specific input constraints of these API functions. Such constraints enable the generation of valid inputs, i.e., inputs that follow these DL-specific constraints, to explore deeply and test the core functionality of API functions. Existing fuzzers have no knowledge of such constraints, and existing constraint extraction techniques are ineffective for extracting DL-specific input constraints. To fill this gap, we design and implement a document-guided fuzzing technique, D2C, for API functions of DL libraries. D2C leverages sequential pattern mining to generate rules for extracting DL-specific constraints from API documents and uses these constraints to guide the fuzzing to generate valid inputs automatically. D2C also generates inputs that violate these constraints to test the input-validity-checking code. In addition, D2C uses the constraints to generate boundary inputs to detect more bugs. Our evaluation of three popular DL libraries (TensorFlow, PyTorch, and MXNet) shows that D2C's accuracy in extracting input constraints is 83.3% to 90.0%. D2C detects 121 bugs, while a baseline fuzzer without input constraints detects only 68 bugs. Most (89) of the 121 bugs are previously unknown, 54 of which have been fixed or confirmed by developers after we reported them. In addition, D2C detects 38 inconsistencies within documents, including 28 that were fixed or confirmed after we reported them.
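To illustrate how extracted constraints can drive input generation, the following sketch (a simplified assumption, not the actual D2C implementation) turns a hypothetical constraint such as "dtype in {int32, float32}, 1 <= ndim <= 4" into a valid input, a constraint-violating input, and boundary inputs; the constraint format and helper names are illustrative only.

    import random
    import numpy as np

    CONSTRAINT = {"dtypes": ["int32", "float32"], "min_ndim": 1, "max_ndim": 4}

    def valid_input(c):
        # An input that satisfies the extracted constraint, to reach core logic.
        ndim = random.randint(c["min_ndim"], c["max_ndim"])
        shape = tuple(random.randint(1, 3) for _ in range(ndim))
        return np.zeros(shape, dtype=random.choice(c["dtypes"]))

    def invalid_input(c):
        # Violate the dtype constraint to exercise input-validity-checking code.
        return np.zeros((2, 2), dtype="complex64")

    def boundary_inputs(c):
        # Arrays at the smallest and largest permitted ranks.
        return [np.zeros((1,) * c["min_ndim"], dtype=c["dtypes"][0]),
                np.zeros((1,) * c["max_ndim"], dtype=c["dtypes"][0])]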