No Arabic abstract
In this paper, we propose to use production executions to improve the quality of testing for certain methods of interest for developers. These methods can be methods that are not covered by the existing test suite, or methods that are poorly tested. We devise an approach called PANKTI which monitors applications as they execute in production, and then automatically generates differential unit tests, as well as derived oracles, from the collected data. PANKTIs monitoring and generation focuses on one single programming language, Java. We evaluate it on three real-world, open-source projects: a videoconferencing system, a PDF manipulation library, and an e-commerce application. We show that PANKTI is able to generate differential unit tests by monitoring target methods in production, and that the generated tests improve the quality of the test suite of the application under consideration.
A major challenge in testing software product lines is efficiency. In particular, testing a product line should take less effort than testing each and every product individually. We address this issue in the context of input-output conformance testing, which is a formal theory of model-based testing. We extend the notion of conformance testing on input-output featured transition systems with the novel concept of spinal test suites. We show how this concept dispenses with retesting the common behavior among different, but similar, products of a software product line.
Nowadays, ensuring the quality becomes challenging for most modern software systems when constraints are given for the combinations of configurations. Combinatorial interaction strategies can systematically reduce the number of test cases to construct a minimal test suite without affecting the effectiveness of the tests. This paper presents a new efficient search-based strategy to generate constrained interaction test suites to cover all possible combinations. The paper also shows a new application of constrained interaction testing in software fault searches. The proposed strategy initially generates the set of all possible t-tuple combinations; then, it filters out the set by removing the forbidden t-tuples using the base forbidden tuple (BFT) approach. The strategy also utilizes a mixed neighborhood tabu search (TS) to construct optimal or near-optimal constrained test suites. The efficiency of the proposed method is evaluated through a comparison against two well-known state-of-the-art tools. The evaluation consists of three sets of experiments for 35 standard benchmarks. Additionally, the effectiveness and quality of the results are assessed using a real-world case study. Experimental results show that the proposed strategy outperforms one of the competitive strategies, ACTS, for approximately 83% of the benchmarks and achieves similar results to CASA for 65% of the benchmarks when the interaction strength is 2. For an interaction strength of 3, the proposed method outperforms other competitive strategies for approximately 60% and 42% of the benchmarks. The proposed strategy can also generate constrained interaction test suites for an interaction strength of 4, which is not possible for many strategies. Real-world case study shows that the generated test suites can effectively detect injected faults using mutation testing.
Social graphs are widely used in research (e.g., epidemiology) and business (e.g., recommender systems). However, sharing these graphs poses privacy risks because they contain sensitive information about individuals. Graph anonymization techniques aim to protect individual users in a graph, while graph de-anonymization aims to re-identify users. The effectiveness of anonymization and de-anonymization algorithms is usually evaluated with privacy metrics. However, it is unclear how strong existing privacy metrics are when they are used in graph privacy. In this paper, we study 26 privacy metrics for graph anonymization and de-anonymization and evaluate their strength in terms of three criteria: monotonicity indicates whether the metric indicates lower privacy for stronger adversaries; for within-scenario comparisons, evenness indicates whether metric values are spread evenly; and for between-scenario comparisons, shared value range indicates whether metrics use a consistent value range across scenarios. Our extensive experiments indicate that no single metric fulfills all three criteria perfectly. We therefore use methods from multi-criteria decision analysis to aggregate multiple metrics in a metrics suite, and we show that these metrics suites improve monotonicity compared to the best individual metric. This important result enables more monotonic, and thus more accurate, evaluations of new graph anonymization and de-anonymization algorithms.
The TSNLP project has investigated various aspects of the construction, maintenance and application of systematic test suites as diagnostic and evaluation tools for NLP applications. The paper summarizes the motivation and main results of the project: besides the solid methodological foundation, TSNLP has produced substantial multi-purpose and multi-user test suites for three European languages together with a set of specialized tools that facilitate the construction, extension, maintenance, retrieval, and customization of the test data. As TSNLP results, including the data and technology, are made publicly available, the project presents a valuable linguistic resourc e that has the potential of providing a wide-spread pre-standard diagnostic and evaluation tool for both developers and users of NLP applications.
Diversity has been used as an effective criteria to optimise test suites for cost-effective testing. Particularly, diversity-based (alternatively referred to as similarity-based) techniques have the benefit of being generic and applicable across different Systems Under Test (SUT), and have been used to automatically select or prioritise large sets of test cases. However, it is a challenge to feedback diversity information to developers and testers since results are typically many-dimensional. Furthermore, the generality of diversity-based approaches makes it harder to choose when and where to apply them. In this paper we address these challenges by investigating: i) what are the trade-off in using different sources of diversity (e.g., diversity of test requirements or test scripts) to optimise large test suites, and ii) how visualisation of test diversity data can assist testers for test optimisation and improvement. We perform a case study on three industrial projects and present quantitative results on the fault detection capabilities and redundancy levels of different sets of test cases. Our key result is that test similarity maps, based on pair-wise diversity calculations, helped industrial practitioners identify issues with their test repositories and decide on actions to improve. We conclude that the visualisation of diversity information can assist testers in their maintenance and optimisation activities.