ترغب بنشر مسار تعليمي؟ اضغط هنا

A Theoretical Framework for Understanding Mutation-Based Testing Methods

182   0   0.0 ( 0 )
 نشر من قبل Donghwan Shin
 تاريخ النشر 2016
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In the field of mutation analysis, mutation is the systematic generation of mutated programs (i.e., mutants) from an original program. The concept of mutation has been widely applied to various testing problems, including test set selection, fault localization, and program repair. However, surprisingly little focus has been given to the theoretical foundation of mutation-based testing methods, making it difficult to understand, organize, and describe various mutation-based testing methods. This paper aims to consider a theoretical framework for understanding mutation-based testing methods. While there is a solid testing framework for general testing, this is incongruent with mutation-based testing methods, because it focuses on the correctness of a program for a test, while the essence of mutation-based testing concerns the differences between programs (including mutants) for a test. In this paper, we begin the construction of our framework by defining a novel testing factor, called a test differentiator, to transform the paradigm of testing from the notion of correctness to the notion of difference. We formally define behavioral differences of programs for a set of tests as a mathematical vector, called a d-vector. We explore the multi-dimensional space represented by d-vectors, and provide a graphical model for describing the space. Based on our framework and formalization, we interpret existing mutation-based fault localization methods and mutant set minimization as applications, and identify novel implications for future work.

قيم البحث

اقرأ أيضاً

Early design artifacts of embedded systems, such as architectural models, represent convenient abstractions for reasoning about a systems structure and functionality. One such example is the Electronic Architecture and Software Tools-Architecture Des cription Language (EAST-ADL), a domain-specific architectural language that targets the automotive industry. EAST-ADL is used to represent both hardware and software elements, as well as related extra-functional information (e.g., timing properties, triggering information, resource consumption). Testing architectural models is an important activity in engineering large-scale industrial systems, which sparks a growing research interest. The main contributions of this paper are: (i) an approach for creating energy-related mutants for EAST-ADL architectural models, (ii) a method for overcoming the equivalent mutant problem (i.e., the problem of finding a test case which can distinguish the observable behavior of a mutant from the original one), (iii) a test generation approach based on UPPAAL Statistical Model Checker (SMC), and (iv) a test selection criteria based on mutation analysis using our MATS tool.
Mutation testing is a well-established technique for assessing a test suites quality by injecting artificial faults into production code. In recent years, mutation testing has been extended to machine learning (ML) systems, and deep learning (DL) in particular; researchers have proposed approaches, tools, and statistically sound heuristics to determine whether mutants in DL systems are killed or not. However, as we will argue in this work, questions can be raised to what extent currently used mutation testing techniques in DL are actually in line with the classical interpretation of mutation testing. We observe that ML model development resembles a test-driven development (TDD) process, in which a training algorithm (`programmer) generates a model (program) that fits the data points (test data) to labels (implicit assertions), up to a certain threshold. However, considering proposed mutation testing techniques for ML systems under this TDD metaphor, in current approaches, the distinction between production and test code is blurry, and the realism of mutation operators can be challenged. We also consider the fundamental hypotheses underlying classical mutation testing: the competent programmer hypothesis and coupling effect hypothesis. As we will illustrate, these hypotheses do not trivially translate to ML system development, and more conscious and explicit scoping and concept mapping will be needed to truly draw parallels. Based on our observations, we propose several action points for better alignment of mutation testing techniques for ML with paradigms and vocabularies of classical mutation testing.
Ensuring the functional correctness and safety of autonomous vehicles is a major challenge for the automotive industry. However, exhaustive physical test drives are not feasible, as billions of driven kilometers would be required to obtain reliable r esults. Scenariobased testing is an approach to tackle this problem and reduce necessary test drives by replacing driven kilometers with simulations of relevant or interesting scenarios. These scenarios can be generated or extracted from recorded data with machine learning algorithms or created by experts. In this paper, we propose a novel graphical scenario modeling language. The graphical framework allows experts to create new scenarios or review ones designed by other experts or generated by machine learning algorithms. The scenario description is modeled as a graph and based on behavior trees. It supports different abstraction levels of scenario description during software and test development. Additionally, the graphbased structure provides modularity and reusable sub-scenarios, an important use case in scenario modeling. A graphical visualization of the scenario enhances comprehensibility for different users. The presented approach eases the scenario creation process and increases the usage of scenarios within development and testing processes.
Deep learning (DL) defines a new data-driven programming paradigm where the internal system logic is largely shaped by the training data. The standard way of evaluating DL models is to examine their performance on a test dataset. The quality of the t est dataset is of great importance to gain confidence of the trained models. Using an inadequate test dataset, DL models that have achieved high test accuracy may still lack generality and robustness. In traditional software testing, mutation testing is a well-established technique for quality evaluation of test suites, which analyzes to what extent a test suite detects the injected faults. However, due to the fundamental difference between traditional software and deep learning-based software, traditional mutation testing techniques cannot be directly applied to DL systems. In this paper, we propose a mutation testing framework specialized for DL systems to measure the quality of test data. To do this, by sharing the same spirit of mutation testing in traditional software, we first define a set of source-level mutation operators to inject faults to the source of DL (i.e., training data and training programs). Then we design a set of model-level mutation operators that directly inject faults into DL models without a training process. Eventually, the quality of test data could be evaluated from the analysis on to what extent the injected faults could be detected. The usefulness of the proposed mutation testing techniques is demonstrated on two public datasets, namely MNIST and CIFAR-10, with three DL models.
Diversity has been proposed as a key criterion to improve testing effectiveness and efficiency.It can be used to optimise large test repositories but also to visualise test maintenance issues and raise practitioners awareness about waste in test arte facts and processes. Even though these diversity-based testing techniques aim to exercise diverse behavior in the system under test (SUT), the diversity has mainly been measured on and between artefacts (e.g., inputs, outputs or test scripts). Here, we introduce a family of measures to capture behavioural diversity (b-div) of test cases by comparing their executions and failure outcomes. Using failure information to capture the SUT behaviour has been shown to improve effectiveness of history-based test prioritisation approaches. However, history-based techniques require reliable test execution logs which are often not available or can be difficult to obtain due to flaky tests, scarcity of test executions, etc. To be generally applicable we instead propose to use mutation testing to measure behavioral diversity by running the set of test cases on various mutat
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا