We introduce Causal Program Dependence Analysis (CPDA), a dynamic dependence analysis that applies causal inference to model the strength of program dependence relations in a continuous space. CPDA observes the association between program elements by constructing and executing modified versions of the program.
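As a rough illustration of the underlying idea (a sketch, not the authors' implementation), the code below intervenes on one element of a toy traced program and measures how much another element's observed values change across executions; the element names and the toy program are hypothetical.

```python
import random

def run_trace(x, intervention=None):
    """Toy traced program: returns the value observed at each element.
    `intervention` forces one element's value, simulating a modified
    program version (the 'do' operation of causal inference)."""
    intervention = intervention or {}
    a = intervention.get("a", x + 1)
    b = intervention.get("b", 2 * a)      # b is computed from a
    c = intervention.get("c", x % 3)      # c ignores a and b
    return {"a": a, "b": b, "c": c}

def dependence_strength(source, target, trials=500):
    """Average absolute change in `target` when `source` is randomly forced,
    compared to the unmodified execution on the same input."""
    total = 0.0
    for _ in range(trials):
        x = random.randint(0, 100)
        base = run_trace(x)
        forced = run_trace(x, {source: random.randint(0, 100)})
        total += abs(forced[target] - base[target])
    return total / trials

print("strength a -> b:", dependence_strength("a", "b"))  # large: real dependence
print("strength a -> c:", dependence_strength("a", "c"))  # ~0: no dependence
```

Averaging over many inputs and interventions is what places the dependence strength in a continuous space rather than the binary "depends / does not depend" of classical dependence analysis.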
Automated testing tools typically create test cases that differ from what human testers create. This often makes the tools less effective, the created tests harder to understand, and thus results in tools providing less support to human testers. Here, we propose a framework based on cognitive science and, in particular, on an analysis of approaches to problem solving, for identifying the cognitive processes of testers. The framework helps map the test design steps and criteria used in human test activities and thus better understand how effective human testers perform their tasks. Ultimately, our goal is to be able to mimic how humans create test cases and thus to design more human-like automated test generation systems. We posit that such systems can better augment and support testers in a way that is meaningful to them.
Statistical analysis is the tool of choice to turn data into information, and then information into empirical knowledge. To be valid, the process that goes from data to knowledge should be supported by detailed, rigorous guidelines, which help ferret out issues with the data or model and lead to qualified results that strike a reasonable balance between generality and practical relevance. Such guidelines are being developed by statisticians to support the latest techniques for Bayesian data analysis. In this article, we frame these guidelines in a way that is apt to empirical research in software engineering. To demonstrate the guidelines in practice, we apply them to reanalyze a GitHub dataset about code quality in different programming languages. The dataset's original analysis (Ray et al., 2014) and a critical reanalysis (Berger et al., 2019) have attracted considerable attention -- in no small part because they target a topic (the impact of different programming languages) on which strong opinions abound. The goals of our reanalysis are largely orthogonal to this previous work, as we are concerned with demonstrating, on data in an interesting domain, how to build a principled Bayesian data analysis and to showcase some of its benefits. In the process, we also shed light on some critical aspects of the analyzed data and of the relationship between programming languages and code quality. The high-level conclusion of our exercise is that Bayesian statistical techniques can be applied to analyze software engineering data in a way that is principled and flexible, and leads to convincing results that inform the state of the art while highlighting the boundaries of its validity. The guidelines can support building solid statistical analyses and connecting their results, and hence help buttress continued progress in empirical software engineering research.
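For readers unfamiliar with what such an analysis looks like in code, here is a minimal sketch in the spirit of the reanalysis, not the paper's actual model or dataset: a negative-binomial regression of bug-fix commits per project with a per-language coefficient and total commits as an exposure term, fitted with PyMC on synthetic stand-in data (the column names language, commits, and bug_commits are hypothetical).

```python
import numpy as np
import pandas as pd
import pymc as pm
import arviz as az

# Synthetic stand-in data: one row per project.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "language": rng.choice(["C", "Go", "Python"], size=90),
    "commits": rng.integers(100, 5000, size=90),
})
df["bug_commits"] = rng.binomial(df["commits"], 0.08)

languages, lang_idx = np.unique(df["language"], return_inverse=True)

with pm.Model() as model:
    intercept = pm.Normal("intercept", mu=0.0, sigma=2.0)
    beta_lang = pm.Normal("beta_lang", mu=0.0, sigma=1.0, shape=len(languages))
    alpha = pm.Exponential("alpha", 1.0)                      # NB dispersion
    # Expected bug-fix commits, scaled by project activity (exposure).
    mu = pm.math.exp(intercept + beta_lang[lang_idx]) * df["commits"].values
    pm.NegativeBinomial("bugs", mu=mu, alpha=alpha, observed=df["bug_commits"].values)
    idata = pm.sample(2000, tune=1000, target_accept=0.9)

print(az.summary(idata, var_names=["beta_lang"]))             # per-language effects with uncertainty
```

The posterior summaries give full uncertainty intervals for each language effect, which is one of the benefits the guidelines aim to make routine.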
Diversity has been proposed as a key criterion to improve testing effectiveness and efficiency. It can be used to optimise large test repositories but also to visualise test maintenance issues and raise practitioners' awareness about waste in test artefacts and processes. Even though these diversity-based testing techniques aim to exercise diverse behaviour in the system under test (SUT), diversity has mainly been measured on and between artefacts (e.g., inputs, outputs or test scripts). Here, we introduce a family of measures to capture the behavioural diversity (b-div) of test cases by comparing their executions and failure outcomes. Using failure information to capture SUT behaviour has been shown to improve the effectiveness of history-based test prioritisation approaches. However, history-based techniques require reliable test execution logs, which are often not available or can be difficult to obtain due to flaky tests, scarcity of test executions, etc. To be generally applicable, we instead propose to use mutation testing to measure behavioural diversity by running the set of test cases on various mutated versions of the SUT.
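A minimal sketch of one way such a behavioural-diversity measure could be computed (not necessarily the paper's exact b-div definition): represent each test by the set of mutants it kills and take the mean pairwise Jaccard distance between those sets. The kill sets below are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; identical kill sets give distance 0."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def behavioural_diversity(kill_sets):
    """Mean pairwise distance over all test pairs; kill_sets maps
    test name -> set of mutant ids the test kills."""
    pairs = list(combinations(kill_sets.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical kill matrix from running three tests on five mutants.
kills = {
    "test_parse": {"m1", "m2", "m3"},
    "test_parse_alt": {"m1", "m2"},   # behaves almost like test_parse
    "test_render": {"m4", "m5"},      # exercises different behaviour
}
print(behavioural_diversity(kills))
```

Tests with near-identical kill sets contribute little diversity, which is exactly the redundancy such a measure is meant to expose in large test repositories.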
Software testing is a complex, intellectual activity based (at least) on analysis, reasoning, decision making, abstraction and collaboration performed in a highly demanding environment. Naturally, it uses and allocates multiple cognitive resources in software testers. However, while a cognitive psychology perspective is increasingly used in the general software engineering literature, it has yet to find its place in software testing. To the best of our knowledge, no theory of software testers cognitive processes exists. Here, we take the first step towards such a theory by presenting a cognitive model of software testing based on how problem solving is conceptualized in cognitive psychology. Our approach is to instantiate a general problem solving process for the specific problem of creating test cases. We then propose an experiment for testing our cognitive test design model. The experiment makes use of verbal protocol analysis to understand the mechanisms by which human testers choose, design, implement and evaluate test cases. An initial evaluation was then performed with five software engineering master students as subjects. The results support a problem solving-based model of test design for capturing testers cognitive processes.
Deep Neural Networks (DNNs) are rapidly being adopted by the automotive industry, due to their impressive performance in tasks that are essential for autonomous driving. Object segmentation is one such task: its aim is to precisely locate the boundaries of objects and classify the identified objects, helping autonomous cars to recognise the road environment and the traffic situation. Not only is this task safety critical, but developing a DNN-based object segmentation module presents a set of challenges that are significantly different from those of traditional development of safety-critical software. The development process in use consists of multiple iterations of data collection, labelling, training, and evaluation. Among these stages, training and evaluation are computation intensive, while data collection and labelling are manual-labour intensive. This paper shows how the development of DNN-based object segmentation can be improved by exploiting the correlation between Surprise Adequacy (SA) and model performance. The correlation allows us to predict model performance for inputs without manually labelling them. This, in turn, enables understanding of model performance, more guided data collection, and informed decisions about further training. In our industrial case study, the technique allows cost savings of up to 50% with negligible evaluation inaccuracy. Furthermore, engineers can trade off cost savings against the tolerable level of inaccuracy depending on different development phases and scenarios.
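As background, here is a simplified sketch of distance-based Surprise Adequacy (DSA), one of the SA variants from prior work on surprise adequacy; activation traces are plain NumPy vectors here, the model and layer-extraction step is assumed to have happened elsewhere, and the traces are synthetic stand-ins.

```python
import numpy as np

def dsa(trace, predicted_class, train_traces, train_labels):
    """DSA = distance to the nearest same-class training trace, divided by
    the distance from that neighbour to the nearest other-class trace."""
    same = train_traces[train_labels == predicted_class]
    other = train_traces[train_labels != predicted_class]
    d_same = np.linalg.norm(same - trace, axis=1)
    x_a = same[np.argmin(d_same)]                       # closest same-class trace
    dist_a = d_same.min()
    dist_b = np.linalg.norm(other - x_a, axis=1).min()  # its closest other-class trace
    return dist_a / dist_b

# Hypothetical traces: 100 training activations in 8 dimensions, 2 classes.
rng = np.random.default_rng(0)
train_traces = rng.normal(size=(100, 8))
train_labels = rng.integers(0, 2, size=100)
new_trace = rng.normal(size=8)
print(dsa(new_trace, predicted_class=1,
          train_traces=train_traces, train_labels=train_labels))
```

Higher DSA values flag inputs that are more surprising relative to the training data, which is the signal the paper correlates with model performance to avoid manual labelling.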
A meaningful and deep understanding of the human aspects of software engineering (SE) requires psychological constructs to be considered. Psychology theory can facilitate the systematic and sound development, as well as the adoption, of instruments (e.g., psychological tests, questionnaires) to assess these constructs. In particular, to ensure high quality, the psychometric properties of instruments need evaluation. In this paper, we provide an introduction to psychometric theory for the evaluation of measurement instruments for SE researchers. We present guidelines that enable using existing instruments and developing new ones adequately. We conducted a comprehensive review of the psychology literature framed by the Standards for Educational and Psychological Testing. We detail activities used when operationalizing new psychological constructs, such as item pooling, item review, pilot testing, item analysis, factor analysis, statistical properties of items, reliability, validity, and fairness in testing and test bias. We provide an openly available example of a psychometric evaluation based on our guidelines. We hope to encourage a culture change in SE research towards the adoption of established methods from psychology. To improve the quality of behavioral research in SE, studies focusing on introducing, validating, and then using psychometric instruments need to become more common.
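As one concrete example of the reliability analyses such guidelines cover, the sketch below computes Cronbach's alpha for a small, hypothetical questionnaire scale (rows are respondents, columns are items of one construct); it is an illustration of the standard formula, not the paper's published example.

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2D array of shape (n_respondents, n_items) for one scale."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert responses (1-5) from five respondents to four items.
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
])
print(round(cronbach_alpha(responses), 3))  # values above ~0.7 are conventionally considered acceptable
```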
The testing of Deep Neural Networks (DNNs) has become increasingly important as DNNs are widely adopted by safety critical systems. While many test adequacy criteria have been suggested, automated test input generation for many types of DNNs remains a challenge because the raw input space is too large to randomly sample or to navigate and search for plausible inputs. Consequently, current testing techniques for DNNs depend on small local perturbations to existing inputs, based on the metamorphic testing principle. We propose new ways to search not over the entire image space, but rather over a plausible input space that resembles the true training distribution. This space is constructed using Variational Autoencoders (VAEs), and navigated through their latent vector space. We show that this space helps efficiently produce test inputs that can reveal information about the robustness of DNNs when dealing with realistic tests, opening the field to meaningful exploration through the space of highly structured images.
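A hedged sketch of the general latent-space idea (not the paper's tooling): encode a seed input with a VAE, perturb the latent vector, decode it into a new plausible input, and check whether the DNN under test changes its prediction. The encoder, decoder, and model below are trivial stand-ins for trained networks.

```python
import numpy as np

rng = np.random.default_rng(42)
LATENT_DIM = 4

def encode(x):            # stands in for the trained VAE encoder (latent mean)
    return x[:LATENT_DIM]

def decode(z):            # stands in for the trained VAE decoder
    return np.tile(z, 7)  # fake "image" of length 28

def model_under_test(x):  # stands in for the DNN being tested
    return int(x.sum() > 0)

def generate_tests(seed_input, n=10, sigma=0.5):
    """Sample inputs around the seed's latent code and collect any decoded
    inputs whose prediction differs from the seed's (potential robustness issues)."""
    z0 = encode(seed_input)
    baseline = model_under_test(decode(z0))
    suspicious = []
    for _ in range(n):
        z = z0 + rng.normal(scale=sigma, size=LATENT_DIM)  # stay near the data manifold
        x_new = decode(z)
        if model_under_test(x_new) != baseline:
            suspicious.append(x_new)
    return suspicious

seed = rng.normal(size=28)
print(len(generate_tests(seed)), "behaviour-changing inputs found")
```

Because perturbations are applied in the latent space rather than the raw pixel space, the decoded inputs remain close to the training distribution, which is the point of searching a plausible input space.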
For software to be reliable and resilient, it is widely accepted that tests must be created and maintained alongside the software itself. One safeguard against vulnerabilities and failures in code is to ensure correct behavior on the boundaries between the input space sub-domains. So-called boundary value analysis (BVA) and boundary value testing (BVT) techniques aim to exercise those boundaries and increase test effectiveness. However, the concepts of BVA and BVT themselves are not generally well defined, and it is not clear how to identify relevant sub-domains, and thus the boundaries delineating them, given a specification. This has limited adoption and hindered automation. We clarify BVA and BVT and introduce Boundary Value Exploration (BVE) to describe techniques that support them by helping to detect and identify boundary inputs. Additionally, we propose two concrete BVE techniques based on information-theoretic distance functions: (i) an algorithm for boundary detection and (ii) the use of software visualization to explore the behavior of the software under test and identify its boundary behavior. As an initial evaluation, we apply these techniques to a widely used and well-tested date handling library. Our results reveal questionable behavior at boundaries highlighted by our techniques. In conclusion, we argue that the boundary value exploration that our techniques enable is a step towards automated boundary value analysis and testing, fostering their wider use and improving test effectiveness and efficiency.
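As an illustration of boundary detection with an information-theoretic distance (a sketch, not the paper's exact algorithm), the code below compares the outputs of neighbouring inputs using the normalised compression distance; a spike in the distance marks a candidate boundary. The days_in_month function is a toy stand-in for a date-handling library.

```python
import zlib

def ncd(a: str, b: str) -> float:
    """Normalised compression distance between two output strings."""
    ca = len(zlib.compress(a.encode()))
    cb = len(zlib.compress(b.encode()))
    cab = len(zlib.compress((a + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def days_in_month(month: int) -> str:   # toy software under test
    if month < 1 or month > 12:
        return "error: invalid month"
    return str([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31][month - 1])

# Scan neighbouring inputs and report the pairs with the largest output distance.
pairs = [((m, m + 1), ncd(days_in_month(m), days_in_month(m + 1))) for m in range(-2, 15)]
for (lo, hi), dist in sorted(pairs, key=lambda p: -p[1])[:3]:
    print(f"candidate boundary between {lo} and {hi}: distance {dist:.2f}")
```

On this toy example, the pairs straddling the valid month range (0/1 and 12/13) stand out, which is the kind of behavioural boundary such exploration is meant to surface.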
Statistics comes in two main flavors: frequentist and Bayesian. For historical and technical reasons, frequentist statistics have traditionally dominated empirical data analysis, and they certainly remain prevalent in empirical software engineering. This situation is unfortunate because frequentist statistics suffer from a number of shortcomings, such as lack of flexibility and results that are unintuitive and hard to interpret, that curtail their effectiveness when dealing with the heterogeneous data that is increasingly available for empirical analysis of software engineering practice. In this paper, we pinpoint these shortcomings and present Bayesian data analysis techniques that provide tangible benefits, as they can provide clearer results that are simultaneously robust and nuanced. After a short, high-level introduction to the basic tools of Bayesian statistics, we present the reanalysis of two empirical studies on the effectiveness of automatically generated tests and the performance of programming languages. By contrasting the original frequentist analyses with our new Bayesian analyses, we demonstrate the concrete advantages of the latter. To conclude, we advocate a more prominent role for Bayesian statistical techniques in empirical software engineering research and practice.