ﻻ يوجد ملخص باللغة العربية
Program verifiers are not exempt from the bugs that affect nearly every piece of software. In addition, they often exhibit brittle behavior: their performance changes considerably with details of how the input program is expressed-details that should be irrelevant, such as the order of independent declarations. Such a lack of robustness frustrates users who have to spend considerable time figuring out a tools idiosyncrasies before they can use it effectively. This paper introduces a technique to detect lack of robustness of program verifiers; the technique is lightweight and fully automated, as it is based on testing methods (such as mutation testing and metamorphic testing). The key idea is to generate many simple variants of a program that initially passes verification. All variants are, by construction, equivalent to the original program; thus, any variant that fails verification indicates lack of robustness in the verifier. We implemented our technique in a tool called mugie, which operates on programs written in the popular Boogie language for verification-used as intermediate representation in numerous program verifiers. Experiments targeting 135 Boogie programs indicate that brittle behavior occurs fairly frequently (16 programs) and is not hard to trigger. Based on these results, the paper discusses the main sources of brittle behavior and suggests means of improving robustness.
Recently, there has been a significant growth of interest in applying software engineering techniques for the quality assurance of deep learning (DL) systems. One popular direction is deep learning testing, where adversarial examples (a.k.a.~bugs) of
In the field of software engineering, applying language models to the token sequence of source code is the state-of-art approach to build a code recommendation system. The syntax tree of source code has hierarchical structures. Ignoring the character
Deep neural networks (DNNs) have shown remarkable performance in a variety of domains such as computer vision, speech recognition, or natural language processing. Recently they also have been applied to various software engineering tasks, typically i
We distinguish two general modes of testing for Deep Neural Networks (DNNs): Offline testing where DNNs are tested as individual units based on test datasets obtained without involving the DNNs under test, and online testing where DNNs are embedded i
Automated machine learning (AutoML) systems aim at finding the best machine learning (ML) pipeline that automatically matches the task and data at hand. We investigate the robustness of machine learning pipelines generated with three AutoML systems,