Deep learning achieves remarkable performance on pattern recognition, but it can fall short on important properties such as robustness and security. This tutorial is based on a stream of research conducted since the summer of 2018 at several UK universities, including the University of Liverpool, University of Oxford, Queen's University Belfast, University of Lancaster, University of Loughborough, and University of Exeter. The research aims to adapt software engineering methods, in particular software testing methods, to work with machine learning models. Software testing techniques have been successful in identifying software bugs and in helping developers validate the software they design and implement. For this reason, a few software testing techniques, such as the MC/DC coverage metric, have been mandated in industrial standards for safety-critical systems, including ISO 26262 for automotive systems and RTCA DO-178B/C for avionics systems. However, these techniques cannot be applied directly to machine learning models, because such models differ drastically from traditional software and their design follows a completely different development life cycle. As the outcome of this thread of research, the team has developed a series of methods that adapt software testing techniques to a few classes of machine learning models, notably convolutional neural networks, recurrent neural networks, and random forests. The tools developed from this research are collected and publicly released in a GitHub repository, https://github.com/TrustAI/DeepConcolic, under the BSD 3-Clause licence. This tutorial goes through the major functionalities of the tools with a few running examples, to show how the developed techniques work, what the results are, and how to interpret them.
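To give a feel for what a structural coverage criterion can look like once it is carried over from code to a neural network, the sketch below computes neuron coverage: the fraction of hidden neurons that some test input activates above a threshold. Neuron coverage is used here only because it is one of the simplest such criteria; it is not the MC/DC metric, and nothing in this sketch is taken from the DeepConcolic code base. The function names, the tiny hand-written network, and the threshold are all assumptions made for illustration.

```python
def relu(value):
    return max(0.0, value)


def forward(weights, biases, x):
    """Return the activations of every layer of a small fully connected ReLU network."""
    activations = []
    for layer_weights, layer_biases in zip(weights, biases):
        x = [
            relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(layer_weights, layer_biases)
        ]
        activations.append(x)
    return activations


def neuron_coverage(weights, biases, test_inputs, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold` on at least one test input."""
    covered = set()
    total = sum(len(layer_biases) for layer_biases in biases)
    for x in test_inputs:
        for layer, acts in enumerate(forward(weights, biases, x)):
            for index, activation in enumerate(acts):
                if activation > threshold:
                    covered.add((layer, index))
    return len(covered) / total


# Hypothetical 2-2-1 network used only as a running example.
WEIGHTS = [[[1.0, -1.0], [-1.0, 1.0]], [[1.0, 1.0]]]
BIASES = [[0.0, 0.0], [-0.5]]

print(neuron_coverage(WEIGHTS, BIASES, [[1.0, 0.0]]))
print(neuron_coverage(WEIGHTS, BIASES, [[1.0, 0.0], [0.0, 1.0]]))
```

With the single input, one of the two first-layer neurons is never activated; adding the second input activates it, which is the kind of gap a coverage-guided test generator tries to close.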
We distinguish two general modes of testing for Deep Neural Networks (DNNs): offline testing, where DNNs are tested as individual units based on test datasets obtained without involving the DNNs under test, and online testing, where DNNs are embedded into a specific application environment and tested in a closed-loop mode in interaction with that environment. Typically, DNNs are subjected to both types of testing during their development life cycle: offline testing is applied immediately after DNN training, and online testing follows once a DNN is deployed within a specific application environment. In this paper, we study the relationship between offline and online testing. Our goal is to determine how offline testing and online testing differ from or complement one another, and whether offline testing results can be used to help reduce the cost of online testing. Though these questions are generally relevant to all autonomous systems, we study them in the context of automated driving systems where, as study subjects, we use DNNs automating end-to-end control of the steering function of self-driving vehicles. Our results show that offline testing is less effective than online testing: many safety violations identified by online testing could not be identified by offline testing, while large prediction errors generated by offline testing always led to severe safety violations detectable by online testing. Further, we cannot exploit offline testing results to reduce the cost of online testing in practice, since we are not able to identify specific situations where offline testing could be as accurate as online testing in identifying safety requirement violations.
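The difference between the two modes can be made concrete with a toy sketch. Everything below is an assumption chosen for illustration and is not the study's experimental setup: the "DNN" is replaced by a hand-written steering function with a small bias and a weak correction term, the vehicle dynamics are reduced to a single lateral offset, and the lane half-width is taken to be 1.0. The offline test measures prediction error on recorded frames; the online test feeds the model's own output back into its next input.

```python
import math


def reference_steering(curvature, offset):
    """Hypothetical ground truth: follow the road and steer firmly back to the lane centre."""
    return curvature - 0.5 * offset


def model_steering(curvature, offset):
    """Hypothetical stand-in for a DNN: slightly biased, corrects offsets only weakly."""
    return curvature - 0.005 * offset + 0.01


def offline_mae(model, frames):
    """Offline test: mean absolute prediction error on recorded (curvature, offset) frames."""
    errors = [abs(model(c, o) - reference_steering(c, o)) for c, o in frames]
    return sum(errors) / len(errors)


def online_max_offset(model, curvatures):
    """Online test: the model's output moves the vehicle, which changes the model's next input."""
    offset, worst = 0.0, 0.0
    for c in curvatures:
        # Any steering beyond what the road curvature requires shifts the car sideways.
        offset += model(c, offset) - c
        worst = max(worst, abs(offset))
    return worst


# Assumed road profile; the recorded frames come from a drive that stays centred.
CURVATURES = [0.1 * math.sin(i / 10) for i in range(200)]
FRAMES = [(c, 0.0) for c in CURVATURES]

print("offline mean abs error:", offline_mae(model_steering, FRAMES))
print("online max lateral offset:", online_max_offset(model_steering, CURVATURES))
```

In this toy setting the per-frame prediction error stays small, yet the closed-loop run drifts past the assumed lane boundary, so only the online test flags a violation.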
The success of several constraint-based modeling languages such as OPL, ZINC, or COMET calls for better software engineering practices, particularly in the testing phase. This paper introduces a testing framework enabling automated test case generation for constraint programming. We propose a general framework of constraint program development which assumes that a first, simple declarative constraint model is available from the analysis of the problem specifications. This model is then refined using classical techniques such as constraint reformulation, addition of surrogate and global constraints, or symmetry breaking, to form an improved constraint model that must be thoroughly tested before being used to address real-sized problems. We think that most faults are introduced in this refinement step, and we propose a process that takes the first declarative model as an oracle for detecting non-conformities. From this process we derive practical test purposes to automatically generate test data that exhibit non-conformities. We implemented this approach in a new tool called CPTEST, which was used to automatically detect non-conformities on two classical benchmark programs, namely the Golomb rulers and the car-sequencing problem.
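The idea of using the first declarative model as an oracle can be sketched on Golomb rulers, where all pairwise differences between marks must be distinct. The sketch below is a minimal illustration under stated assumptions: it is not CPTEST, it enumerates candidate rulers exhaustively instead of generating test data by constraint solving, and the "refined" model contains a fault seeded on purpose. All function names are hypothetical.

```python
from itertools import combinations


def oracle_is_golomb(marks):
    """Simple declarative model: every pair of marks has a distinct difference."""
    diffs = [b - a for a, b in combinations(marks, 2)]
    return len(diffs) == len(set(diffs))


def refined_is_golomb(marks):
    """Hypothetical refined model with a seeded fault: only neighbouring marks are compared."""
    gaps = [b - a for a, b in zip(marks, marks[1:])]
    return len(gaps) == len(set(gaps))


def find_nonconformities(n_marks, max_length):
    """Return every ruler 0 < m1 < ... <= max_length on which the two models disagree."""
    found = []
    for rest in combinations(range(1, max_length + 1), n_marks - 1):
        marks = (0,) + rest
        if oracle_is_golomb(marks) != refined_is_golomb(marks):
            found.append(marks)
    return found


for ruler in find_nonconformities(4, 6):
    print("non-conformity:", ruler)
```

Each reported ruler is a test datum on which the refined model accepts a solution that the oracle rejects, for example a ruler whose neighbouring gaps are distinct but where two non-adjacent pairs of marks share a difference.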
This note concerns a search for publications in which (i) the pragmatic concept of a test, as conducted in the practice of software testing, is formalized; (ii) a theory about software testing based on such a formalization is presented; or (iii) it is demonstrated, on the basis of such a theory, that there are solid grounds to test software in cases where other forms of analysis could in principle be used. The note reports on how the search was carried out and on its main outcomes. Its message is that the fundamentals of software testing are not yet complete in some respects.
Random testing (RT) is a well-studied testing method that has been widely applied to the testing of many applications, including embedded software systems, SQL database systems, and Android applications. Adaptive random testing (ART) aims to enhance RT's failure-detection ability by spreading the test cases more evenly over the input domain. Since its introduction in 2001, there have been many contributions to the development of ART, including various approaches, implementations, assessment and evaluation methods, and applications. This paper provides a comprehensive survey of ART, classifying techniques, summarizing application areas, and analyzing experimental evaluations. It also addresses some misconceptions about ART and identifies open research challenges to be investigated in future work.
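One common way to spread test cases evenly is the fixed-size-candidate-set variant of ART: for each new test, draw a few random candidates and keep the one farthest from all tests already selected. The sketch below is a minimal version of that idea, assuming a numeric input domain normalised to the unit hypercube and Euclidean distance; the function name and default parameters are illustrative, and the abstract does not single out this variant.

```python
import math
import random


def fscs_art(n_tests, k=10, dim=2, seed=0):
    """Select n_tests points in [0, 1)^dim with fixed-size-candidate-set ART."""
    if n_tests <= 0:
        return []
    rng = random.Random(seed)
    # The first test case is purely random, as in plain random testing.
    selected = [tuple(rng.random() for _ in range(dim))]
    while len(selected) < n_tests:
        candidates = [tuple(rng.random() for _ in range(dim)) for _ in range(k)]
        # Keep the candidate whose nearest already-selected test is farthest away.
        best = max(
            candidates,
            key=lambda c: min(math.dist(c, s) for s in selected),
        )
        selected.append(best)
    return selected


for point in fscs_art(5):
    print(point)
```

Setting `k=1` degenerates to plain random testing, which makes the two easy to compare in an experiment; the cost of the extra spreading is the distance computation against every previously selected test.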
This volume contains the proceedings of the Eighth Workshop on Model-Based Testing (MBT 2013), which was held on March 17, 2013 in Rome, Italy, as a satellite event of the European Joint Conferences on Theory and Practice of Software, ETAPS 2013. The workshop is devoted to model-based testing of both software and hardware. Model-based testing uses models describing the required behavior of the system under consideration to guide such efforts as test selection and test results evaluation. Testing validates the real system behavior against models and checks that the implementation conforms to them, but it can also find errors in the models themselves. The first MBT workshop was held in 2004 in Barcelona. At that time MBT had already become a hot topic, but the MBT workshop was the first event devoted mostly to this domain. Since then the area has generated enormous scientific interest, and today there are several specialized workshops and broader conferences on software and hardware design and quality assurance that cover model-based testing. MBT has become one of the most powerful system analysis tools; one of the latest cutting-edge topics is the application of MBT to security analysis and testing. The MBT workshop tries to keep up with current trends.