On Misbehaviour and Fault Tolerance in Machine Learning Systems


Added by Lalli Myllyaho

Publication date 2021

Fields Informatics Engineering

Language English

Authors Lalli Myllyaho - Mikko Raatikainen - Tomi Mannisto

Software Engineering


Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new situations and contexts. At the same time, this adaptability raises uncertainties concerning the run-time product quality or dependability, such as reliability and security, of these systems. Systems can be tested and monitored, but this does not provide protection against faults and failures in the adapted ML systems themselves. We studied software designs that aim to introduce fault tolerance into ML systems so that possible problems in the ML components of the systems can be avoided. The research was conducted as a case study, and its data was collected through five semi-structured interviews with experienced software architects. We present a conceptualisation of the misbehaviour of ML systems, the perceived role of fault tolerance, and the designs used. Common patterns for incorporating ML components into designs in a fault-tolerant fashion have started to emerge. ML models are, for example, guarded by monitoring the inputs and their distribution, and by enforcing business rules on acceptable outputs. Multiple, specialised ML models are used to adapt to the variations and changes in the surrounding world, and simpler fall-over techniques like default outputs are put in place to keep systems up and running in the face of problems. However, the general role of these patterns is not widely acknowledged. This is mainly due to the relative immaturity of using ML as part of a complete software system: the field still lacks established frameworks and practices beyond training to implement, operate, and maintain the software that utilises ML. ML software engineering needs further analysis and development on all fronts.
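The guarding patterns named in the abstract (checking inputs, enforcing business rules on outputs, and falling back to a default output) can be illustrated with a minimal Python sketch. It is not taken from the paper: the class name `GuardedModel`, its fields, the range-based checks, and the toy model are all hypothetical, chosen only to show how such a wrapper might be structured.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GuardedModel:
    """Hypothetical wrapper: input guard, output rule, default output."""

    model: Callable[[Sequence[float]], float]
    input_low: float       # assumed lower bound of acceptable feature values
    input_high: float      # assumed upper bound of acceptable feature values
    output_low: float      # assumed business rule: smallest acceptable output
    output_high: float     # assumed business rule: largest acceptable output
    default_output: float  # value returned whenever a guard is triggered

    def predict(self, features: Sequence[float]) -> float:
        # Input guard: reject feature values outside the expected range.
        # NaN values fail the comparison and are rejected as well.
        if not all(self.input_low <= x <= self.input_high for x in features):
            return self.default_output

        # Keep the system running even if the model itself raises.
        try:
            prediction = self.model(features)
        except Exception:
            return self.default_output

        # Output guard: enforce the business rule on acceptable outputs.
        if not (self.output_low <= prediction <= self.output_high):
            return self.default_output
        return prediction


def toy_model(features: Sequence[float]) -> float:
    """Stand-in for a trained model (an assumption for this example)."""
    return 10.0 * sum(features)


guarded = GuardedModel(
    model=toy_model,
    input_low=0.0,
    input_high=1.0,
    output_low=0.0,
    output_high=15.0,
    default_output=1.0,
)

print(guarded.predict([0.2, 0.3]))  # inputs and output pass both guards
print(guarded.predict([0.2, 7.0]))  # input guard triggers the default
print(guarded.predict([0.9, 0.9]))  # output guard triggers the default
```

A real system would likely replace the simple range check with monitoring of the input distribution, as the abstract mentions, and would log every fallback so that operators can see how often the model is being bypassed.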


A Survey on Fault-tolerance in Distributed Optimization and Machine Learning

Shuo Liu 2021

The robustness of distributed optimization is an emerging field of study, motivated by various applications of distributed optimization including distributed machine learning, distributed sensing, and swarm robotics. With the rapid expansion of the scale of distributed systems, resilient distributed algorithms for optimization are needed, in order to mitigate system failures, communication issues, or even malicious attacks. This survey investigates the current state of fault-tolerance research in distributed optimization, and aims to provide an overview of the existing studies on both fault-tolerant distributed optimization theories and applicable algorithms.

Distributed Parallel and Cluster Computing

Testing quantum fault tolerance on small systems

D. Willsch, M. Willsch, F. Jin 2018

We extensively test a recent protocol to demonstrate quantum fault tolerance on three systems: (1) a real-time simulation of five spin qubits coupled to an environment with two-level defects, (2) a real-time simulation of transmon quantum computers, and (3) the 16-qubit processor of the IBM Q Experience. In the simulations, the dynamics of the full system are obtained by numerically solving the time-dependent Schrödinger equation. We find that the fault-tolerant scheme provides a systematic way to improve the results when the errors are dominated by the inherent control and measurement errors present in transmon systems. However, the scheme fails to satisfy the criterion for fault tolerance when decoherence effects are dominant.

Quantum Physics

Fault Prediction based on Software Metrics and SonarQube Rules. Machine or Deep Learning?

Francesco Lomio, Sergio Moreschini, Valentina Lenarduzzi 2021

Background. Developers spend more time fixing bugs and refactoring code to increase maintainability than developing new features. Researchers have investigated the impact of code quality on fault-proneness, focusing on code smells and code metrics. Objective. We aim to advance fault-inducing commit prediction based on SonarQube, considering the contribution provided by each rule and metric. Method. We designed and conducted a case study among 33 Java projects analyzed with SonarQube and SZZ to identify fault-inducing and fault-fixing commits. Moreover, we investigated the fault-proneness of each SonarQube rule and metric using Machine and Deep Learning models. Results. We analyzed 77,932 commits that contain 40,890 faults and are affected by more than 174 SonarQube rules violated 1.9M times, for which the 24 software metrics available from the tool were calculated. Compared to machine learning models, deep learning provided more accurate fault detection and allowed us to accurately identify the fault-prediction power of each SonarQube rule. As a result, fourteen of the 174 violated rules have an importance higher than 1% and account for 30% of the total fault-proneness importance, while the fault-proneness of the remaining 165 rules is negligible. Conclusion. Future work might consider adopting time series analysis and anomaly detection techniques to detect more accurately the rules that impact fault-proneness.

Software Engineering

(m,n)-Semirings and a Generalized Fault Tolerance Algebra of Systems

Syed Eqbal Alam, Shrisha Rao, Bijan Davvaz 2010

We propose a new class of mathematical structures called (m,n)-semirings (which generalize the usual semirings), and describe their basic properties. We also define partial ordering, and generalize the concepts of congruence, homomorphism, ideals, etc., for (m,n)-semirings. Following earlier work by Rao, we consider a system as made up of several components whose failures may cause it to fail, and represent the set of systems algebraically as an (m,n)-semiring. Based on the characteristics of these components we present a formalism to compare the fault tolerance behaviour of two systems using our framework of a partially ordered (m,n)-semiring.

General Mathematics

Towards Guidelines for Assessing Qualities of Machine Learning Systems

Julien Siebert, Lisa Joeckel, Jens Heidrich 2020

Nowadays, systems containing components based on machine learning (ML) methods are becoming more widespread. In order to ensure the intended behavior of a software system, there are standards that define necessary quality aspects of the system and its components (such as ISO/IEC 25010). Due to the different nature of ML, we have to adjust quality aspects or add additional ones (such as trustworthiness) and be very precise about which aspect is really relevant for which object of interest (such as completeness of training data), and how to objectively assess adherence to quality requirements. In this article, we present the construction of a quality model (i.e., evaluation objects, quality aspects, and metrics) for an ML system based on an industrial use case. This quality model enables practitioners to specify and assess quality requirements for such kinds of ML systems objectively. In the future, we want to learn how the term quality differs between different types of ML systems and come up with general guidelines for specifying and assessing qualities of ML systems.

Software Engineering, Machine Learning