Fixes That Fail: Self-Defeating Improvements in Machine-Learning Systems

80 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ruihan Wu

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ruihan Wu - Chuan Guo - Awni Hannun

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Machine-learning systems such as self-driving cars or virtual assistants are composed of a large number of machine-learning models that recognize image content, transcribe speech, analyze natural language, infer preferences, rank options, etc. Models in these systems are often developed and trained independently, which raises an obvious concern: Can improving a machine-learning model make the overall system worse? We answer this question affirmatively by showing that improving a model can deteriorate the performance of downstream models, even after those downstream models are retrained. Such self-defeating improvements are the result of entanglement between the models in the system. We perform an error decomposition of systems with multiple machine-learning models, which sheds light on the types of errors that can lead to self-defeating improvements. We also present the results of experiments which show that self-defeating improvements emerge in a realistic stereo-based detection system for cars and pedestrians.

قيم البحث

391 - Jacob Whitehill 2015

In machine learning contests such as the ImageNet Large Scale Visual Recognition Challenge and the KDD Cup, contestants can submit candidate solutions and receive from an oracle (typically the organizers of the competition) the accuracy of their gues ses compared to the ground-truth labels. One of the most commonly used accuracy metrics for binary classification tasks is the Area Under the Receiver Operating Characteristics Curve (AUC). In this paper we provide proofs-of-concept of how knowledge of the AUC of a set of guesses can be used, in two different kinds of attacks, to improve the accuracy of those guesses. On the other hand, we also demonstrate the intractability of one kind of AUC exploit by proving that the number of possible binary labelings of $n$ examples for which a candidate solution obtains a AUC score of $c$ grows exponentially in $n$, for every $cin (0,1)$.

التعلم الآلي

Learning To Predict Vulnerabilities From Vulnerability-Fixes: A Machine Translation Approach

97 - Aayush Garg , Renzo Degiovanni , Matthieu Jimenez 2020

Vulnerability prediction refers to the problem of identifying the system components that are most likely to be vulnerable based on the information gained from historical data. Typically, vulnerability prediction is performed using manually identified features that are potentially linked with vulnerable code. Unfortunately, recent studies have shown that existing approaches are ineffective when evaluated in realistic settings due to some unavoidable noise included in the historical data. To deal with this issue, we develop a prediction method using the encoder-decoder framework of machine translation that automatically learns the latent features (context, patterns, etc.) of code that are linked with vulnerabilities. The key idea of our approach is to learn from things we know, the past vulnerability fixes and their context. We evaluate our approach by comparing it with existing techniques on available releases of the three security-critical open source systems (Linux Kernel, OpenSSL, and Wireshark) with historical vulnerabilities that have been reported in the National Vulnerability Database (NVD). Our evaluation demonstrates that the prediction capability of our approach significantly outperforms the state-of-the-art vulnerability prediction techniques (Software Metrics, Imports, Function Calls, and Text Mining) in both recall and precision values (yielding 4.7 times higher MCC values) under realistic training setting.

التشفير والأمن هندسة البرمجيات

Quantifying Interpretability and Trust in Machine Learning Systems

69 - Philipp Schmidt , Felix Biessmann 2019

Decisions by Machine Learning (ML) models have become ubiquitous. Trusting these decisions requires understanding how algorithms take them. Hence interpretability methods for ML are an active focus of research. A central problem in this context is th at both the quality of interpretability methods as well as trust in ML predictions are difficult to measure. Yet evaluations, comparisons and improvements of trust and interpretability require quantifiable measures. Here we propose a quantitative measure for the quality of interpretability methods. Based on that we derive a quantitative measure of trust in ML decisions. Building on previous work we propose to measure intuitive understanding of algorithmic decisions using the information transfer rate at which humans replicate ML model predictions. We provide empirical evidence from crowdsourcing experiments that the proposed metric robustly differentiates interpretability methods. The proposed metric also demonstrates the value of interpretability for ML assisted human decision making: in our experiments providing explanations more than doubled productivity in annotation tasks. However unbiased human judgement is critical for doctors, judges, policy makers and others. Here we derive a trust metric that identifies when human decisions are overly biased towards ML predictions. Our results complement existing qualitative work on trust and interpretability by quantifiable measures that can serve as objectives for further improving methods in this field of research.

التعلم الآلي التعلم الالي

Improvements to context based self-supervised learning

125 - T. Nathan Mundhenk , Daniel Ho , Barry Y. Chen 2017

We develop a set of methods to improve on the results of self-supervised learning using context. We start with a baseline of patch based arrangement context learning and go from there. Our methods address some overt problems such as chromatic aberrat ion as well as other potential problems such as spatial skew and mid-level feature neglect. We prevent problems with testing generalization on common self-supervised benchmark tests by using different datasets during our development. The results of our methods combined yield top scores on all standard self-supervised benchmarks, including classification and detection on PASCAL VOC 2007, segmentation on PASCAL VOC 2012, and linear tests on the ImageNet and CSAIL Places datasets. We obtain an improvement over our baseline method of between 4.0 to 7.1 percentage points on transfer learning classification tests. We also show results on different standard network architectures to demonstrate generalization as well as portability. All data, models and programs are available at: https://gdo-datasci.llnl.gov/selfsupervised/.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي الحوسبة العصبية والتطورية

MLSys: The New Frontier of Machine Learning Systems

135 - Alexander Ratner , Dan Alistarh , Gustavo Alonso 2019

Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different d evelopment and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two.

التعلم الآلي قواعد البيانات النظم الموزعة والتوازية والحوسبة العنقودية