Interpretability is an important area of research for safe deployment of machine learning systems. One particular type of interpretability method attributes model decisions to input features. Despite active development, quantitative evaluation of feature attribution methods remains difficult due to the lack of ground truth: we do not know which input features are in fact important to a model. In this work, we propose a framework for Benchmarking Attribution Methods (BAM) with a priori knowledge of relative feature importance. BAM includes 1) a carefully crafted dataset and models trained with known relative feature importance and 2) three complementary metrics to quantitatively evaluate attribution methods by comparing feature attributions between pairs of models and pairs of inputs. Our evaluation of several widely used attribution methods suggests that certain methods are more likely to produce false positive explanations, i.e., features that are incorrectly attributed as important to a model's prediction when they are not. We open source our dataset, models, and metrics.
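To make the pairwise-comparison idea concrete, the sketch below scores how much more attribution a method assigns to a region of known importance under one model than under a second model for which that region is irrelevant by construction. This is a minimal illustration in NumPy, not the released BAM code; the names `region_attribution` and `model_contrast_score` are chosen here for exposition, and the attribution maps are assumed to be precomputed arrays.

```python
import numpy as np

def region_attribution(attr_map, region_mask):
    """Mean absolute attribution inside a region of interest.

    attr_map: 2-D array of per-pixel attributions for one input.
    region_mask: boolean array of the same shape marking the region
        (e.g., a pasted object patch) whose importance is known a priori.
    """
    return np.abs(attr_map[region_mask]).mean()

def model_contrast_score(attr_informative, attr_uninformative, region_mask):
    """Difference in attribution a method assigns to the same region under two
    models: one for which the region is informative by construction and one
    for which it is not. Larger is better; a score near zero or below suggests
    the method attributes importance to a feature the second model ignores."""
    return (region_attribution(attr_informative, region_mask)
            - region_attribution(attr_uninformative, region_mask))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attrs_a = rng.random((8, 8))        # attributions from the model that uses the region
    attrs_b = rng.random((8, 8)) * 0.1  # attributions from the model that ignores it
    mask = np.zeros((8, 8), dtype=bool)
    mask[2:6, 2:6] = True               # region with known relative importance
    print(model_contrast_score(attrs_a, attrs_b, mask))
```

In practice such a score would be averaged over many inputs and model pairs, so that a single unusual attribution map does not dominate the comparison.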
Deep neural networks are vulnerable to adversarial attacks and hard to interpret because of their black-box nature. The recently proposed invertible network is able to accurately reconstruct the inputs to a layer from its outputs, and thus has the potent…
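The abstract above is truncated, but the exact-reconstruction property it refers to can be illustrated with a minimal additive coupling layer in the style of NICE/RealNVP. This is a generic sketch of an invertible transform, not the specific invertible network that paper proposes; `net` stands in for an arbitrary learned sub-network.

```python
import numpy as np

def coupling_forward(x, net):
    """Additive coupling: split the input and shift one half by a function of
    the other. The transform is invertible no matter what `net` computes."""
    x1, x2 = np.split(x, 2, axis=-1)
    y1 = x1
    y2 = x2 + net(x1)
    return np.concatenate([y1, y2], axis=-1)

def coupling_inverse(y, net):
    """Exact reconstruction of the layer's input from its output."""
    y1, y2 = np.split(y, 2, axis=-1)
    x1 = y1
    x2 = y2 - net(y1)
    return np.concatenate([x1, x2], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 4))
    net = lambda h: np.tanh(h @ W)   # stand-in for a small neural sub-network
    x = rng.standard_normal((3, 8))
    y = coupling_forward(x, net)
    print(np.allclose(coupling_inverse(y, net), x))  # True: inputs recovered exactly
```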
The problem of explaining deep learning models, and model predictions generally, has attracted intensive interest recently. Many successful approaches forgo global approximations in order to provide more faithful local interpretations of the models b…
While deep learning methods continue to improve in predictive accuracy on a wide range of application domains, significant issues remain with other aspects of their performance, including their ability to quantify uncertainty and their robustness. Rec…
As machine learning and algorithmic decision making systems are increasingly being leveraged in high-stakes human-in-the-loop settings, there is a pressing need to understand the rationale of their predictions. Researchers have responded to this need…
Feature attribution (FA), or the assignment of class-relevance to different locations in an image, is important for many classification problems but is particularly crucial within the neuroscience domain, where accurate mechanistic models of behaviou…