MEME: Generating RNN Model Explanations via Model Extraction

292 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Dmitry Kazhdan

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Dmitry Kazhdan - Botty Dimanov - Mateja Jamnik

التعلم الآلي الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Recurrent Neural Networks (RNNs) have achieved remarkable performance on a range of tasks. A key step to further empowering RNN-based approaches is improving their explainability and interpretability. In this work we present MEME: a model extraction approach capable of approximating RNNs with interpretable models represented by human-understandable concepts and their interactions. We demonstrate how MEME can be applied to two multivariate, continuous data case studies: Room Occupation Prediction, and In-Hospital Mortality Prediction. Using these case-studies, we show how our extracted models can be used to interpret RNNs both locally and globally, by approximating RNN decision-making via interpretable concept interactions.

قيم البحث

اقرأ أيضاً

Model Agnostic Multilevel Explanations

93 - Karthikeyan Natesan Ramamurthy , Bhanukiran Vinzamuri , Yunfeng Zhang 2020

In recent years, post-hoc local instance-level and global dataset-level explainability of black-box models has received a lot of attention. Much less attention has been given to obtaining insights at intermediate or group levels, which is a need outl ined in recent works that study the challenges in realizing the guidelines in the General Data Protection Regulation (GDPR). In this paper, we propose a meta-method that, given a typical local explainability method, can build a multilevel explanation tree. The leaves of this tree correspond to the local explanations, the root corresponds to the global explanation, and intermediate levels correspond to explanations for groups of data points that it automatically clusters. The method can also leverage side information, where users can specify points for which they may want the explanations to be similar. We argue that such a multilevel structure can also be an effective form of communication, where one could obtain few explanations that characterize the entire dataset by considering an appropriate level in our explanation tree. Explanations for novel test points can be cost-efficiently obtained by associating them with the closest training points. When the local explainability technique is generalized additive (viz. LIME, GAMs), we develop a fast approximate algorithm for building the multilevel tree and study its convergence behavior. We validate the effectiveness of the proposed technique based on two human studies -- one with experts and the other with non-expert users -- on real world datasets, and show that we produce high fidelity sparse explanations on several other public datasets.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Interpretability via Model Extraction

76 - Osbert Bastani , Carolyn Kim , Hamsa Bastani 2017

The ability to interpret machine learning models has become increasingly important now that machine learning is used to inform consequential decisions. We propose an approach called model extraction for interpreting complex, blackbox models. Our appr oach approximates the complex model using a much more interpretable model; as long as the approximation quality is good, then statistical properties of the complex model are reflected in the interpretable model. We show how model extraction can be used to understand and debug random forests and neural nets trained on several datasets from the UCI Machine Learning Repository, as well as control policies learned for several classical reinforcement learning problems.

التعلم الآلي أجهزة الكمبيوتر والمجتمع التعلم الالي

Model Explanations via the Axiomatic Causal Lens

57 - Vignesh Viswanathan , Yair Zick 2021

Explaining the decisions of black-box models has been a central theme in the study of trustworthy ML. Numerous measures have been proposed in the literature; however, none of them have been able to adopt a provably causal take on explainability. Buil ding upon Halpern and Pearls formal definition of a causal explanation, we derive an analogous set of axioms for the classification setting, and use them to derive three explanation measures. Our first measure is a natural adaptation of Chockler and Halperns notion of causal responsibility, whereas the other two correspond to existing game-theoretic influence measures. We present an axiomatic treatment for our proposed indices, showing that they can be uniquely characterized by a set of desirable properties. We compliment this with computational analysis, providing probabilistic approximation schemes for all of our proposed measures. Thus, our work is the first to formally bridge the gap between model explanations, game-theoretic influence, and causal analysis.

التعلم الآلي

Interpreting Blackbox Models via Model Extraction

69 - Osbert Bastani , Carolyn Kim , Hamsa Bastani 2017

Interpretability has become incredibly important as machine learning is increasingly used to inform consequential decisions. We propose to construct global explanations of complex, blackbox models in the form of a decision tree approximating the orig inal model---as long as the decision tree is a good approximation, then it mirrors the computation performed by the blackbox model. We devise a novel algorithm for extracting decision tree explanations that actively samples new training points to avoid overfitting. We evaluate our algorithm on a random forest to predict diabetes risk and a learned controller for cart-pole. Compared to several baselines, our decision trees are both substantially more accurate and equally or more interpretable based on a user study. Finally, we describe several insights provided by our interpretations, including a causal issue validated by a physician.

التعلم الآلي

Towards Better Model Understanding with Path-Sufficient Explanations

267 - Ronny Luss , Amit Dhurandhar 2021

Feature based local attribution methods are amongst the most prevalent in explainable artificial intelligence (XAI) literature. Going beyond standard correlation, recently, methods have been proposed that highlight what should be minimally sufficient to justify the classification of an input (viz. pertinent positives). While minimal sufficiency is an attractive property, the resulting explanations are often too sparse for a human to understand and evaluate the local behavior of the model, thus making it difficult to judge its overall quality. To overcome these limitations, we propose a novel method called Path-Sufficient Explanations Method (PSEM) that outputs a sequence of sufficient explanations for a given input of strictly decreasing size (or value) -- from original input to a minimally sufficient explanation -- which can be thought to trace the local boundary of the model in a smooth manner, thus providing better intuition about the local model behavior for the specific input. We validate these claims, both qualitatively and quantitatively, with experiments that show the benefit of PSEM across all three modalities (image, tabular and text). A user study depicts the strength of the method in communicating the local behavior, where (many) users are able to correctly determine the prediction made by a model.

التعلم الآلي الذكاء الاصطناعي