Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond

125 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xuhong Li

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Xuhong Li - Haoyi Xiong - Xingjian Li

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Deep neural networks have been well-known for their superb performance in handling various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction results of deep models. In recent years, many interpretation tools have been proposed to explain or reveal the ways that deep models make decisions. In this paper, we review this line of research and try to make a comprehensive survey. Specifically, we introduce and clarify two basic concepts-interpretations and interpretability-that people usually get confused. First of all, to address the research efforts in interpretations, we elaborate the design of several recent interpretation algorithms, from different perspectives, through proposing a new taxonomy. Then, to understand the results of interpretation, we also survey the performance metrics for evaluating interpretation algorithms. Further, we summarize the existing work in evaluating models interpretability using trustworthy interpretation algorithms. Finally, we review and discuss the connections between deep models interpretations and other factors, such as adversarial robustness and data augmentations, and we introduce several open-source libraries for interpretation algorithms and evaluation approaches.

قيم البحث

195 - Weishen Pan , Changshui Zhang 2021

As machine learning algorithms getting adopted in an ever-increasing number of applications, interpretation has emerged as a crucial desideratum. In this paper, we propose a mathematical definition for the human-interpretable model. In particular, we define interpretability between two information process systems. If a prediction model is interpretable by a human recognition system based on the above interpretability definition, the prediction model is defined as a completely human-interpretable model. We further design a practical framework to train a completely human-interpretable model by user interactions. Experiments on image datasets show the advantages of our proposed model in two aspects: 1) The completely human-interpretable model can provide an entire decision-making process that is human-understandable; 2) The completely human-interpretable model is more robust against adversarial attacks.

التعلم الآلي تفاعل الإنسان والحاسوب

Learning a Formula of Interpretability to Learn Interpretable Formulas

66 - Marco Virgolin , Andrea De Lorenzo , Eric Medvet 2020

Many risk-sensitive applications require Machine Learning (ML) models to be interpretable. Attempts to obtain interpretable models typically rely on tuning, by trial-and-error, hyper-parameters of model complexity that are only loosely related to int erpretability. We show that it is instead possible to take a meta-learning approach: an ML model of non-trivial Proxies of Human Interpretability (PHIs) can be learned from human feedback, then this model can be incorporated within an ML training process to directly optimize for interpretability. We show this for evolutionary symbolic regression. We first design and distribute a survey finalized at finding a link between features of mathematical formulas and two established PHIs, simulatability and decomposability. Next, we use the resulting dataset to learn an ML model of interpretability. Lastly, we query this model to estimate the interpretability of evolving solutions within bi-objective genetic programming. We perform experiments on five synthetic and eight real-world symbolic regression problems, comparing to the traditional use of solution size minimization. The results show that the use of our model leads to formulas that are, for a same level of accuracy-interpretability trade-off, either significantly more or equally accurate. Moreover, the formulas are also arguably more interpretable. Given the very positive results, we believe that our approach represents an important stepping stone for the design of next-generation interpretable (evolutionary) ML algorithms.

التعلم الآلي التعلم الالي

Deep Active Learning by Model Interpretability

133 - Qiang Liu , Zhaocheng Liu , Xiaofang Zhu 2020

Recent successes of Deep Neural Networks (DNNs) in a variety of research tasks, however, heavily rely on the large amounts of labeled samples. This may require considerable annotation cost in real-world applications. Fortunately, active learning is a promising methodology to train high-performing model with minimal annotation cost. In the deep learning context, the critical question of active learning is how to precisely identify the informativeness of samples for DNN. In this paper, inspired by piece-wise linear interpretability in DNN, we introduce the linearly separable regions of samples to the problem of active learning, and propose a novel Deep Active learning approach by Model Interpretability (DAMI). To keep the maximal representativeness of the entire unlabeled data, DAMI tries to select and label samples on different linearly separable regions introduced by the piece-wise linear interpretability in DNN. We focus on modeling Multi-Layer Perception (MLP) for modeling tabular data. Specifically, we use the local piece-wise interpretation in MLP as the representation of each sample, and directly run K-Center clustering to select and label samples. To be noted, this whole process of DAMI does not require any hyper-parameters to tune manually. To verify the effectiveness of our approach, extensive experiments have been conducted on several tabular datasets. The experimental results demonstrate that DAMI constantly outperforms several state-of-the-art compared approaches.

التعلم الآلي التعلم الالي

Geometrization of deep networks for the interpretability of deep learning systems

120 - Xiao Dong , Ling Zhou 2019

How to understand deep learning systems remains an open problem. In this paper we propose that the answer may lie in the geometrization of deep networks. Geometrization is a bridge to connect physics, geometry, deep network and quantum computation an d this may result in a new scheme to reveal the rule of the physical world. By comparing the geometry of image matching and deep networks, we show that geometrization of deep networks can be used to understand existing deep learning systems and it may also help to solve the interpretability problem of deep learning systems.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Designing Interpretable Approximations to Deep Reinforcement Learning

97 - Nathan Dahlin , Krishna Chaitanya Kalagarla , Nikhil Naik 2020

In an ever expanding set of research and application areas, deep neural networks (DNNs) set the bar for algorithm performance. However, depending upon additional constraints such as processing power and execution time limits, or requirements such as verifiable safety guarantees, it may not be feasible to actually use such high-performing DNNs in practice. Many techniques have been developed in recent years to compress or distill complex DNNs into smaller, faster or more understandable models and controllers. This work seeks to identify reduced models that not only preserve a desired performance level, but also, for example, succinctly explain the latent knowledge represented by a DNN. We illustrate the effectiveness of the proposed approach on the evaluation of decision tree variants and kernel machines in the context of benchmark reinforcement learning tasks.

التعلم الآلي الذكاء الاصطناعي