No Arabic abstract
One of the key challenges when developing a predictive model is the capability to describe the domain knowledge and the cause-effect relationships in a simple way. Decision rules are a useful and important methodology in this context, justifying their application in several areas, in particular in clinical practice. Several machine-learning classifiers have exploited the advantageous properties of decision rules to build intelligent prediction models, namely decision trees and ensembles of trees (ETs). However, such methodologies usually suffer from a trade-off between interpretability and predictive performance. Some procedures consider a simplification of ETs, using heuristic approaches to select an optimal reduced set of decision rules. In this paper, we introduce a novel step to those methodologies. We create a new component to predict if a given rule will be correct or not for a particular patient, which introduces personalization into the procedure. Furthermore, the validation results using three public clinical datasets show that it also allows to increase the predictive performance of the selected set of rules, improving the mentioned trade-off.
Machine learning models have had discernible achievements in a myriad of applications. However, most of these models are black-boxes, and it is obscure how the decisions are made by them. This makes the models unreliable and untrustworthy. To provide insights into the decision making processes of these models, a variety of traditional interpretable models have been proposed. Moreover, to generate more human-friendly explanations, recent work on interpretability tries to answer questions related to causality such as Why does this model makes such decisions? or Was it a specific feature that caused the decision made by the model?. In this work, models that aim to answer causal questions are referred to as causal interpretable models. The existing surveys have covered concepts and methodologies of traditional interpretability. In this work, we present a comprehensive survey on causal interpretable models from the aspects of the problems and methods. In addition, this survey provides in-depth insights into the existing evaluation metrics for measuring interpretability, which can help practitioners understand for what scenarios each evaluation metric is suitable.
AI systems that model and interact with users can update their models over time to reflect new information and changes in the environment. Although these updates may improve the overall performance of the AI system, they may actually hurt the performance with respect to individual users. Prior work has studied the trade-off between improving the systems accuracy following an update and the compatibility of the updated system with prior user experience. The more the model is forced to be compatible with a prior version, the higher loss in accuracy it will incur. In this paper, we show that by personalizing the loss function to specific users, in some cases it is possible to improve the compatibility-accuracy trade-off with respect to these users (increase the compatibility of the model while sacrificing less accuracy). We present experimental results indicating that this approach provides moderate improvements on average (around 20%) but large improvements for certain users (up to 300%).
We investigate the influence of adversarial training on the interpretability of convolutional neural networks (CNNs), specifically applied to diagnosing skin cancer. We show that gradient-based saliency maps of adversarially trained CNNs are significantly sharper and more visually coherent than those of standardly trained CNNs. Furthermore, we show that adversarially trained networks highlight regions with significant color variation within the lesion, a common characteristic of melanoma. We find that fine-tuning a robust network with a small learning rate further improves saliency maps sharpness. Lastly, we provide preliminary work suggesting that robustifying the first layers to extract robust low-level features leads to visually coherent explanations.
To date, there has been no formal study of the statistical cost of interpretability in machine learning. As such, the discourse around potential trade-offs is often informal and misconceptions abound. In this work, we aim to initiate a formal study of these trade-offs. A seemingly insurmountable roadblock is the lack of any agreed upon definition of interpretability. Instead, we propose a shift in perspective. Rather than attempt to define interpretability, we propose to model the emph{act} of emph{enforcing} interpretability. As a starting point, we focus on the setting of empirical risk minimization for binary classification, and view interpretability as a constraint placed on learning. That is, we assume we are given a subset of hypothesis that are deemed to be interpretable, possibly depending on the data distribution and other aspects of the context. We then model the act of enforcing interpretability as that of performing empirical risk minimization over the set of interpretable hypotheses. This model allows us to reason about the statistical implications of enforcing interpretability, using known results in statistical learning theory. Focusing on accuracy, we perform a case analysis, explaining why one may or may not observe a trade-off between accuracy and interpretability when the restriction to interpretable classifiers does or does not come at the cost of some excess statistical risk. We close with some worked examples and some open problems, which we hope will spur further theoretical development around the tradeoffs involved in interpretability.
Decisions by Machine Learning (ML) models have become ubiquitous. Trusting these decisions requires understanding how algorithms take them. Hence interpretability methods for ML are an active focus of research. A central problem in this context is that both the quality of interpretability methods as well as trust in ML predictions are difficult to measure. Yet evaluations, comparisons and improvements of trust and interpretability require quantifiable measures. Here we propose a quantitative measure for the quality of interpretability methods. Based on that we derive a quantitative measure of trust in ML decisions. Building on previous work we propose to measure intuitive understanding of algorithmic decisions using the information transfer rate at which humans replicate ML model predictions. We provide empirical evidence from crowdsourcing experiments that the proposed metric robustly differentiates interpretability methods. The proposed metric also demonstrates the value of interpretability for ML assisted human decision making: in our experiments providing explanations more than doubled productivity in annotation tasks. However unbiased human judgement is critical for doctors, judges, policy makers and others. Here we derive a trust metric that identifies when human decisions are overly biased towards ML predictions. Our results complement existing qualitative work on trust and interpretability by quantifiable measures that can serve as objectives for further improving methods in this field of research.