GAM: Explainable Visual Similarity and Classification via Gradient Activation Maps

Posted by Amir Hertz

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Oren Barkan, Omri Armstrong, Amir Hertz

Abstract

We present Gradient Activation Maps (GAM), a method for explaining predictions made by visual similarity and classification models. By gleaning localized gradient and activation information from multiple network layers, GAM offers improved visual explanations compared to existing alternatives. The algorithmic advantages of GAM are explained in detail and validated empirically, showing that GAM outperforms its alternatives across various tasks and datasets.
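The abstract does not specify how GAM combines information across layers. The NumPy sketch below is therefore only a rough illustration of the general idea of merging gradient and activation signals from several layers. It is not the authors' algorithm.

It rests on these hypothetical assumptions:

- Each layer's activations have already been computed by a deep-learning framework, together with the gradients of the model output with respect to them. For a similarity model, that output would be a similarity score between two embeddings.
- Each layer contributes a ReLU-clipped, channel-summed gradient × activation map.
- The per-layer maps are upsampled by nearest-neighbour repetition and averaged.

The names `layer_map`, `upsample_nearest` and `multi_layer_map` are illustrative.

```python
import numpy as np


def layer_map(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Hypothetical per-layer map: element-wise gradient * activation,
    clipped at zero (ReLU) and summed over channels.

    activations, gradients: arrays of shape (C, H, W) for one layer.
    Returns an (H, W) map.
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    return np.maximum(activations * gradients, 0.0).sum(axis=0)


def upsample_nearest(m: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour upsampling of an (h, w) map to (size, size).
    Assumes size is an integer multiple of both h and w."""
    h, w = m.shape
    if size % h or size % w:
        raise ValueError("target size must be a multiple of the map size")
    return np.repeat(np.repeat(m, size // h, axis=0), size // w, axis=1)


def multi_layer_map(layers, size: int) -> np.ndarray:
    """Average the upsampled per-layer maps and rescale to [0, 1].

    layers: iterable of (activations, gradients) pairs, one per layer.
    """
    maps = [upsample_nearest(layer_map(a, g), size) for a, g in layers]
    combined = np.mean(maps, axis=0)
    peak = combined.max()
    return combined / peak if peak > 0 else combined


# Toy usage with random tensors standing in for a deep (4x4) and a
# shallower (8x8) layer; real inputs would come from a trained network.
rng = np.random.default_rng(0)
deep = (rng.random((8, 4, 4)), rng.standard_normal((8, 4, 4)))
shallow = (rng.random((4, 8, 8)), rng.standard_normal((4, 8, 8)))
heatmap = multi_layer_map([deep, shallow], size=16)
print(heatmap.shape)  # (16, 16)
```

Averaging coarse deep-layer maps with finer shallow-layer maps is one simple way to combine semantic and spatial detail. The weighting GAM actually uses may differ.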

Related papers

Satya M. Muddamsetty, Mohammad N. S. Jahromi, Thomas B. Moeslund (2020)

A new branch of artificial intelligence research, Explainable AI, has focused on opening up the black box to provide some explainability. This paper presents a novel visual explanation method for deep learning networks in the form of a saliency map that can effectively localize entire object regions. Compared with current state-of-the-art methods, the proposed method produces promising visual explanations that can earn greater trust from human experts. Both quantitative and qualitative evaluations are carried out on general and clinical datasets to confirm the effectiveness of the proposed method.

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

U-CAM: Visual Explanation using Uncertainty based Class Activation Maps

Badri N. Patro, Mayank Lunayach, Shivansh Patel (2019)

Understanding and explaining deep learning models is an imperative task. Towards this, we propose a method that obtains gradient-based certainty estimates that also provide visual attention maps. In particular, we address the visual question answering task. We incorporate modern probabilistic deep learning methods, which we further improve by using the gradients for these estimates. This has two benefits: (a) improved certainty estimates that correlate better with misclassified samples, and (b) improved attention maps that provide state-of-the-art results in terms of correlation with human attention regions. The improved attention maps yield consistent improvements for various visual question answering methods. Therefore, the proposed technique can be thought of as a recipe for obtaining improved certainty estimates and explanations for deep learning models. We provide a detailed empirical analysis for the visual question answering task on all standard benchmarks, along with comparisons to state-of-the-art methods.

Computer Vision and Pattern Recognition, Computation and Language, Machine Learning

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das (2016)

We propose a technique for producing visual explanations for decisions from a large class of CNN-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept flowing into the final convolutional layer to produce a coarse localization map highlighting the image regions that are important for predicting the concept. Grad-CAM is applicable to a wide variety of CNN model families without any architectural changes or re-training:

1. CNNs with fully-connected layers.
2. CNNs used for structured outputs.
3. CNNs used in tasks with multimodal inputs or reinforcement learning.

We combine Grad-CAM with fine-grained visualizations to create a high-resolution class-discriminative visualization. We apply it to off-the-shelf image classification, captioning, and visual question answering (VQA) models, including ResNet-based architectures. For image classification models, our visualizations:

(a) lend insights into their failure modes,
(b) are robust to adversarial images,
(c) outperform previous methods on localization,
(d) are more faithful to the underlying model, and
(e) help achieve generalization by identifying dataset bias.

For captioning and VQA, we show that even non-attention-based models can localize inputs. We devise a way to identify important neurons through Grad-CAM and combine it with neuron names to provide textual explanations for model decisions. Finally, we design and conduct human studies to measure whether Grad-CAM helps users establish appropriate trust in model predictions. These studies show that Grad-CAM helps untrained users successfully discern a stronger model from a weaker one, even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/, along with a demo at http://gradcam.cloudcv.org and a video at youtu.be/COjUB9Izk6E.

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
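The core Grad-CAM computation can be sketched in a few lines of NumPy. In the formulation given in the Grad-CAM paper, each channel weight α_k is the spatial average of the gradients ∂y^c/∂A^k. The localization map is then ReLU(Σ_k α_k A^k).

This sketch assumes that the final convolutional layer's activations A and their gradients have already been obtained by backpropagation in some deep-learning framework. The function name `grad_cam` is our own.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM localization map for one image and one target class.

    activations: feature maps A^k of the final conv layer, shape (K, H, W).
    gradients:   d y^c / d A^k for the target score y^c, same shape.
    Returns a non-negative (H, W) map.
    """
    # Neuron-importance weights: global-average-pool the gradients per channel.
    weights = gradients.mean(axis=(1, 2))             # shape (K,)
    # Weighted sum of feature maps over channels.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    # ReLU keeps only regions with a positive influence on the class score.
    return np.maximum(cam, 0.0)


# Toy example: channel 0 supports the class (gradients +1), channel 1
# opposes it (gradients -1), so only channel 0's active cells survive.
A = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 2.0], [2.0, 0.0]]])
dY = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
print(grad_cam(A, dY))  # values [[1, 0], [0, 1]]
```

The result has the spatial resolution of the final convolutional layer, which is why the abstract calls it coarse. For display it is usually upsampled to the input image size and overlaid on the image.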

ProtoPShare: Prototype Sharing for Interpretable Image Classification and Similarity Discovery

Dawid Rymarczyk, Łukasz Struski, Jacek Tabor (2020)

In this paper, we introduce ProtoPShare, a self-explained method that incorporates the paradigm of prototypical parts to explain its predictions. The main novelty of ProtoPShare is its ability to efficiently share prototypical parts between classes, thanks to our data-dependent merge-pruning. Moreover, its prototypes are more consistent, and the model is more robust to image perturbations than the state-of-the-art method, ProtoPNet. We verify our findings on two datasets, CUB-200-2011 and Stanford Cars.

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

Calibrating Class Activation Maps for Long-Tailed Visual Recognition

Chi Zhang, Guosheng Lin, Lvlong Lai (2021)

Real-world visual recognition problems often exhibit long-tailed distributions, where the amount of training data varies significantly across categories. Standard classification models learned on such distributions often make predictions biased towards the head classes while generalizing poorly to the tail classes. In this paper, we present two effective modifications of CNNs to improve network learning from long-tailed distributions.

First, we present a Class Activation Map Calibration (CAMC) module that improves the learning and prediction of network classifiers by making predictions depend on important image regions. The CAMC module highlights correlated image regions across the data and reinforces the representations in these areas to obtain a better global representation for classification.

Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems. Our empirical study demonstrates that simply scaling the classifier outputs by an appropriate scalar effectively improves classification accuracy on tail classes without losing accuracy on head classes.

We conduct extensive experiments to validate the effectiveness of our design and set new state-of-the-art performance on five benchmarks: ImageNet-LT, Places-LT, iNaturalist 2018, CIFAR10-LT, and CIFAR100-LT.

Computer Vision and Pattern Recognition
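The abstract does not give the exact form of the normalized classifier. A common construction, assumed here, is a cosine classifier. Features and per-class weight vectors are L2-normalized, and their dot products (cosine similarities, bounded in [-1, 1]) are multiplied by a scalar s before the softmax. The function names and the default s = 16 are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np


def scaled_normalized_logits(features: np.ndarray, weights: np.ndarray,
                             scale: float = 16.0, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical cosine classifier with an output scale.

    features: (N, D) feature vectors; weights: (C, D) class weight vectors.
    Returns (N, C) logits equal to scale * cosine(feature, class weight).
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    w = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + eps)
    return scale * f @ w.T


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Toy example (illustrative assumption): the "head" class has a much larger
# weight norm than the "tail" class, but both point equally close to x.
x = np.array([[1.0, 1.0]])
W = np.array([[10.0, 0.0],   # head class
              [0.0, 1.0]])   # tail class
raw = x @ W.T                # raw logits [[10, 1]] favour the head class
logits = scaled_normalized_logits(x, W)
print(softmax(logits))       # both classes get probability 0.5
```

Because cosine logits lie in [-1, 1], an unscaled softmax over them stays close to uniform. The scalar s therefore controls how peaked the output distribution can become. Normalization also stops a class from winning merely because its weight vector has a large norm, as the toy example illustrates.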