Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Not Just a Black Box: Learning Important Features Through Propagating Activation Differences

91 0 0.0 ( 0 )

Download Cite

Added by Avanti Shrikumar

Publication date 2016

fields Informatics Engineering

and research's language is English

Authors Avanti Shrikumar - Peyton Greenside - Anna Shcherbina

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Note: This paper describes an older version of DeepLIFT. See https://arxiv.org/abs/1704.02685 for the newer version. Original abstract follows: The purported black box nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a neural network. DeepLIFT compares the activation of each neuron to its reference activation and assigns contribution scores according to the difference. We apply DeepLIFT to models trained on natural images and genomic data, and show significant advantages over gradient-based methods.

rate research

Learning Important Features Through Propagating Activation Differences

61 - Avanti Shrikumar , Peyton Greenside , Anshul Kundaje 2017

The purported black box nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its reference activation and assigns contribution scores according to the difference. By optionally giving separate consideration to positive and negative contributions, DeepLIFT can also reveal dependencies which are missed by other approaches. Scores can be computed efficiently in a single backward pass. We apply DeepLIFT to models trained on MNIST and simulated genomic data, and show significant advantages over gradient-based methods. Video tutorial: http://goo.gl/qKb7pL, ICML slides: bit.ly/deeplifticmlslides, ICML talk: https://vimeo.com/238275076, code: http://goo.gl/RM8jvH.

Computer Vision and Pattern Recognition Machine Learning Neural and Evolutionary Computing

What it Thinks is Important is Important: Robustness Transfers through Input Gradients

198 - Alvin Chan , Yi Tay , Yew-Soon Ong 2019

Adversarial perturbations are imperceptible changes to input pixels that can change the prediction of deep learning models. Learned weights of models robust to such perturbations are previously found to be transferable across different tasks but this applies only if the model architecture for the source and target tasks is the same. Input gradients characterize how small changes at each input pixel affect the model output. Using only natural images, we show here that training a student models input gradients to match those of a robust teacher model can gain robustness close to a strong baseline that is robustly trained from scratch. Through experiments in MNIST, CIFAR-10, CIFAR-100 and Tiny-ImageNet, we show that our proposed method, input gradient adversarial matching, can transfer robustness across different tasks and even across different model architectures. This demonstrates that directly targeting the semantics of input gradients is a feasible way towards adversarial robustness.

Machine Learning Computer Vision and Pattern Recognition Neural and Evolutionary Computing

MetricOpt: Learning to Optimize Black-Box Evaluation Metrics

122 - Chen Huang , Shuangfei Zhai , Pengsheng Guo 2021

We study the problem of directly optimizing arbitrary non-differentiable task evaluation metrics such as misclassification rate and recall. Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown. We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations. The learned value function is easily pluggable into existing optimizers like SGD and Adam, and is effective for rapidly finetuning a pre-trained model. This leads to consistent improvements since the value function provides effective metric supervision during finetuning, and helps to correct the potential bias of loss-only supervision. MetricOpt achieves state-of-the-art performance on a variety of metrics for (image) classification, image retrieval and object detection. Solid benefits are found over competing methods, which often involve complex loss design or adaptation. MetricOpt also generalizes well to new tasks and model architectures.

Machine Learning Computer Vision and Pattern Recognition

EIS -- a family of activation functions combining Exponential, ISRU, and Softplus

117 - Koushik Biswas , Sandeep Kumar , Shilpak Banerjee 2020

Activation functions play a pivotal role in the function learning using neural networks. The non-linearity in the learned function is achieved by repeated use of the activation function. Over the years, numerous activation functions have been proposed to improve accuracy in several tasks. Basic functions like ReLU, Sigmoid, Tanh, or Softplus have been favorite among the deep learning community because of their simplicity. In recent years, several novel activation functions arising from these basic functions have been proposed, which have improved accuracy in some challenging datasets. We propose a five hyper-parameters family of activation functions, namely EIS, defined as, [ frac{x(ln(1+e^x))^alpha}{sqrt{beta+gamma x^2}+delta e^{-theta x}}. ] We show examples of activation functions from the EIS family which outperform widely used activation functions on some well known datasets and models. For example, $frac{xln(1+e^x)}{x+1.16e^{-x}}$ beats ReLU by 0.89% in DenseNet-169, 0.24% in Inception V3 in CIFAR100 dataset while 1.13% in Inception V3, 0.13% in DenseNet-169, 0.94% in SimpleNet model in CIFAR10 dataset. Also, $frac{xln(1+e^x)}{sqrt{1+x^2}}$ beats ReLU by 1.68% in DenseNet-169, 0.30% in Inception V3 in CIFAR100 dataset while 1.0% in Inception V3, 0.15% in DenseNet-169, 1.13% in SimpleNet model in CIFAR10 dataset.

Machine Learning Computer Vision and Pattern Recognition Neural and Evolutionary Computing

Reproducing Activation Function for Deep Learning

239 - Senwei Liang , Liyao Lyu , Chunmei Wang 2021

We propose reproducing activation functions (RAFs) to improve deep learning accuracy for various applications ranging from computer vision to scientific computing. The idea is to employ several basic functions and their learnable linear combination to construct neuron-wise data-driven activation functions for each neuron. Armed with RAFs, neural networks (NNs) can reproduce traditional approximation tools and, therefore, approximate target functions with a smaller number of parameters than traditional NNs. In NN training, RAFs can generate neural tangent kernels (NTKs) with a better condition number than traditional activation functions lessening the spectral bias of deep learning. As demonstrated by extensive numerical tests, the proposed RAFs can facilitate the convergence of deep learning optimization for a solution with higher accuracy than existing deep learning solvers for audio/image/video reconstruction, PDEs, and eigenvalue problems. With RAFs, the errors of audio/video reconstruction, PDEs, and eigenvalue problems are decreased by over 14%, 73%, 99%, respectively, compared with baseline, while the performance of image reconstruction increases by 58%.

Machine Learning Computer Vision and Pattern Recognition Numerical Analysis

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Not Just a Black Box: Learning Important Features Through Propagating Activation Differences

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions