Softmax Is Not an Artificial Trick: An Information-Theoretic View of Softmax in Neural Networks

351 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhenyue Qin

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Zhenyue Qin - Dongwoo Kim

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Despite great popularity of applying softmax to map the non-normalised outputs of a neural network to a probability distribution over predicting classes, this normalised exponential transformation still seems to be artificial. A theoretic framework that incorporates softmax as an intrinsic component is still lacking. In this paper, we view neural networks embedding softmax from an information-theoretic perspective. Under this view, we can naturally and mathematically derive log-softmax as an inherent component in a neural network for evaluating the conditional mutual information between network output vectors and labels given an input datum. We show that training deterministic neural networks through maximising log-softmax is equivalent to enlarging the conditional mutual information, i.e., feeding label information into network outputs. We also generalise our informative-theoretic perspective to neural networks with stochasticity and derive information upper and lower bounds of log-softmax. In theory, such an information-theoretic view offers rationality support for embedding softmax in neural networks; in practice, we eventually demonstrate a computer vision application example of how to employ our information-theoretic view to filter out targeted objects on images.

قيم البحث

233 - Zhenyue Qin , Dongwoo Kim , Tom Gedeon 2019

Mutual information is widely applied to learn latent representations of observations, whilst its implication in classification neural networks remain to be better explained. We show that optimising the parameters of classification neural networks wit h softmax cross-entropy is equivalent to maximising the mutual information between inputs and labels under the balanced data assumption. Through experiments on synthetic and real datasets, we show that softmax cross-entropy can estimate mutual information approximately. When applied to image classification, this relation helps approximate the point-wise mutual information between an input image and a label without modifying the network structure. To this end, we propose infoCAM, informative class activation map, which highlights regions of the input image that are the most relevant to a given label based on differences in information. The activation map helps localise the target object in an input image. Through experiments on the semi-supervised object localisation task with two real-world datasets, we evaluate the effectiveness of our information-theoretic approach.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التعلم الالي

Balanced Meta-Softmax for Long-Tailed Visual Recognition

92 - Jiawei Ren , Cunjun Yu , Shunan Sheng 2020

Deep classifiers have achieved great success in visual recognition. However, real-world data is long-tailed by nature, leading to the mismatch between training and testing distributions. In this paper, we show that the Softmax function, though used i n most classification tasks, gives a biased gradient estimation under the long-tailed setup. This paper presents Balanced Softmax, an elegant unbiased extension of Softmax, to accommodate the label distribution shift between training and testing. Theoretically, we derive the generalization bound for multiclass Softmax regression and show our loss minimizes the bound. In addition, we introduce Balanced Meta-Softmax, applying a complementary Meta Sampler to estimate the optimal class sample rate and further improve long-tailed learning. In our experiments, we demonstrate that Balanced Meta-Softmax outperforms state-of-the-art long-tailed classification solutions on both visual recognition and instance segmentation tasks.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التعلم الالي

ADMM-SOFTMAX : An ADMM Approach for Multinomial Logistic Regression

169 - Samy Wu Fung , Sanna Tyrvainen , Lars Ruthotto 2019

We present ADMM-Softmax, an alternating direction method of multipliers (ADMM) for solving multinomial logistic regression (MLR) problems. Our method is geared toward supervised classification tasks with many examples and features. It decouples the n onlinear optimization problem in MLR into three steps that can be solved efficiently. In particular, each iteration of ADMM-Softmax consists of a linear least-squares problem, a set of independent small-scale smooth, convex problems, and a trivial dual variable update. Solution of the least-squares problem can be be accelerated by pre-computing a factorization or preconditioner, and the separability in the smooth, convex problem can be easily parallelized across examples. For two image classification problems, we demonstrate that ADMM-Softmax leads to improved generalization compared to a Newton-Krylov, a quasi Newton, and a stochastic gradient descent method.

التعلم الآلي التعلم الالي

Balanced Softmax Cross-Entropy for Incremental Learning

93 - Quentin Jodelet , Xin Liu , Tsuyoshi Murata 2021

Deep neural networks are prone to catastrophic forgetting when incrementally trained on new classes or new tasks as adaptation to the new data leads to a drastic decrease of the performance on the old classes and tasks. By using a small memory for re hearsal and knowledge distillation, recent methods have proven to be effective to mitigate catastrophic forgetting. However due to the limited size of the memory, large imbalance between the amount of data available for the old and new classes still remains which results in a deterioration of the overall accuracy of the model. To address this problem, we propose the use of the Balanced Softmax Cross-Entropy loss and show that it can be combined with exiting methods for incremental learning to improve their performances while also decreasing the computational cost of the training procedure in some cases. Complete experiments on the competitive ImageNet, subImageNet and CIFAR100 datasets show states-of-the-art results.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط

An Information Theoretic Interpretation to Deep Neural Networks

174 - Shao-Lun Huang , Xiangxiang Xu , Lizhong Zheng 2019

It is commonly believed that the hidden layers of deep neural networks (DNNs) attempt to extract informative features for learning tasks. In this paper, we formalize this intuition by showing that the features extracted by DNN coincide with the resul t of an optimization problem, which we call the `universal feature selection problem, in a local analysis regime. We interpret the weights training in DNN as the projection of feature functions between feature spaces, specified by the network structure. Our formulation has direct operational meaning in terms of the performance for inference tasks, and gives interpretations to the internal computation results of DNNs. Results of numerical experiments are provided to support the analysis.

نظرية المعلومات التعلم الآلي نظرية المعلومات