ترغب بنشر مسار تعليمي؟ اضغط هنا

Rethinking Softmax with Cross-Entropy: Neural Network Classifier as Mutual Information Estimator

234   0   0.0 ( 0 )
 نشر من قبل Zhenyue Qin
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Mutual information is widely applied to learn latent representations of observations, whilst its implication in classification neural networks remain to be better explained. We show that optimising the parameters of classification neural networks with softmax cross-entropy is equivalent to maximising the mutual information between inputs and labels under the balanced data assumption. Through experiments on synthetic and real datasets, we show that softmax cross-entropy can estimate mutual information approximately. When applied to image classification, this relation helps approximate the point-wise mutual information between an input image and a label without modifying the network structure. To this end, we propose infoCAM, informative class activation map, which highlights regions of the input image that are the most relevant to a given label based on differences in information. The activation map helps localise the target object in an input image. Through experiments on the semi-supervised object localisation task with two real-world datasets, we evaluate the effectiveness of our information-theoretic approach.



قيم البحث

اقرأ أيضاً

Cross-entropy loss with softmax output is a standard choice to train neural network classifiers. We give a new view of neural network classifiers with softmax and cross-entropy as mutual information evaluators. We show that when the dataset is balanc ed, training a neural network with cross-entropy maximises the mutual information between inputs and labels through a variational form of mutual information. Thereby, we develop a new form of softmax that also converts a classifier to a mutual information evaluator when the dataset is imbalanced. Experimental results show that the new form leads to better classification accuracy, in particular for imbalanced datasets.
Previous work shows that adversarially robust generalization requires larger sample complexity, and the same dataset, e.g., CIFAR-10, which enables good standard accuracy may not suffice to train robust models. Since collecting new training data coul d be costly, we focus on better utilizing the given data by inducing the regions with high sample density in the feature space, which could lead to locally sufficient samples for robust learning. We first formally show that the softmax cross-entropy (SCE) loss and its variants convey inappropriate supervisory signals, which encourage the learned feature points to spread over the space sparsely in training. This inspires us to propose the Max-Mahalanobis center (MMC) loss to explicitly induce dense feature regions in order to benefit robustness. Namely, the MMC loss encourages the model to concentrate on learning ordered and compact representations, which gather around the preset optimal centers for different classes. We empirically demonstrate that applying the MMC loss can significantly improve robustness even under strong adaptive attacks, while keeping state-of-the-art accuracy on clean inputs with little extra computation compared to the SCE loss.
Recently, substantial research efforts in Deep Metric Learning (DML) focused on designing complex pairwise-distance losses, which require convoluted schemes to ease optimization, such as sample mining or pair weighting. The standard cross-entropy los s for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight; the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that the cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing the cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing the cross-entropy is actually equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our findings indicate that the cross-entropy represents a proxy for maximizing the mutual information -- as pairwise losses do -- without the need for convoluted sample-mining heuristics. Our experiments over four standard DML benchmarks strongly support our findings. We obtain state-of-the-art results, outperforming recent and complex DML methods.
Deep neural networks are prone to catastrophic forgetting when incrementally trained on new classes or new tasks as adaptation to the new data leads to a drastic decrease of the performance on the old classes and tasks. By using a small memory for re hearsal and knowledge distillation, recent methods have proven to be effective to mitigate catastrophic forgetting. However due to the limited size of the memory, large imbalance between the amount of data available for the old and new classes still remains which results in a deterioration of the overall accuracy of the model. To address this problem, we propose the use of the Balanced Softmax Cross-Entropy loss and show that it can be combined with exiting methods for incremental learning to improve their performances while also decreasing the computational cost of the training procedure in some cases. Complete experiments on the competitive ImageNet, subImageNet and CIFAR100 datasets show states-of-the-art results.
350 - Zhenyue Qin , Dongwoo Kim 2019
Despite great popularity of applying softmax to map the non-normalised outputs of a neural network to a probability distribution over predicting classes, this normalised exponential transformation still seems to be artificial. A theoretic framework t hat incorporates softmax as an intrinsic component is still lacking. In this paper, we view neural networks embedding softmax from an information-theoretic perspective. Under this view, we can naturally and mathematically derive log-softmax as an inherent component in a neural network for evaluating the conditional mutual information between network output vectors and labels given an input datum. We show that training deterministic neural networks through maximising log-softmax is equivalent to enlarging the conditional mutual information, i.e., feeding label information into network outputs. We also generalise our informative-theoretic perspective to neural networks with stochasticity and derive information upper and lower bounds of log-softmax. In theory, such an information-theoretic view offers rationality support for embedding softmax in neural networks; in practice, we eventually demonstrate a computer vision application example of how to employ our information-theoretic view to filter out targeted objects on images.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا