ترغب بنشر مسار تعليمي؟ اضغط هنا

We study how to evaluate the quantitative information content of a region within an image for a particular label. To this end, we bridge class activation maps with information theory. We develop an informative class activation map (infoCAM). Given a classification task, infoCAM depict how to accumulate information of partial regions to that of the entire image toward a label. Thus, we can utilise infoCAM to locate the most informative features for a label. When applied to an image classification task, infoCAM performs better than the traditional classification map in the weakly supervised object localisation task. We achieve state-of-the-art results on Tiny-ImageNet.
Cross-entropy loss with softmax output is a standard choice to train neural network classifiers. We give a new view of neural network classifiers with softmax and cross-entropy as mutual information evaluators. We show that when the dataset is balanc ed, training a neural network with cross-entropy maximises the mutual information between inputs and labels through a variational form of mutual information. Thereby, we develop a new form of softmax that also converts a classifier to a mutual information evaluator when the dataset is imbalanced. Experimental results show that the new form leads to better classification accuracy, in particular for imbalanced datasets.
Most existing graph neural networks (GNNs) learn node embeddings using the framework of message passing and aggregation. Such GNNs are incapable of learning relative positions between graph nodes within a graph. To empower GNNs with the awareness of node positions, some nodes are set as anchors. Then, using the distances from a node to the anchors, GNNs can infer relative positions between nodes. However, P-GNNs arbitrarily select anchors, leading to compromising position-awareness and feature extraction. To eliminate this compromise, we demonstrate that selecting evenly distributed and asymmetric anchors is essential. On the other hand, we show that choosing anchors that can aggregate embeddings of all the nodes within a graph is NP-hard. Therefore, devising efficient optimal algorithms in a deterministic approach is practically not feasible. To ensure position-awareness and bypass NP-completeness, we propose Position-Sensing Graph Neural Networks (PSGNNs), learning how to choose anchors in a back-propagatable fashion. Experiments verify the effectiveness of PSGNNs against state-of-the-art GNNs, substantially improving performance on various synthetic and real-world graph datasets while enjoying stable scalability. Specifically, PSGNNs on average boost AUC more than 14% for pairwise node classification and 18% for link prediction over the existing state-of-the-art position-aware methods. Our source code is publicly available at: https://github.com/ZhenyueQin/PSGNN
The prevalent convolutional neural network (CNN) based image denoising methods extract features of images to restore the clean ground truth, achieving high denoising accuracy. However, these methods may ignore the underlying distribution of clean ima ges, inducing distortions or artifacts in denoising results. This paper proposes a new perspective to treat image denoising as a distribution learning and disentangling task. Since the noisy image distribution can be viewed as a joint distribution of clean images and noise, the denoised images can be obtained via manipulating the latent representations to the clean counterpart. This paper also provides a distribution learning based denoising framework. Following this framework, we present an invertible denoising network, FDN, without any assumptions on either clean or noise distributions, as well as a distribution disentanglement method. FDN learns the distribution of noisy images, which is different from the previous CNN based discriminative mapping. Experimental results demonstrate FDNs capacity to remove synthetic additive white Gaussian noise (AWGN) on both category-specific and remote sensing images. Furthermore, the performance of FDN surpasses that of previously published methods in real image denoising with fewer parameters and faster speed. Our code is available at: https://github.com/Yang-Liu1082/FDN.git.
265 - Zhenyue Qin , Yang Liu , Pan Ji 2021
Skeleton sequences are lightweight and compact, thus are ideal candidates for action recognition on edge devices. Recent skeleton-based action recognition methods extract features from 3D joint coordinates as spatial-temporal cues, using these repres entations in a graph neural network for feature fusion to boost recognition performance. The use of first- and second-order features, ie{} joint and bone representations, has led to high accuracy. Nonetheless, many models are still confused by actions that have similar motion trajectories. To address these issues, we propose fusing third-order features in the form of angular encoding into modern architectures to robustly capture the relationships between joints and body parts. This simple fusion with popular spatial-temporal graph neural networks achieves new state-of-the-art accuracy in two large benchmarks, including NTU60 and NTU120, while employing fewer parameters and reduced run time. Our source code is publicly available at: https://github.com/ZhenyueQin/Angular-Skeleton-Encoding.
Invertible networks have various benefits for image denoising since they are lightweight, information-lossless, and memory-saving during back-propagation. However, applying invertible models to remove noise is challenging because the input is noisy, and the reversed output is clean, following two different distributions. We propose an invertible denoising network, InvDN, to address this challenge. InvDN transforms the noisy input into a low-resolution clean image and a latent representation containing noise. To discard noise and restore the clean image, InvDN replaces the noisy latent representation with another one sampled from a prior distribution during reversion. The denoising performance of InvDN is better than all the existing competitive models, achieving a new state-of-the-art result for the SIDD dataset while enjoying less run time. Moreover, the size of InvDN is far smaller, only having 4.2% of the number of parameters compared to the most recently proposed DANet. Further, via manipulating the noisy latent representation, InvDN is also able to generate noise more similar to the original one. Our code is available at: https://github.com/Yang-Liu1082/InvDN.git.
Ever since the advent of AlexNet, designing novel deep neural architectures for different tasks has consistently been a productive research direction. Despite the exceptional performance of various architectures in practice, we study a theoretical qu estion: what is the condition for deep neural architectures to preserve all the information of the input data? Identifying the information lossless condition for deep neural architectures is important, because tasks such as image restoration require keep the detailed information of the input data as much as possible. Using the definition of mutual information, we show that: a deep neural architecture can preserve maximum details about the given data if and only if the architecture is invertible. We verify the advantages of our Invertible Restoring Autoencoder (IRAE) network by comparing it with competitive models on three perturbed image restoration tasks: image denoising, jpeg image decompression and image inpainting. Experimental results show that IRAE consistently outperforms non-invertible ones. Our model even contains far fewer parameters. Thus, it may be worthwhile to try replacing standard components of deep neural architectures, such as residual blocks and ReLU, with their invertible counterparts. We believe our work provides a unique perspective and direction for future deep learning research.
Mutual information is widely applied to learn latent representations of observations, whilst its implication in classification neural networks remain to be better explained. We show that optimising the parameters of classification neural networks wit h softmax cross-entropy is equivalent to maximising the mutual information between inputs and labels under the balanced data assumption. Through experiments on synthetic and real datasets, we show that softmax cross-entropy can estimate mutual information approximately. When applied to image classification, this relation helps approximate the point-wise mutual information between an input image and a label without modifying the network structure. To this end, we propose infoCAM, informative class activation map, which highlights regions of the input image that are the most relevant to a given label based on differences in information. The activation map helps localise the target object in an input image. Through experiments on the semi-supervised object localisation task with two real-world datasets, we evaluate the effectiveness of our information-theoretic approach.
350 - Zhenyue Qin , Dongwoo Kim 2019
Despite great popularity of applying softmax to map the non-normalised outputs of a neural network to a probability distribution over predicting classes, this normalised exponential transformation still seems to be artificial. A theoretic framework t hat incorporates softmax as an intrinsic component is still lacking. In this paper, we view neural networks embedding softmax from an information-theoretic perspective. Under this view, we can naturally and mathematically derive log-softmax as an inherent component in a neural network for evaluating the conditional mutual information between network output vectors and labels given an input datum. We show that training deterministic neural networks through maximising log-softmax is equivalent to enlarging the conditional mutual information, i.e., feeding label information into network outputs. We also generalise our informative-theoretic perspective to neural networks with stochasticity and derive information upper and lower bounds of log-softmax. In theory, such an information-theoretic view offers rationality support for embedding softmax in neural networks; in practice, we eventually demonstrate a computer vision application example of how to employ our information-theoretic view to filter out targeted objects on images.
307 - Zhenyue Qin , Jie Wu 2018
Human eyes concentrate different facial regions during distinct cognitive activities. We study utilising facial visual saliency maps to classify different facial expressions into different emotions. Our results show that our novel method of merely us ing facial saliency maps can achieve a descent accuracy of 65%, much higher than the chance level of $1/7$. Furthermore, our approach is of semi-supervision, i.e., our facial saliency maps are generated from a general saliency prediction algorithm that is not explicitly designed for face images. We also discovered that the classification accuracies of each emotional class using saliency maps demonstrate a strong positive correlation with the accuracies produced by face images. Our work implies that humans may look at different facial areas in order to perceive different emotions.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا