No Arabic abstract
Based on the notion of information bottleneck (IB), we formulate a quantization problem called IB quantization. We show that IB quantization is equivalent to learning based on the IB principle. Under this equivalence, the standard neural network models can be viewed as scalar (single sample) IB quantizers. It is known, from conventional rate-distortion theory, that scalar quantizers are inferior to vector (multi-sample) quantizers. Such a deficiency then inspires us to develop a novel learning framework, AgrLearn, that corresponds to vector IB quantizers for learning with neural networks. Unlike standard networks, AgrLearn simultaneously optimizes against multiple data samples. We experimentally verify that AgrLearn can result in significant improvements when applied to several current deep learning architectures for image recognition and text classification. We also empirically show that AgrLearn can reduce up to 80% of the training samples needed for ResNet training.
We consider the problem of learning a neural network classifier. Under the information bottleneck (IB) principle, we associate with this classification problem a representation learning problem, which we call IB learning. We show that IB learning is, in fact, equivalent to a special class of the quantization problem. The classical results in rate-distortion theory then suggest that IB learning can benefit from a vector quantization approach, namely, simultaneously learning the representations of multiple input objects. Such an approach assisted with some variational techniques, result in a novel learning framework, Aggregated Learning, for classification with neural network models. In this framework, several objects are jointly classified by a single neural network. The effectiveness of this framework is verified through extensive experiments on standard image recognition and text classification tasks.
A framework is presented for unsupervised learning of representations based on infomax principle for large-scale neural populations. We use an asymptotic approximation to the Shannons mutual information for a large neural population to demonstrate that a good initial approximation to the global information-theoretic optimum can be obtained by a hierarchical infomax method. Starting from the initial solution, an efficient algorithm based on gradient descent of the final objective function is proposed to learn representations from the input datasets, and the method works for complete, overcomplete, and undercomplete bases. As confirmed by numerical experiments, our method is robust and highly efficient for extracting salient features from input datasets. Compared with the main existing methods, our algorithm has a distinct advantage in both the training speed and the robustness of unsupervised representation learning. Furthermore, the proposed method is easily extended to the supervised or unsupervised model for training deep structure networks.
One of the key challenges in training Spiking Neural Networks (SNNs) is that target outputs typically come in the form of natural signals, such as labels for classification or images for generative models, and need to be encoded into spikes. This is done by handcrafting target spiking signals, which in turn implicitly fixes the mechanisms used to decode spikes into natural signals, e.g., rate decoding. The arbitrary choice of target signals and decoding rule generally impairs the capacity of the SNN to encode and process information in the timing of spikes. To address this problem, this work introduces a hybrid variational autoencoder architecture, consisting of an encoding SNN and a decoding Artificial Neural Network (ANN). The role of the decoding ANN is to learn how to best convert the spiking signals output by the SNN into the target natural signal. A novel end-to-end learning rule is introduced that optimizes a directed information bottleneck training criterion via surrogate gradients. We demonstrate the applicability of the technique in an experimental settings on various tasks, including real-life datasets.
In the present paper, we propose the model of {it structural information learning machines} (SiLeM for short), leading to a mathematical definition of learning by merging the theories of computation and information. Our model shows that the essence of learning is {it to gain information}, that to gain information is {it to eliminate uncertainty} embedded in a data space, and that to eliminate uncertainty of a data space can be reduced to an optimization problem, that is, an {it information optimization problem}, which can be realized by a general {it encoding tree method}. The principle and criterion of the structural information learning machines are maximization of {it decoding information} from the data points observed together with the relationships among the data points, and semantical {it interpretation} of syntactical {it essential structure}, respectively. A SiLeM machine learns the laws or rules of nature. It observes the data points of real world, builds the {it connections} among the observed data and constructs a {it data space}, for which the principle is to choose the way of connections of data points so that the {it decoding information} of the data space is maximized, finds the {it encoding tree} of the data space that minimizes the dynamical uncertainty of the data space, in which the encoding tree is hence referred to as a {it decoder}, due to the fact that it has already eliminated the maximum amount of uncertainty embedded in the data space, interprets the {it semantics} of the decoder, an encoding tree, to form a {it knowledge tree}, extracts the {it remarkable common features} for both semantical and syntactical features of the modules decoded by a decoder to construct {it trees of abstractions}, providing the foundations for {it intuitive reasoning} in the learning when new data are observed.
We propose a new approach to train a variational information bottleneck (VIB) that improves its robustness to adversarial perturbations. Unlike the traditional methods where the hard labels are usually used for the classification task, we refine the categorical class information in the training phase with soft labels which are obtained from a pre-trained reference neural network and can reflect the likelihood of the original class labels. We also relax the Gaussian posterior assumption in the VIB implementation by using the mutual information neural estimation. Extensive experiments have been performed with the MNIST and CIFAR-10 datasets, and the results show that our proposed approach significantly outperforms the benchmarked models.