On Predictive Information in RNNs

Added by Zhe Dong

Publication date 2019

fields Informatics Engineering

language English

Authors Zhe Dong - Deniz Oktay - Ben Poole

Abstract in English

Certain biological neurons demonstrate a remarkable capability to optimally compress the history of sensory inputs while being maximally informative about the future. In this work, we investigate if the same can be said of artificial neurons in recurrent neural networks (RNNs) trained with maximum likelihood. Empirically, we find that RNNs are suboptimal in the information plane. Instead of optimally compressing past information, they extract additional information that is not relevant for predicting the future. We show that constraining past information by injecting noise into the hidden state can improve RNNs in several ways: optimality in the predictive information plane, sample quality, heldout likelihood, and downstream classification performance.
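
The noise-injection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tanh cell, the additive Gaussian noise model, and all names (`noisy_rnn_forward`, `noise_std`, the weight matrices) are assumptions chosen for illustration. The abstract only states that noise is injected into the hidden state.

```python
import numpy as np

def noisy_rnn_forward(inputs, w_xh, w_hh, b_h, noise_std, rng):
    """Run a tanh RNN and add Gaussian noise to every hidden state.

    Hypothetical illustration only: the cell type and the noise model
    are assumptions, not details taken from the paper.
    """
    hidden = np.zeros(w_hh.shape[0])
    states = []
    for x_t in inputs:
        # Deterministic update computed from the previous (noisy) state.
        mean = np.tanh(x_t @ w_xh + hidden @ w_hh + b_h)
        # The noisy state is what gets carried to the next step.
        hidden = mean + noise_std * rng.standard_normal(mean.shape)
        states.append(hidden)
    return np.stack(states)

# Small random model and input sequence (arbitrary sizes).
rng = np.random.default_rng(0)
input_dim, hidden_dim, steps = 3, 8, 20
w_xh = rng.normal(scale=0.5, size=(input_dim, hidden_dim))
w_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
inputs = rng.normal(size=(steps, input_dim))

# noise_std = 0.0 recovers the ordinary deterministic RNN.
clean = noisy_rnn_forward(inputs, w_xh, w_hh, b_h, 0.0, np.random.default_rng(1))
noisy = noisy_rnn_forward(inputs, w_xh, w_hh, b_h, 0.5, np.random.default_rng(1))
print(clean.shape, noisy.shape)
```

In this sketch `noise_std` plays the role of the constraint on past information: a larger value leaves less room in the hidden state for details of the input history, and setting it to zero gives back the unconstrained network.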

Self-supervised Representation Learning with Relative Predictive Coding

Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, 2021

This paper introduces Relative Predictive Coding (RPC), a new contrastive representation learning objective that maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance. The key to the success of RPC is two-fold. First, RPC introduces relative parameters to regularize the objective for boundedness and low variance. Second, RPC contains no logarithm or exponential score functions, which are the main cause of training instability in prior contrastive objectives. We empirically verify the effectiveness of RPC on benchmark vision and speech self-supervised learning tasks. Lastly, we relate RPC to mutual information (MI) estimation, showing that RPC can be used to estimate MI with low variance.

Machine Learning, Information Theory

On the Information Complexity of Proper Learners for VC Classes in the Realizable Case

Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, 2020

We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik-Chervonenkis (VC) classes cannot be improved from $d \log n + 2$ to $O(d)$, where $n$ is the number of i.i.d. training examples. In fact, we exhibit VC classes for which the CMI of any proper learner cannot be bounded by any real-valued function of the VC dimension only.

Machine Learning, Information Theory

Disentangled Information Bottleneck

Ziqi Pan, Li Niu, Jianfu Zhang, 2020

The information bottleneck (IB) method is a technique for extracting the information in a source random variable that is relevant for predicting a target random variable. It is typically implemented by optimizing the IB Lagrangian, which balances the compression and prediction terms. However, the IB Lagrangian is hard to optimize, and tuning the Lagrangian multiplier requires multiple trials. Moreover, we show that the prediction performance strictly decreases as the compression gets stronger while optimizing the IB Lagrangian. In this paper, we implement the IB method from the perspective of supervised disentangling. Specifically, we introduce the Disentangled Information Bottleneck (DisenIB), which consistently compresses the source maximally without loss of target prediction performance (maximum compression). Theoretical and experimental results demonstrate that our method consistently achieves maximum compression and performs well in terms of generalization, robustness to adversarial attack, out-of-distribution detection, and supervised disentangling.

Machine Learning, Information Theory

Information Potential Auto-Encoders

Yan Zhang, Mete Ozay, Zhun Sun, 2017

In this paper, we suggest a framework that uses mutual information as a regularization criterion for training Auto-Encoders (AEs). In the proposed framework, AEs are regularized by minimizing the mutual information between the input and encoding variables during the training phase. To estimate the entropy of the encoding variables and the mutual information, we propose a non-parametric method. We also give an information-theoretic view of Variational AEs (VAEs), which suggests that VAEs can be considered parametric methods that estimate entropy. Experimental results show that the proposed non-parametric models have more degrees of freedom in representation learning of features drawn from complex distributions, such as mixtures of Gaussians, than methods that estimate entropy using parametric approaches, such as VAEs.

Machine Learning, Information Theory

Information in Infinite Ensembles of Infinitely-Wide Neural Networks

Ravid Shwartz-Ziv, Alexander A. Alemi, 2019

In this preliminary work, we study the generalization properties of infinite ensembles of infinitely-wide neural networks. Amazingly, this model family admits tractable calculations for many information-theoretic quantities. We report analytical and empirical investigations in the search for signals that correlate with generalization.

Machine Learning, Information Theory
