On Predictive Information in RNNs

92 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhe Dong

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Zhe Dong - Deniz Oktay - Ben Poole

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Certain biological neurons demonstrate a remarkable capability to optimally compress the history of sensory inputs while being maximally informative about the future. In this work, we investigate if the same can be said of artificial neurons in recurrent neural networks (RNNs) trained with maximum likelihood. Empirically, we find that RNNs are suboptimal in the information plane. Instead of optimally compressing past information, they extract additional information that is not relevant for predicting the future. We show that constraining past information by injecting noise into the hidden state can improve RNNs in several ways: optimality in the predictive information plane, sample quality, heldout likelihood, and downstream classification performance.

قيم البحث

99 - Yao-Hung Hubert Tsai , Martin Q. Ma , Muqiao Yang 2021

This paper introduces Relative Predictive Coding (RPC), a new contrastive representation learning objective that maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance. The key to the success of RPC is two-fold. First, RPC introduces the relative parameters to regularize the objective for boundedness and low variance. Second, RPC contains no logarithm and exponential score functions, which are the main cause of training instability in prior contrastive objectives. We empirically verify the effectiveness of RPC on benchmark vision and speech self-supervised learning tasks. Lastly, we relate RPC with mutual information (MI) estimation, showing RPC can be used to estimate MI with low variance.

التعلم الآلي نظرية المعلومات نظرية المعلومات

On the Information Complexity of Proper Learners for VC Classes in the Realizable Case

295 - Mahdi Haghifam , Gintare Karolina Dziugaite , Shay Moran 2020

We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik--Chervonenkis (VC) classes cannot be improved from $d log n +2$ to $O(d)$, where $n$ is the number of i.i.d. training examples. In fact, we exhibit VC classes for which the CMI of any proper learner cannot be bounded by any real-valued function of the VC dimension only.

التعلم الآلي نظرية المعلومات نظرية المعلومات

Disentangled Information Bottleneck

90 - Ziqi Pan , Li Niu , Jianfu Zhang 2020

The information bottleneck (IB) method is a technique for extracting information that is relevant for predicting the target random variable from the source random variable, which is typically implemented by optimizing the IB Lagrangian that balances the compression and prediction terms. However, the IB Lagrangian is hard to optimize, and multiple trials for tuning values of Lagrangian multiplier are required. Moreover, we show that the prediction performance strictly decreases as the compression gets stronger during optimizing the IB Lagrangian. In this paper, we implement the IB method from the perspective of supervised disentangling. Specifically, we introduce Disentangled Information Bottleneck (DisenIB) that is consistent on compressing source maximally without target prediction performance loss (maximum compression). Theoretical and experimental results demonstrate that our method is consistent on maximum compression, and performs well in terms of generalization, robustness to adversarial attack, out-of-distribution detection, and supervised disentangling.

التعلم الآلي نظرية المعلومات نظرية المعلومات

Information Potential Auto-Encoders

123 - Yan Zhang , Mete Ozay , Zhun Sun 2017

In this paper, we suggest a framework to make use of mutual information as a regularization criterion to train Auto-Encoders (AEs). In the proposed framework, AEs are regularized by minimization of the mutual information between input and encoding va riables of AEs during the training phase. In order to estimate the entropy of the encoding variables and the mutual information, we propose a non-parametric method. We also give an information theoretic view of Variational AEs (VAEs), which suggests that VAEs can be considered as parametric methods that estimate entropy. Experimental results show that the proposed non-parametric models have more degree of freedom in terms of representation learning of features drawn from complex distributions such as Mixture of Gaussians, compared to methods which estimate entropy using parametric approaches, such as Variational AEs.

التعلم الآلي نظرية المعلومات نظرية المعلومات

Information in Infinite Ensembles of Infinitely-Wide Neural Networks

79 - Ravid Shwartz-Ziv , Alexander A. Alemi 2019

In this preliminary work, we study the generalization properties of infinite ensembles of infinitely-wide neural networks. Amazingly, this model family admits tractable calculations for many information-theoretic quantities. We report analytical and empirical investigations in the search for signals that correlate with generalization.

التعلم الآلي نظرية المعلومات نظرية المعلومات