Estimators for mutual information are typically biased. However, in the case of the Kozachenko-Leonenko estimator for metric spaces, a type of nearest neighbour estimator, it is possible to calculate the bias explicitly.
The Mutual Information (MI) is an often used measure of dependency between two random variables utilized in information theory, statistics and machine learning. Recently several MI estimators have been proposed that can achieve parametric MSE convergence rate. However, most of the previously proposed estimators have the high computational complexity of at least $O(N^2)$. We propose a unified method for empirical non-parametric estimation of general MI function between random vectors in $mathbb{R}^d$ based on $N$ i.i.d. samples. The reduced complexity MI estimator, called the ensemble dependency graph estimator (EDGE), combines randomized locality sensitive hashing (LSH), dependency graphs, and ensemble bias-reduction methods. We prove that EDGE achieves optimal computational complexity $O(N)$, and can achieve the optimal parametric MSE rate of $O(1/N)$ if the density is $d$ times differentiable. To the best of our knowledge EDGE is the first non-parametric MI estimator that can achieve parametric MSE rates with linear time complexity. We illustrate the utility of EDGE for the analysis of the information plane (IP) in deep learning. Using EDGE we shed light on a controversy on whether or not the compression property of information bottleneck (IB) in fact holds for ReLu and other rectification functions in deep neural networks (DNN).
To provide an efficient approach to characterize the input-output mutual information (MI) under additive white Gaussian noise (AWGN) channel, this short report fits the curves of exact MI under multilevel quadrature amplitude modulation (M-QAM) signal inputs via multi-exponential decay curve fitting (M-EDCF). Even though the definition expression for instanious MI versus Signal to Noise Ratio (SNR) is complex and the containing integral is intractable, our new developed fitting formula holds a neat and compact form, which possesses high precision as well as low complexity. Generally speaking, this approximation formula of MI can promote the research of performance analysis in practical communication system under discrete inputs.
A new method to measure nonlinear dependence between two variables is described using mutual information to analyze the separate linear and nonlinear components of dependence. This technique, which gives an exact value for the proportion of linear dependence, is then compared with another common test for linearity, the Brock, Dechert and Scheinkman (BDS) test.
We point out a limitation of the mutual information neural estimation (MINE) where the network fails to learn at the initial training phase, leading to slow convergence in the number of training iterations. To solve this problem, we propose a faster method called the mutual information neural entropic estimation (MI-NEE). Our solution first generalizes MINE to estimate the entropy using a custom reference distribution. The entropy estimate can then be used to estimate the mutual information. We argue that the seemingly redundant intermediate step of entropy estimation allows one to improve the convergence by an appropriate reference distribution. In particular, we show that MI-NEE reduces to MINE in the special case when the reference distribution is the product of marginal distributions, but faster convergence is possible by choosing the uniform distribution as the reference distribution instead. Compared to the product of marginals, the uniform distribution introduces more samples in low-density regions and fewer samples in high-density regions, which appear to lead to an overall larger gradient for faster convergence.
Deep learning based physical layer design, i.e., using dense neural networks as encoders and decoders, has received considerable interest recently. However, while such an approach is naturally training data-driven, actions of the wireless channel are mimicked using standard channel models, which only partially reflect the physical ground truth. Very recently, neural network based mutual information (MI) estimators have been proposed that directly extract channel actions from the input-output measurements and feed these outputs into the channel encoder. This is a promising direction as such a new design paradigm is fully adaptive and training data-based. This paper implements further recent improvements of such MI estimators, analyzes theoretically their suitability for the channel coding problem, and compares their performance. To this end, a new MI estimator using a emph{``reverse Jensen} approach is proposed.