We point out a limitation of the mutual information neural estimation (MINE) where the network fails to learn at the initial training phase, leading to slow convergence in the number of training iterations. To solve this problem, we propose a faster
method called the mutual information neural entropic estimation (MI-NEE). Our solution first generalizes MINE to estimate the entropy using a custom reference distribution. The entropy estimate can then be used to estimate the mutual information. We argue that the seemingly redundant intermediate step of entropy estimation allows one to improve the convergence by an appropriate reference distribution. In particular, we show that MI-NEE reduces to MINE in the special case when the reference distribution is the product of marginal distributions, but faster convergence is possible by choosing the uniform distribution as the reference distribution instead. Compared to the product of marginals, the uniform distribution introduces more samples in low-density regions and fewer samples in high-density regions, which appear to lead to an overall larger gradient for faster convergence.
We consider the estimation of a n-dimensional vector x from the knowledge of noisy and possibility non-linear element-wise measurements of xxT , a very generic problem that contains, e.g. stochastic 2-block model, submatrix localization or the spike
perturbation of random matrices. We use an interpolation method proposed by Guerra and later refined by Korada and Macris. We prove that the Bethe mutual information (related to the Bethe free energy and conjectured to be exact by Lesieur et al. on the basis of the non-rigorous cavity method) always yields an upper bound to the exact mutual information. We also provide a lower bound using a similar technique. For concreteness, we illustrate our findings on the sparse PCA problem, and observe that (a) our bounds match for a large region of parameters and (b) that it exists a phase transition in a region where the spectum remains uninformative. While we present only the case of rank-one symmetric matrix estimation, our proof technique is readily extendable to low-rank symmetric matrix or low-rank symmetric tensor estimation
To provide an efficient approach to characterize the input-output mutual information (MI) under additive white Gaussian noise (AWGN) channel, this short report fits the curves of exact MI under multilevel quadrature amplitude modulation (M-QAM) signa
l inputs via multi-exponential decay curve fitting (M-EDCF). Even though the definition expression for instanious MI versus Signal to Noise Ratio (SNR) is complex and the containing integral is intractable, our new developed fitting formula holds a neat and compact form, which possesses high precision as well as low complexity. Generally speaking, this approximation formula of MI can promote the research of performance analysis in practical communication system under discrete inputs.
We develop an information-theoretic view of the stochastic block model, a popular statistical model for the large-scale structure of complex networks. A graph $G$ from such a model is generated by first assigning vertex labels at random from a finite
alphabet, and then connecting vertices with edge probabilities depending on the labels of the endpoints. In the case of the symmetric two-group model, we establish an explicit `single-letter characterization of the per-vertex mutual information between the vertex labels and the graph. The explicit expression of the mutual information is intimately related to estimation-theoretic quantities, and --in particular-- reveals a phase transition at the critical point for community detection. Below the critical point the per-vertex mutual information is asymptotically the same as if edges were independent. Correspondingly, no algorithm can estimate the partition better than random guessing. Conversely, above the threshold, the per-vertex mutual information is strictly smaller than the independent-edges upper bound. In this regime there exists a procedure that estimates the vertex labels better than random guessing.
The mutual information between two jointly distributed random variables $X$ and $Y$ is a functional of the joint distribution $P_{XY},$ which is sometimes difficult to handle or estimate. A coarser description of the statistical behavior of $(X,Y)$ i
s given by the marginal distributions $P_X, P_Y$ and the adjacency relation induced by the joint distribution, where $x$ and $y$ are adjacent if $P(x,y)>0$. We derive a lower bound on the mutual information in terms of these entities. The bound is obtained by viewing the channel from $X$ to $Y$ as a probability distribution on a set of possible actions, where an action determines the output for any possible input, and is independently drawn. We also provide an alternative proof based on convex optimization, that yields a generally tighter bound. Finally, we derive an upper bound on the mutual information in terms of adjacency events between the action and the pair $(X,Y)$, where in this case an action $a$ and a pair $(x,y)$ are adjacent if $y=a(x)$. As an example, we apply our bounds to the binary deletion channel and show that for the special case of an i.i.d. input distribution and a range of deletion probabilities, our lower and upper bounds both outperform the best known bounds for the mutual information.