
Mutual Information in Rank-One Matrix Estimation

Added by Florent Krzakala
Publication date: 2016
Language: English





We consider the estimation of an n-dimensional vector x from the knowledge of noisy and possibly non-linear element-wise measurements of xx^T, a very generic problem that contains, e.g., the stochastic 2-block model, submatrix localization, or the spike perturbation of random matrices. We use an interpolation method proposed by Guerra and later refined by Korada and Macris. We prove that the Bethe mutual information (related to the Bethe free energy and conjectured to be exact by Lesieur et al. on the basis of the non-rigorous cavity method) always yields an upper bound to the exact mutual information. We also provide a lower bound using a similar technique. For concreteness, we illustrate our findings on the sparse PCA problem, and observe that (a) our bounds match for a large region of parameters and (b) there exists a phase transition in a region where the spectrum remains uninformative. While we present only the case of rank-one symmetric matrix estimation, our proof technique is readily extendable to low-rank symmetric matrix or low-rank symmetric tensor estimation.
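As an illustration of the quantity being bounded: in the special case of additive Gaussian noise, $Y = \sqrt{\lambda/n}\, xx^T + W$, the conjectured Bethe (replica-symmetric) mutual information reduces to a single-letter expression of the form below. The notation ($\lambda$ for the signal-to-noise ratio, $P_0$ for the prior on the entries of $x$, $\rho = \mathbb{E}_{P_0}[X_0^2]$) is ours; this is a hedged sketch, not a quotation from the paper.

```latex
% Sketch in our notation: rank-one spiked Wigner special case, Z ~ N(0,1) independent of X_0 ~ P_0
\lim_{n\to\infty}\frac{1}{n}\, I(\mathbf{x};\mathbf{Y})
  \;=\; \min_{q\in[0,\rho]}
  \left\{ \frac{\lambda}{4}\,(\rho-q)^2
  \;+\; I\!\big(X_0;\ \sqrt{\lambda q}\,X_0 + Z\big) \right\}.
```

Stationarity in $q$, via the I-MMSE relation, gives the fixed-point condition $\rho - q = \mathrm{mmse}(\lambda q)$, i.e. the usual state-evolution equation for this model.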



Related research

Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes-optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in a few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows us to express the minimal mean-square error and to characterize the detectability phase transitions in a large set of estimation problems ranging from community detection to sparse PCA. We also show that for a large set of parameters, an iterative algorithm called approximate message-passing is Bayes optimal. There exists, however, a gap between what currently known polynomial algorithms can do and what is expected information-theoretically. Additionally, the proof technique is of interest in its own right and exploits three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message-passing algorithm, and the theory of spatial coupling and threshold saturation in coding. Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available.
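Since the abstract highlights approximate message-passing (AMP) as the algorithmic counterpart of the formula, here is a minimal AMP sketch for the Gaussian rank-one (spiked Wigner) model with a Rademacher prior. The scalings, the spectral initialization, and the tanh denoiser are standard choices of ours for the demo; this is not claimed to be the exact algorithm analyzed in the paper.

```python
import numpy as np

# Hedged sketch: Bayes-optimal-style AMP for Y = sqrt(lam/n) * x x^T + W with x_i = +/-1.
rng = np.random.default_rng(0)
n, lam, n_iter = 2000, 2.0, 25

x = rng.choice([-1.0, 1.0], size=n)              # planted signal
W = rng.normal(size=(n, n))
W = (W + W.T) / np.sqrt(2.0)                     # symmetric Gaussian noise, variance 1 off-diagonal
Y = np.sqrt(lam / n) * np.outer(x, x) + W        # noisy rank-one observation
A = Y / np.sqrt(n)                               # rescaled so the bulk spectrum is O(1)

def f(r):
    # Posterior mean of a +/-1 variable under the effective Gaussian channel
    # (state evolution plus the Nishimori identity give slope sqrt(lam)).
    return np.tanh(np.sqrt(lam) * r)

def f_prime(r):
    return np.sqrt(lam) * (1.0 - np.tanh(np.sqrt(lam) * r) ** 2)

# Spectral initialization: leading eigenvector of A, a common way to start AMP.
_, eigvecs = np.linalg.eigh(A)
r = eigvecs[:, -1] * np.sqrt(n)
m_old = np.zeros(n)

for t in range(n_iter):
    m = f(r)                                     # current estimate of x
    b = f_prime(r).mean()                        # Onsager correction coefficient
    r = A @ m - b * m_old                        # AMP update of the effective observation
    m_old = m

overlap = abs(np.dot(f(r), x)) / n
print(f"overlap with the planted signal: {overlap:.3f}")
```

Above the transition for this prior (here lam = 2), the printed overlap should be close to the value predicted by state evolution.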
Wei Zhang, Wee Peng Tay (2021)
A reconfigurable intelligent surface (RIS) consists of a massive number of meta-elements, which can improve the performance of future wireless communication systems. Existing RIS-aided channel estimation methods try to estimate the cascaded channel directly, incurring high computational and training overhead, especially when the number of RIS elements is extremely large. In this paper, we propose a cost-efficient channel estimation method via rank-one matrix factorization (MF). Specifically, if the RIS is deployed near the base station (BS), it is found that the RIS-aided channel can be factorized into a product of low-dimensional matrices. To estimate these factorized matrices, we propose alternating minimization and gradient descent approaches to obtain near-optimal solutions. Compared to directly estimating the cascaded channel, the proposed MF method reduces the training overhead substantially. Finally, numerical simulations show the effectiveness of the proposed MF method.
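The alternating-minimization idea can be illustrated on a generic noisy rank-one factorization problem. The sketch below is not the paper's RIS channel estimator (which exploits the specific cascaded-channel structure and training protocol); it only shows the alternating least-squares updates for two factors, with dimensions and noise level chosen arbitrarily.

```python
import numpy as np

# Generic sketch: rank-one matrix factorization by alternating minimization.
rng = np.random.default_rng(1)
m_dim, n_dim, sigma = 64, 32, 0.05

a_true = rng.normal(size=m_dim)
b_true = rng.normal(size=n_dim)
H = np.outer(a_true, b_true) + sigma * rng.normal(size=(m_dim, n_dim))  # noisy rank-one matrix

b = rng.normal(size=n_dim)
for _ in range(50):
    a = H @ b / np.dot(b, b)      # argmin_a ||H - a b^T||_F^2 with b fixed (closed form)
    b = H.T @ a / np.dot(a, a)    # argmin_b ||H - a b^T||_F^2 with a fixed (closed form)

rel_err = (np.linalg.norm(np.outer(a, b) - np.outer(a_true, b_true))
           / np.linalg.norm(np.outer(a_true, b_true)))
print(f"relative reconstruction error: {rel_err:.3e}")
```

Each subproblem is a simple least-squares projection, which is why alternating updates are cheap compared with estimating the full cascaded channel entry by entry.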
We point out a limitation of the mutual information neural estimation (MINE) where the network fails to learn at the initial training phase, leading to slow convergence in the number of training iterations. To solve this problem, we propose a faster method called the mutual information neural entropic estimation (MI-NEE). Our solution first generalizes MINE to estimate the entropy using a custom reference distribution. The entropy estimate can then be used to estimate the mutual information. We argue that the seemingly redundant intermediate step of entropy estimation allows one to improve the convergence by an appropriate reference distribution. In particular, we show that MI-NEE reduces to MINE in the special case when the reference distribution is the product of marginal distributions, but faster convergence is possible by choosing the uniform distribution as the reference distribution instead. Compared to the product of marginals, the uniform distribution introduces more samples in low-density regions and fewer samples in high-density regions, which appear to lead to an overall larger gradient for faster convergence.
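One way to summarize the role of the reference distribution (our notation, assuming the relevant densities exist and the reference factorizes as $Q_{XY} = Q_X Q_Y$):

```latex
I(X;Y) \;=\; D\!\left(P_{XY}\,\|\,Q_X Q_Y\right)
        \;-\; D\!\left(P_X\,\|\,Q_X\right)
        \;-\; D\!\left(P_Y\,\|\,Q_Y\right).
```

Each divergence can be estimated with a Donsker-Varadhan (MINE-style) neural bound; choosing $Q_X Q_Y = P_X P_Y$ makes the last two terms vanish and recovers plain MINE, while a uniform reference changes where the reference samples fall, which is the convergence effect described in the abstract.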
The Mutual Information (MI) is an often-used measure of dependency between two random variables, utilized in information theory, statistics, and machine learning. Recently, several MI estimators have been proposed that can achieve parametric MSE convergence rate. However, most of the previously proposed estimators have high computational complexity of at least $O(N^2)$. We propose a unified method for empirical non-parametric estimation of general MI functions between random vectors in $\mathbb{R}^d$ based on $N$ i.i.d. samples. The reduced-complexity MI estimator, called the ensemble dependency graph estimator (EDGE), combines randomized locality-sensitive hashing (LSH), dependency graphs, and ensemble bias-reduction methods. We prove that EDGE achieves optimal computational complexity $O(N)$ and can achieve the optimal parametric MSE rate of $O(1/N)$ if the density is $d$ times differentiable. To the best of our knowledge, EDGE is the first non-parametric MI estimator that can achieve parametric MSE rates with linear time complexity. We illustrate the utility of EDGE for the analysis of the information plane (IP) in deep learning. Using EDGE, we shed light on the controversy over whether or not the compression property of the information bottleneck (IB) in fact holds for ReLU and other rectification functions in deep neural networks (DNNs).
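For intuition only, the bucketing step behind hashing-based estimators can be mimicked with a fixed-width quantization and a plug-in estimate from bucket counts. The sketch below is not the EDGE estimator (it omits the dependency-graph weights and the ensemble bias correction); the bucket width and the Gaussian test case are arbitrary choices of ours.

```python
import numpy as np
from collections import Counter

# Simplified illustration: quantize samples into buckets, then form a plug-in MI estimate
# from joint and marginal bucket counts. Cost is O(N) in the number of samples.
rng = np.random.default_rng(2)
N, eps, rho = 20000, 0.25, 0.8

# Correlated Gaussian pair with known ground-truth MI = -0.5 * log(1 - rho^2)
x = rng.normal(size=N)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=N)

bx = np.floor(x / eps).astype(int)        # "hash" = fixed-width quantization
by = np.floor(y / eps).astype(int)

joint = Counter(zip(bx, by))
marg_x, marg_y = Counter(bx), Counter(by)

mi = 0.0
for (i, j), n_ij in joint.items():
    p_ij = n_ij / N
    mi += p_ij * np.log(p_ij * N * N / (marg_x[i] * marg_y[j]))

print(f"plug-in estimate: {mi:.3f}, ground truth: {-0.5 * np.log(1 - rho**2):.3f}")
```

The plug-in value estimates the MI of the quantized pair and is biased relative to the continuous ground truth; EDGE's dependency-graph weighting and ensemble bias reduction are aimed precisely at removing that bias while keeping the linear-time bucketing.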
Jian Ma, Zengqi Sun (2008)
We prove that mutual information is actually negative copula entropy; based on this result, we propose a method for mutual information estimation.
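The identity behind this claim can be sketched as follows (our notation): writing the joint density via Sklar's theorem as $p_{XY}(x,y) = c\big(F_X(x),F_Y(y)\big)\,p_X(x)\,p_Y(y)$, with $c$ the copula density and $F_X, F_Y$ the marginal CDFs,

```latex
I(X;Y) \;=\; \int p_{XY}\,\log\frac{p_{XY}}{p_X\,p_Y}\,\mathrm{d}x\,\mathrm{d}y
       \;=\; \int_{[0,1]^2} c(u,v)\,\log c(u,v)\,\mathrm{d}u\,\mathrm{d}v
       \;=\; -\,H_c ,
```

where $H_c$ is the differential entropy of the copula density; estimating $H_c$, e.g. from rank-transformed samples, then yields an estimate of the mutual information.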