In a noiseless linear estimation problem, one aims to reconstruct a vector $x^*$ from the knowledge of its linear projections $y = \Phi x^*$. Many theoretical works have concentrated on the case where the matrix $\Phi$ is a random i.i.d. one, but a growing body of heuristic evidence suggests that many of these results are universal and extend well beyond this restricted case. Here we revisit this problem through the prism of the development of message-passing methods, and consider not only the universality of the $\ell_1$ transition, as previously addressed, but also that of the optimal Bayesian reconstruction. We observe that the universality extends to the Bayes-optimal minimum mean-squared error (MMSE) and to a range of structured matrices.
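As a minimal illustration of the universality question (a sketch under simplified assumptions, not the message-passing analysis of the paper), one can compare $\ell_1$ reconstruction from an i.i.d. Gaussian matrix and from a structured one, here a randomly subsampled orthonormal DCT, using plain iterative soft thresholding:

# Minimal sketch: l1 reconstruction (ISTA for the LASSO) with an i.i.d. Gaussian
# sensing matrix versus a structured one (randomly subsampled orthonormal DCT).
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
n, m, k = 256, 128, 10                     # signal dimension, measurements, sparsity

def ista_lasso(Phi, y, lam=1e-3, iters=2000):
    """Iterative soft thresholding for min_x 0.5*||y - Phi x||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        g = x + step * Phi.T @ (y - Phi @ x)
        x = np.sign(g) * np.maximum(np.abs(g) - lam * step, 0.0)
    return x

x_star = np.zeros(n)                       # k-sparse ground truth
x_star[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

Phi_iid = rng.standard_normal((m, n)) / np.sqrt(m)
Phi_dct = dct(np.eye(n), axis=0, norm='ortho')[rng.choice(n, m, replace=False)]

for name, Phi in [("i.i.d. Gaussian", Phi_iid), ("subsampled DCT", Phi_dct)]:
    x_hat = ista_lasso(Phi, Phi @ x_star)
    err = np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star)
    print(f"{name}: relative reconstruction error {err:.2e}")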
In this manuscript we consider Kernel Ridge Regression (KRR) under the Gaussian design. Exponents for the decay of the excess generalization error of KRR have been reported in various works under the assumption of a power-law decay of the eigenvalues of the feature covariance. These decays were, however, provided for sizeably different setups, namely the noiseless case with constant regularization and the noisy, optimally regularized case. Intermediate settings have been left substantially uncharted. In this work, we unify and extend this line of work, providing a characterization of all regimes and excess error decay rates that can be observed in terms of the interplay of noise and regularization. In particular, we show the existence of a transition in the noisy setting between the noiseless exponents and their noisy values as the sample complexity is increased. Finally, we illustrate how this crossover can also be observed on real data sets.
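A hedged sketch of the kind of setup described above (illustrative exponents and scales, not the manuscript's precise model): ridge regression on Gaussian features with a power-law covariance spectrum, tracking the excess error as the sample size grows.

# Minimal sketch: ridge regression with Gaussian features whose covariance
# eigenvalues decay as a power law, with label noise and a fixed regularization.
import numpy as np

rng = np.random.default_rng(0)
d, alpha, r = 1000, 1.5, 1.5                # feature dimension, spectrum and target decay exponents
eigs = np.arange(1, d + 1) ** (-alpha)      # covariance spectrum lambda_i ~ i^{-alpha}
w_star = np.arange(1, d + 1) ** (-r / 2)    # target coefficients
noise, lam = 0.1, 1e-6                      # label noise std, ridge penalty

for n in [100, 400, 1600]:
    X = rng.standard_normal((n, d)) * np.sqrt(eigs)        # Gaussian design, Cov = diag(eigs)
    y = X @ w_star + noise * rng.standard_normal(n)
    w_hat = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)
    excess = np.sum(eigs * (w_hat - w_star) ** 2)           # excess risk over the noise floor
    print(f"n = {n:5d}   excess error = {excess:.3e}")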
We study the problem of recovering an unknown signal $\boldsymbol x$ given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator $\hat{\boldsymbol x}^{\rm L}$ and a spectral estimator $\hat{\boldsymbol x}^{\rm s}$. The former is a data-dependent linear combination of the columns of the measurement matrix, and its analysis is quite simple. The latter is the principal eigenvector of a data-dependent matrix, and a recent line of work has studied its performance. In this paper, we show how to optimally combine $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$. At the heart of our analysis is the exact characterization of the joint empirical distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$ in the high-dimensional limit. This allows us to compute the Bayes-optimal combination of $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$, given the limiting distribution of the signal $\boldsymbol x$. When the distribution of the signal is Gaussian, the Bayes-optimal combination has the form $\theta\hat{\boldsymbol x}^{\rm L}+\hat{\boldsymbol x}^{\rm s}$, and we derive the optimal combination coefficient. In order to establish the limiting distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm whose iterates give $\hat{\boldsymbol x}^{\rm L}$ and approach $\hat{\boldsymbol x}^{\rm s}$. Numerical simulations demonstrate the improvement of the proposed combination with respect to the two methods considered separately.
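The following is a minimal sketch of the two estimators and of a combination $\theta\hat{\boldsymbol x}^{\rm L}+\hat{\boldsymbol x}^{\rm s}$; the ReLU channel, the preprocessing of the measurements, and the grid search over $\theta$ are illustrative assumptions, not the Bayes-optimal prescription derived in the paper.

# Minimal sketch: linear estimator, spectral estimator, and a scan over the
# combination theta * x_L + x_s for an illustrative ReLU measurement channel.
import numpy as np

rng = np.random.default_rng(0)
d, n = 400, 8000
x_star = rng.standard_normal(d)
x_star /= np.linalg.norm(x_star)

A = rng.standard_normal((n, d))             # Gaussian sensing matrix, rows a_i
y = np.maximum(A @ x_star, 0.0)             # y_i = relu(<a_i, x*>)

x_L = A.T @ y / n                           # linear estimator: data-weighted combination of rows of A

T = y - y.mean()                            # simple centering as preprocessing
D = (A * T[:, None]).T @ A / n              # (1/n) sum_i T(y_i) a_i a_i^T
x_s = np.linalg.eigh(D)[1][:, -1]           # spectral estimator: top eigenvector
if x_s @ x_L < 0:                           # resolve the global sign ambiguity
    x_s = -x_s

def overlap(u):
    return abs(u @ x_star) / np.linalg.norm(u)

grid = np.linspace(0.0, 5.0, 51)            # grid search over the combination coefficient
combined = [overlap(t * x_L / np.linalg.norm(x_L) + x_s) for t in grid]
print(f"linear overlap    {overlap(x_L):.3f}")
print(f"spectral overlap  {overlap(x_s):.3f}")
print(f"best combination  {max(combined):.3f} at theta = {grid[int(np.argmax(combined))]:.2f}")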
We consider the problem of estimating a signal from measurements obtained via a generalized linear model. We focus on estimators based on approximate message passing (AMP), a family of iterative algorithms with many appealing features: the performance of AMP in the high-dimensional limit can be succinctly characterized under suitable model assumptions; AMP can also be tailored to the empirical distribution of the signal entries; and, for a wide class of estimation problems, AMP is conjectured to be optimal among all polynomial-time algorithms. However, a major issue with AMP is that in many models (such as phase retrieval), it requires an initialization correlated with the ground-truth signal and independent of the measurement matrix. Assuming that such an initialization is available is typically not realistic. In this paper, we solve this problem by proposing an AMP algorithm initialized with a spectral estimator. With such an initialization, the standard AMP analysis fails since the spectral estimator depends in a complicated way on the design matrix. Our main contribution is a rigorous characterization of the performance of AMP with spectral initialization in the high-dimensional limit. The key technical idea is to define and analyze a two-phase artificial AMP algorithm that first produces the spectral estimator, and then closely approximates the iterates of the true AMP. We also provide numerical results that demonstrate the validity of the proposed approach.
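As a hedged illustration of spectral initialization for AMP, the sketch below uses the simpler rank-one spiked matrix model rather than the generalized linear model treated in the paper, and an ad hoc $\tanh$ denoiser for a $\pm 1$ signal; it only conveys the structure of the iteration (memory term included) and how the spectral estimate seeds it.

# Minimal sketch: AMP with a spectral initialization in the rank-one spiked
# matrix model Y = (lambda/n) x* x*^T + W, with a tanh denoiser for a +/-1 signal.
import numpy as np

rng = np.random.default_rng(0)
n, lam = 2000, 1.5
x_star = rng.choice([-1.0, 1.0], size=n)

G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2 * n)                         # symmetric Gaussian noise, Var(W_ij) = 1/n
Y = (lam / n) * np.outer(x_star, x_star) + W

x = np.sqrt(n) * np.linalg.eigh(Y)[1][:, -1]           # spectral initialization (top eigenvector)
f_prev = np.zeros(n)

for t in range(10):
    f = np.tanh(x)                                     # entrywise denoiser
    cos = abs(x_star @ f) / (np.linalg.norm(x_star) * np.linalg.norm(f))
    print(f"iteration {t}: overlap with x* = {cos:.3f}")
    b = np.mean(1.0 - f ** 2)                          # Onsager coefficient (1/n) sum_i f'(x_i)
    x = Y @ f - b * f_prev                             # x^{t+1} = Y f(x^t) - b_t f(x^{t-1})
    f_prev = f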
Recently, Gnutzmann and Smilansky presented a formula for the bond scattering matrix of a graph with respect to a Hermitian matrix. We present another proof of Gnutzmann and Smilansky's formula by a technique used in the theory of the zeta function of a graph. Furthermore, we generalize Gnutzmann and Smilansky's formula to a regular covering of a graph. Finally, we define an $L$-function of a graph and present a determinant expression for it. As a corollary, we express the generalization of Gnutzmann and Smilansky's formula to a regular covering of a graph in terms of its $L$-functions.
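For background on the determinant-expression technique (this is the classical identity for the Ihara zeta function, stated only as context and not as the new formula above): for a finite connected graph $G$ with $n$ vertices and $m$ edges,

\[ Z_G(u)^{-1} \;=\; \det\left(I_{2m} - u B\right) \;=\; (1 - u^2)^{m-n}\,\det\left(I_n - u A + u^2 (D - I_n)\right), \]

where $A$ is the adjacency matrix, $D$ the diagonal degree matrix, and $B$ the $2m \times 2m$ edge (bond) adjacency matrix on oriented edges. The $L$-function defined above comes with a determinant expression in the same spirit.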
The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning. Most works study this problem under very weak assumptions, in which case it is provably hard. We consider the case of stronger structural assumptions that are commonly satisfied in modern machine learning, including representation learning and generative modelling with autoencoder architectures. Under these assumptions we propose and study an estimator that can be easily implemented, works well in high dimensions, and enjoys faster rates of convergence. We verify the behavior of our estimator empirically in both synthetic and real-data experiments, and discuss its direct implications for total correlation, entropy, and mutual information estimation.
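As a generic illustration of why structural assumptions help (not the estimator proposed in the paper): when the model structure lets us evaluate both densities, an f-divergence such as the KL divergence reduces to a simple Monte Carlo average, which remains well behaved in moderately high dimension.

# Minimal sketch: Monte Carlo estimate of the KL divergence between two Gaussians
# when both densities can be evaluated, compared to the closed-form value.
import numpy as np
from scipy.stats import multivariate_normal

d, n = 20, 5000
mu = 0.3 * np.ones(d)
p = multivariate_normal(mean=np.zeros(d), cov=np.eye(d))
q = multivariate_normal(mean=mu, cov=np.eye(d))

x = p.rvs(size=n, random_state=0)                      # samples from P
kl_mc = np.mean(p.logpdf(x) - q.logpdf(x))             # KL(P||Q) = E_P[log p(x) - log q(x)]
kl_true = 0.5 * np.sum(mu ** 2)                        # closed form for equal-covariance Gaussians
print(f"Monte Carlo estimate: {kl_mc:.3f}   closed form: {kl_true:.3f}")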