No Arabic abstract
Consider the problem of estimating a low-rank matrix when its entries are perturbed by Gaussian noise. If the empirical distribution of the entries of the spikes is known, optimal estimators that exploit this knowledge can substantially outperform simple spectral approaches. Recent work characterizes the asymptotic accuracy of Bayes-optimal estimators in the high-dimensional limit. In this paper we present a practical algorithm that can achieve Bayes-optimal accuracy above the spectral threshold. A bold conjecture from statistical physics posits that no polynomial-time algorithm achieves optimal error below the same threshold (unless the best estimator is trivial). Our approach uses Approximate Message Passing (AMP) in conjunction with a spectral initialization. AMP algorithms have proved successful in a variety of statistical estimation tasks, and are amenable to exact asymptotic analysis via state evolution. Unfortunately, state evolution is uninformative when the algorithm is initialized near an unstable fixed point, as often happens in low-rank matrix estimation. We develop a new analysis of AMP that allows for spectral initializations. Our main theorem is general and applies beyond matrix estimation. However, we use it to derive detailed predictions for the problem of estimating a rank-one matrix in noise. Special cases of this problem are closely related---via universality arguments---to the network community detection problem for two asymmetric communities. For general rank-one models, we show that AMP can be used to construct confidence intervals and control false discovery rate. We provide illustrations of the general methodology by considering the cases of sparse low-rank matrices and of block-constant low-rank matrices with symmetric blocks (we refer to the latter as to the `Gaussian Block Model).
Sparse Group LASSO (SGL) is a regularized model for high-dimensional linear regression problems with grouped covariates. SGL applies $l_1$ and $l_2$ penalties on the individual predictors and group predictors, respectively, to guarantee sparse effects both on the inter-group and within-group levels. In this paper, we apply the approximate message passing (AMP) algorithm to efficiently solve the SGL problem under Gaussian random designs. We further use the recently developed state evolution analysis of AMP to derive an asymptotically exact characterization of SGL solution. This allows us to conduct multiple fine-grained statistical analyses of SGL, through which we investigate the effects of the group information and $gamma$ (proportion of $ell_1$ penalty). With the lens of various performance measures, we show that SGL with small $gamma$ benefits significantly from the group information and can outperform other SGL (including LASSO) or regularized models which does not exploit the group information, in terms of the recovery rate of signal, false discovery rate and mean squared error.
In sketched clustering, a dataset of $T$ samples is first sketched down to a vector of modest size, from which the centroids are subsequently extracted. Advantages include i) reduced storage complexity and ii) centroid extraction complexity independent of $T$. For the sketching methodology recently proposed by Keriven, et al., which can be interpreted as a random sampling of the empirical characteristic function, we propose a sketched clustering algorithm based on approximate message passing. Numerical experiments suggest that our approach is more efficient than the state-of-the-art sketched clustering algorithm CL-OMPR (in both computational and sample complexity) and more efficient than k-means++ when $T$ is large.
Compressed sensing (CS) deals with the problem of reconstructing a sparse vector from an under-determined set of observations. Approximate message passing (AMP) is a technique used in CS based on iterative thresholding and inspired by belief propagation in graphical models. Due to the high transmission rate and a high molecular absorption, spreading loss and reflection loss, the discrete-time channel impulse response (CIR) of a typical indoor THz channel is very long and exhibits an approximately sparse characteristic. In this paper, we develop AMP based channel estimation algorithms for indoor THz communications. The performance of these algorithms is compared to the state of the art. We apply AMP with soft- and hard-thresholding. Unlike the common applications in which AMP with hard-thresholding diverges, the properties of the THz channel favor this approach. It is shown that THz channel estimation via hard-thresholding AMP outperforms all previously proposed methods and approaches the oracle based performance closely.
Recovering low-rank structures via eigenvector perturbation analysis is a common problem in statistical machine learning, such as in factor analysis, community detection, ranking, matrix completion, among others. While a large variety of bounds are available for average errors between empirical and population statistics of eigenvectors, few results are tight for entrywise analyses, which are critical for a number of problems such as community detection. This paper investigates entrywise behaviors of eigenvectors for a large class of random matrices whose expectations are low-rank, which helps settle the conjecture in Abbe et al. (2014b) that the spectral algorithm achieves exact recovery in the stochastic block model without any trimming or cleaning steps. The key is a first-order approximation of eigenvectors under the $ell_infty$ norm: $$u_k approx frac{A u_k^*}{lambda_k^*},$$ where ${u_k}$ and ${u_k^*}$ are eigenvectors of a random matrix $A$ and its expectation $mathbb{E} A$, respectively. The fact that the approximation is both tight and linear in $A$ facilitates sharp comparisons between $u_k$ and $u_k^*$. In particular, it allows for comparing the signs of $u_k$ and $u_k^*$ even if $| u_k - u_k^*|_{infty}$ is large. The results are further extended to perturbations of eigenspaces, yielding new $ell_infty$-type bounds for synchronization ($mathbb{Z}_2$-spiked Wigner model) and noisy matrix completion.
Approximate message passing (AMP) is a low-cost iterative parameter-estimation technique for certain high-dimensional linear systems with non-Gaussian distributions. However, AMP only applies to independent identically distributed (IID) transform matrices, but may become unreliable for other matrix ensembles, especially for ill-conditioned ones. To handle this difficulty, orthogonal/vector AMP (OAMP/VAMP) was proposed for general right-unitarily-invariant matrices. However, the Bayes-optimal OAMP/VAMP requires high-complexity linear minimum mean square error estimator. To solve the disadvantages of AMP and OAMP/VAMP, this paper proposes a memory AMP (MAMP), in which a long-memory matched filter is proposed for interference suppression. The complexity of MAMP is comparable to AMP. The asymptotic Gaussianity of estimation errors in MAMP is guaranteed by the orthogonality principle. A state evolution is derived to asymptotically characterize the performance of MAMP. Based on the state evolution, the relaxation parameters and damping vector in MAMP are optimized. For all right-unitarily-invariant matrices, the optimized MAMP converges to OAMP/VAMP, and thus is Bayes-optimal if it has a unique fixed point. Finally, simulations are provided to verify the validity and accuracy of the theoretical results.