New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Rank-one Multi-Reference Factor Analysis

84 0 0.0 ( 0 )

Download Cite

Added by Yariv Aizenbud

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Yariv Aizenbud - Boris Landa - Yoel Shkolnisky

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In recent years, there is a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is $1/operatorname{SNR}^4$. Additionally, we propose a procedure which is experimentally shown to improve the sample complexity by a factor equal to the signals length. Finally, we present numerical experiments which demonstrate the performance of our algorithms, and corroborate our theoretical findings.

rate research

Multi-reference factor analysis: low-rank covariance estimation under unknown translations

301 - Boris Landa , Yoel Shkolnisky 2019

We consider the problem of estimating the covariance matrix of a random signal observed through unknown translations (modeled by cyclic shifts) and corrupted by noise. Solving this problem allows to discover low-rank structures masked by the existence of translations (which act as nuisance parameters), with direct application to Principal Components Analysis (PCA). We assume that the underlying signal is of length $L$ and follows a standard factor model with mean zero and $r$ normally-distributed factors. To recover the covariance matrix in this case, we propose to employ the second- and fourth-order shift-invariant moments of the signal known as the $textit{power spectrum}$ and the $textit{trispectrum}$. We prove that they are sufficient for recovering the covariance matrix (under a certain technical condition) when $r<sqrt{L}$. Correspondingly, we provide a polynomial-time procedure for estimating the covariance matrix from many (translated and noisy) observations, where no explicit knowledge of $r$ is required, and prove the procedures statistical consistency. While our results establish that covariance estimation is possible from the power spectrum and the trispectrum for low-rank covariance matrices, we prove that this is not the case for full-rank covariance matrices. We conduct numerical experiments that corroborate our theoretical findings, and demonstrate the favorable performance of our algorithms in various settings, including in high levels of noise.

Statistics Theory Data Structures and Algorithms Information Theory

On the Competitive Analysis and High Accuracy Optimality of Profile Maximum Likelihood

76 - Yanjun Han , Kirankumar Shiragur 2020

A striking result of [Acharya et al. 2017] showed that to estimate symmetric properties of discrete distributions, plugging in the distribution that maximizes the likelihood of observed multiset of frequencies, also known as the profile maximum likelihood (PML) distribution, is competitive compared with any estimators regardless of the symmetric property. Specifically, given $n$ observations from the discrete distribution, if some estimator incurs an error $varepsilon$ with probability at most $delta$, then plugging in the PML distribution incurs an error $2varepsilon$ with probability at most $deltacdot exp(3sqrt{n})$. In this paper, we strengthen the above result and show that using a careful chaining argument, the error probability can be reduced to $delta^{1-c}cdot exp(cn^{1/3+c})$ for arbitrarily small constants $c>0$ and some constant $c>0$. In particular, we show that the PML distribution is an optimal estimator of the sorted distribution: it is $varepsilon$-close in sorted $ell_1$ distance to the true distribution with support size $k$ for any $n=Omega(k/(varepsilon^2 log k))$ and $varepsilon gg n^{-1/3}$, which are the information-theoretically optimal sample complexity and the largest error regime where the classical empirical distribution is sub-optimal, respectively. In order to strengthen the analysis of the PML, a key ingredient is to employ novel continuity properties of the PML distributions and construct a chain of suitable quantized PMLs, or coverings. We also construct a novel approximation-based estimator for the sorted distribution with a near-optimal concentration property without any sample splitting, where as a byproduct we obtain better trade-offs between the polynomial approximation error and the maximum magnitude of coefficients in the Poisson approximation of $1$-Lipschitz functions.

Statistics Theory Data Structures and Algorithms Information Theory

Functional principal components analysis via penalized rank one approximation

130 - Jianhua Z. Huang , Haipeng Shen , Andreas Buja 2008

Two existing approaches to functional principal components analysis (FPCA) are due to Rice and Silverman (1991) and Silverman (1996), both based on maximizing variance but introducing penalization in different ways. In this article we propose an alternative approach to FPCA using penalized rank one approximation to the data matrix. Our contributions are four-fold: (1) by considering invariance under scale transformation of the measurements, the new formulation sheds light on how regularization should be performed for FPCA and suggests an efficient power algorithm for computation; (2) it naturally incorporates spline smoothing of discretized functional data; (3) the connection with smoothing splines also facilitates construction of cross-validation or generalized cross-validation criteria for smoothing parameter selection that allows efficient computation; (4) different smoothing parameters are permitted for different FPCs. The methodology is illustrated with a real data example and a simulation.

Statistics Theory Statistics Theory

Estimation under group actions: recovering orbits from invariants

123 - Afonso S. Bandeira , Ben Blum-Smith , Joe Kileel 2017

Motivated by geometric problems in signal processing, computer vision, and structural biology, we study a class of orbit recovery problems where we observe very noisy copies of an unknown signal, each acted upon by a random element of some group (such as Z/p or SO(3)). The goal is to recover the orbit of the signal under the group action in the high-noise regime. This generalizes problems of interest such as multi-reference alignment (MRA) and the reconstruction problem in cryo-electron microscopy (cryo-EM). We obtain matching lower and upper bounds on the sample complexity of these problems in high generality, showing that the statistical difficulty is intricately determined by the invariant theory of the underlying symmetry group. In particular, we determine that for cryo-EM with noise variance $sigma^2$ and uniform viewing directions, the number of samples required scales as $sigma^6$. We match this bound with a novel algorithm for ab initio reconstruction in cryo-EM, based on invariant features of degree at most 3. We further discuss how to recover multiple molecular structures from heterogeneous cryo-EM samples.

Statistics Theory Data Structures and Algorithms Information Theory

Group testing and local search: is there a computational-statistical gap?

94 - Fotis Iliopoulos , Ilias Zadik 2020

In this work we study the fundamental limits of approximate recovery in the context of group testing. One of the most well-known, theoretically optimal, and easy to implement testing procedures is the non-adaptive Bernoulli group testing problem, where all tests are conducted in parallel, and each item is chosen to be part of any certain test independently with some fixed probability. In this setting, there is an observed gap between the number of tests above which recovery is information theoretically (IT) possible, and the number of tests required by the currently best known efficient algorithms to succeed. Often times such gaps are explained by a phase transition in the landscape of the solution space of the problem (an Overlap Gap Property phase transition). In this paper we seek to understand whether such a phenomenon takes place for Bernoulli group testing as well. Our main contributions are the following: (1) We provide first moment evidence that, perhaps surprisingly, such a phase transition does not take place throughout the regime for which recovery is IT possible. This fact suggests that the model is in fact amenable to local search algorithms ; (2) we prove the complete absence of bad local minima for a part of the hard regime, a fact which implies an improvement over known theoretical results on the performance of efficient algorithms for approximate recovery without false-negatives, and finally (3) we present extensive simulations that strongly suggest that a very simple local algorithm known as Glauber Dynamics does indeed succeed, and can be used to efficiently implement the well-known (theoretically optimal) Smallest Satisfying Set (SSS) estimator.

Statistics Theory Data Structures and Algorithms Information Theory

comments

Fetching comments

Qasyoun Private University For Science And Technology

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Rank-one Multi-Reference Factor Analysis

Ask ChatGPT about the research

No Arabic abstract

Read More