ReFACTor: Practical Low-Rank Matrix Estimation Under Column-Sparsity

138 0 0.0 ( 0 )

Download Cite

Added by Regev Schweiger

Publication date 2017

fields Mathematical Statistics

and research's language is English

Authors Matan Gavish - Regev Schweiger - Elior Rahmani

Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Various problems in data analysis and statistical genetics call for recovery of a column-sparse, low-rank matrix from noisy observations. We propose ReFACTor, a simple variation of the classical Truncated Singular Value Decomposition (TSVD) algorithm. In contrast to previous sparse principal component analysis (PCA) algorithms, our algorithm can provably reveal a low-rank signal matrix better, and often significantly better, than the widely used TSVD, making it the algorithm of choice whenever column-sparsity is suspected. Empirically, we observe that ReFACTor consistently outperforms TSVD even when the underlying signal is not sparse, suggesting that it is generally safe to use ReFACTor instead of TSVD and PCA. The algorithm is extremely simple to implement and its running time is dominated by the runtime of PCA, making it as practical as standard principal component analysis.

rate research

Matrix Completion under Low-Rank Missing Mechanism

159 - Xiaojun Mao , Raymond K. W. Wong , Song Xi Chen 2018

Matrix completion is a modern missing data problem where both the missing structure and the underlying parameter are high dimensional. Although missing structure is a key component to any missing data problems, existing matrix completion methods often assume a simple uniform missing mechanism. In this work, we study matrix completion from corrupted data under a novel low-rank missing mechanism. The probability matrix of observation is estimated via a high dimensional low-rank matrix estimation procedure, and further used to complete the target matrix via inverse probabilities weighting. Due to both high dimensional and extreme (i.e., very small) nature of the true probability matrix, the effect of inverse probability weighting requires careful study. We derive optimal asymptotic convergence rates of the proposed estimators for both the observation probabilities and the target matrix.

Machine Learning Machine Learning Statistics Theory

Towards Faster Rates and Oracle Property for Low-Rank Matrix Estimation

375 - Huan Gui , Quanquan Gu 2015

We present a unified framework for low-rank matrix estimation with nonconvex penalties. We first prove that the proposed estimator attains a faster statistical rate than the traditional low-rank matrix estimator with nuclear norm penalty. Moreover, we rigorously show that under a certain condition on the magnitude of the nonzero singular values, the proposed estimator enjoys oracle property (i.e., exactly recovers the true rank of the matrix), besides attaining a faster rate. As far as we know, this is the first work that establishes the theory of low-rank matrix estimation with nonconvex penalties, confirming the advantages of nonconvex penalties for matrix completion. Numerical experiments on both synthetic and real world datasets corroborate our theory.

Machine Learning

High dimensional precision matrix estimation under weak sparsity

190 - Zeyu Wu , Cheng Wang , Weidong Liu 2021

In this paper, we estimate the high dimensional precision matrix under the weak sparsity condition where many entries are nearly zero. We study a Lasso-type method for high dimensional precision matrix estimation and derive general error bounds under the weak sparsity condition. The common irrepresentable condition is relaxed and the results are applicable to the weak sparse matrix. As applications, we study the precision matrix estimation for the heavy-tailed data, the non-paranormal data, and the matrix data with the Lasso-type method.

Statistics Theory Methodology Statistics Theory

Low-rank binary matrix approximation in column-sum norm

75 - Fedor V. Fomin , Petr A. Golovach , Fahad Panolan 2019

We consider $ell_1$-Rank-$r$ Approximation over GF(2), where for a binary $mtimes n$ matrix ${bf A}$ and a positive integer $r$, one seeks a binary matrix ${bf B}$ of rank at most $r$, minimizing the column-sum norm $||{bf A} -{bf B}||_1$. We show that for every $varepsilonin (0, 1)$, there is a randomized $(1+varepsilon)$-approximation algorithm for $ell_1$-Rank-$r$ Approximation over GF(2) of running time $m^{O(1)}n^{O(2^{4r}cdot varepsilon^{-4})}$. This is the first polynomial time approximation scheme (PTAS) for this problem.

Data Structures and Algorithms

Optimal Estimation of Low Rank Density Matrices

443 - Vladimir Koltchinskii , Dong Xia 2015

The density matrices are positively semi-definite Hermitian matrices of unit trace that describe the state of a quantum system. The goal of the paper is to develop minimax lower bounds on error rates of estimation of low rank density matrices in trace regression models used in quantum state tomography (in particular, in the case of Pauli measurements) with explicit dependence of the bounds on the rank and other complexity parameters. Such bounds are established for several statistically relevant distances, including quant

Machine Learning