Low-rank binary matrix approximation in column-sum norm

76 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Fahad Panolan

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Fedor V. Fomin - Petr A. Golovach - Fahad Panolan

بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We consider $ell_1$-Rank-$r$ Approximation over GF(2), where for a binary $mtimes n$ matrix ${bf A}$ and a positive integer $r$, one seeks a binary matrix ${bf B}$ of rank at most $r$, minimizing the column-sum norm $||{bf A} -{bf B}||_1$. We show that for every $varepsilonin (0, 1)$, there is a randomized $(1+varepsilon)$-approximation algorithm for $ell_1$-Rank-$r$ Approximation over GF(2) of running time $m^{O(1)}n^{O(2^{4r}cdot varepsilon^{-4})}$. This is the first polynomial time approximation scheme (PTAS) for this problem.

قيم البحث

76 - Fedor V. Fomin , Petr A. Golovach , 2018

We provide a number of algorithmic results for the following family of problems: For a given binary mtimes n matrix A and integer k, decide whether there is a simple binary matrix B which differs from A in at most k entries. For an integer r, the sim plicity of B is characterized as follows. - Binary r-Means: Matrix B has at most r different columns. This problem is known to be NP-complete already for r=2. We show that the problem is solvable in time 2^{O(klog k)}cdot(nm)^{O(1)} and thus is fixed-parameter tractable parameterized by k. We prove that the problem admits a polynomial kernel when parameterized by r and k but it has no polynomial kernel when parameterized by k only unless NPsubseteq coNP/poly. We also complement these result by showing that when being parameterized by r and k, the problem admits an algorithm of running time 2^{O(rcdot sqrt{klog{(k+r)}})}(nm)^{O(1)}, which is subexponential in k for rin O(k^{1/2 -epsilon}) for any epsilon>0. - Low GF(2)-Rank Approximation: Matrix B is of GF(2)-rank at most r. This problem is known to be NP-complete already for r=1. It also known to be W[1]-hard when parameterized by k. Interestingly, when parameterized by r and k, the problem is not only fixed-parameter tractable, but it is solvable in time 2^{O(r^{ 3/2}cdot sqrt{klog{k}})}(nm)^{O(1)}, which is subexponential in k. - Low Boolean-Rank Approximation: Matrix B is of Boolean rank at most r. The problem is known to be NP-complete for k=0 as well as for r=1. We show that it is solvable in subexponential in k time 2^{O(r2^rcdot sqrt{klog k})}(nm)^{O(1)}.

بنى وهياكل البيانات والخوارزميات الرياضيات المتقطعة

Approximation Schemes for Low-Rank Binary Matrix Approximation Problems

88 - Fedor V. Fomin , Petr A. Golovach , Daniel Lokshtanov 2018

We provide a randomized linear time approximation scheme for a generic problem about clustering of binary vectors subject to additional constrains. The new constrained clustering problem encompasses a number of problems and by solving it, we obtain t he first linear time-approximation schemes for a number of well-studied fundamental problems concerning clustering of binary vectors and low-rank approximation of binary matrices. Among the problems solvable by our approach are textsc{Low GF(2)-Rank Approximation}, textsc{Low Boolean-Rank Approximation}, and vario

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي التعلم الآلي

ReFACTor: Practical Low-Rank Matrix Estimation Under Column-Sparsity

137 - Matan Gavish , Regev Schweiger , Elior Rahmani 2017

Various problems in data analysis and statistical genetics call for recovery of a column-sparse, low-rank matrix from noisy observations. We propose ReFACTor, a simple variation of the classical Truncated Singular Value Decomposition (TSVD) algorithm . In contrast to previous sparse principal component analysis (PCA) algorithms, our algorithm can provably reveal a low-rank signal matrix better, and often significantly better, than the widely used TSVD, making it the algorithm of choice whenever column-sparsity is suspected. Empirically, we observe that ReFACTor consistently outperforms TSVD even when the underlying signal is not sparse, suggesting that it is generally safe to use ReFACTor instead of TSVD and PCA. The algorithm is extremely simple to implement and its running time is dominated by the runtime of PCA, making it as practical as standard principal component analysis.

التعلم الالي

A PTAS for $ell_p$-Low Rank Approximation

104 - Frank Ban , Vijay Bhattiprolu , Karl Bringmann 2018

A number of recent works have studied algorithms for entrywise $ell_p$-low rank approximation, namely, algorithms which given an $n times d$ matrix $A$ (with $n geq d$), output a rank-$k$ matrix $B$ minimizing $|A-B|_p^p=sum_{i,j}|A_{i,j}-B_{i,j}|^p$ when $p > 0$; and $|A-B|_0=sum_{i,j}[A_{i,j} eq B_{i,j}]$ for $p=0$. On the algorithmic side, for $p in (0,2)$, we give the first $(1+epsilon)$-approximation algorithm running in time $n^{text{poly}(k/epsilon)}$. Further, for $p = 0$, we give the first almost-linear time approximation scheme for what we call the Generalized Binary $ell_0$-Rank-$k$ problem. Our algorithm computes $(1+epsilon)$-approximation in time $(1/epsilon)^{2^{O(k)}/epsilon^{2}} cdot nd^{1+o(1)}$. On the hardness of approximation side, for $p in (1,2)$, assuming the Small Set Expansion Hypothesis and the Exponential Time Hypothesis (ETH), we show that there exists $delta := delta(alpha) > 0$ such that the entrywise $ell_p$-Rank-$k$ problem has no $alpha$-approximation algorithm running in time $2^{k^{delta}}$.

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي التعلم الآلي

Single Pass Entrywise-Transformed Low Rank Approximation

108 - Yifei Jiang , Yi Li , Yiming Sun 2021

In applications such as natural language processing or computer vision, one is given a large $n times d$ matrix $A = (a_{i,j})$ and would like to compute a matrix decomposition, e.g., a low rank approximation, of a function $f(A) = (f(a_{i,j}))$ appl ied entrywise to $A$. A very important special case is the likelihood function $fleft( A right ) = log{left( left| a_{ij}right| +1right)}$. A natural way to do this would be to simply apply $f$ to each entry of $A$, and then compute the matrix decomposition, but this requires storing all of $A$ as well as multiple passes over its entries. Recent work of Liang et al. shows how to find a rank-$k$ factorization to $f(A)$ for an $n times n$ matrix $A$ using only $n cdot operatorname{poly}(epsilon^{-1}klog n)$ words of memory, with overall error $10|f(A)-[f(A)]_k|_F^2 + operatorname{poly}(epsilon/k) |f(A)|_{1,2}^2$, where $[f(A)]_k$ is the best rank-$k$ approximation to $f(A)$ and $|f(A)|_{1,2}^2$ is the square of the sum of Euclidean lengths of rows of $f(A)$. Their algorithm uses three passes over the entries of $A$. The authors pose the open question of obtaining an algorithm with $n cdot operatorname{poly}(epsilon^{-1}klog n)$ words of memory using only a single pass over the entries of $A$. In this paper we resolve this open question, obtaining the first single-pass algorithm for this problem and for the same class of functions $f$ studied by Liang et al. Moreover, our error is $|f(A)-[f(A)]_k|_F^2 + operatorname{poly}(epsilon/k) |f(A)|_F^2$, where $|f(A)|_F^2$ is the sum of squares of Euclidean lengths of rows of $f(A)$. Thus our error is significantly smaller, as it removes the factor of $10$ and also $|f(A)|_F^2 leq |f(A)|_{1,2}^2$. We also give an algorithm for regression, pointing out an error in previous work, and empirically validate our results.

بنى وهياكل البيانات والخوارزميات التعلم الآلي

سجل دخول لتتمكن من نشر تعليقات