Do you want to publish a course? Click here

Low-rank binary matrix approximation in column-sum norm

76   0   0.0 ( 0 )
 Added by Fahad Panolan
 Publication date 2019
and research's language is English




Ask ChatGPT about the research

We consider $ell_1$-Rank-$r$ Approximation over GF(2), where for a binary $mtimes n$ matrix ${bf A}$ and a positive integer $r$, one seeks a binary matrix ${bf B}$ of rank at most $r$, minimizing the column-sum norm $||{bf A} -{bf B}||_1$. We show that for every $varepsilonin (0, 1)$, there is a randomized $(1+varepsilon)$-approximation algorithm for $ell_1$-Rank-$r$ Approximation over GF(2) of running time $m^{O(1)}n^{O(2^{4r}cdot varepsilon^{-4})}$. This is the first polynomial time approximation scheme (PTAS) for this problem.



rate research

Read More

We provide a number of algorithmic results for the following family of problems: For a given binary mtimes n matrix A and integer k, decide whether there is a simple binary matrix B which differs from A in at most k entries. For an integer r, the simplicity of B is characterized as follows. - Binary r-Means: Matrix B has at most r different columns. This problem is known to be NP-complete already for r=2. We show that the problem is solvable in time 2^{O(klog k)}cdot(nm)^{O(1)} and thus is fixed-parameter tractable parameterized by k. We prove that the problem admits a polynomial kernel when parameterized by r and k but it has no polynomial kernel when parameterized by k only unless NPsubseteq coNP/poly. We also complement these result by showing that when being parameterized by r and k, the problem admits an algorithm of running time 2^{O(rcdot sqrt{klog{(k+r)}})}(nm)^{O(1)}, which is subexponential in k for rin O(k^{1/2 -epsilon}) for any epsilon>0. - Low GF(2)-Rank Approximation: Matrix B is of GF(2)-rank at most r. This problem is known to be NP-complete already for r=1. It also known to be W[1]-hard when parameterized by k. Interestingly, when parameterized by r and k, the problem is not only fixed-parameter tractable, but it is solvable in time 2^{O(r^{ 3/2}cdot sqrt{klog{k}})}(nm)^{O(1)}, which is subexponential in k. - Low Boolean-Rank Approximation: Matrix B is of Boolean rank at most r. The problem is known to be NP-complete for k=0 as well as for r=1. We show that it is solvable in subexponential in k time 2^{O(r2^rcdot sqrt{klog k})}(nm)^{O(1)}.
We provide a randomized linear time approximation scheme for a generic problem about clustering of binary vectors subject to additional constrains. The new constrained clustering problem encompasses a number of problems and by solving it, we obtain the first linear time-approximation schemes for a number of well-studied fundamental problems concerning clustering of binary vectors and low-rank approximation of binary matrices. Among the problems solvable by our approach are textsc{Low GF(2)-Rank Approximation}, textsc{Low Boolean-Rank Approximation}, and vario
Various problems in data analysis and statistical genetics call for recovery of a column-sparse, low-rank matrix from noisy observations. We propose ReFACTor, a simple variation of the classical Truncated Singular Value Decomposition (TSVD) algorithm. In contrast to previous sparse principal component analysis (PCA) algorithms, our algorithm can provably reveal a low-rank signal matrix better, and often significantly better, than the widely used TSVD, making it the algorithm of choice whenever column-sparsity is suspected. Empirically, we observe that ReFACTor consistently outperforms TSVD even when the underlying signal is not sparse, suggesting that it is generally safe to use ReFACTor instead of TSVD and PCA. The algorithm is extremely simple to implement and its running time is dominated by the runtime of PCA, making it as practical as standard principal component analysis.
A number of recent works have studied algorithms for entrywise $ell_p$-low rank approximation, namely, algorithms which given an $n times d$ matrix $A$ (with $n geq d$), output a rank-$k$ matrix $B$ minimizing $|A-B|_p^p=sum_{i,j}|A_{i,j}-B_{i,j}|^p$ when $p > 0$; and $|A-B|_0=sum_{i,j}[A_{i,j} eq B_{i,j}]$ for $p=0$. On the algorithmic side, for $p in (0,2)$, we give the first $(1+epsilon)$-approximation algorithm running in time $n^{text{poly}(k/epsilon)}$. Further, for $p = 0$, we give the first almost-linear time approximation scheme for what we call the Generalized Binary $ell_0$-Rank-$k$ problem. Our algorithm computes $(1+epsilon)$-approximation in time $(1/epsilon)^{2^{O(k)}/epsilon^{2}} cdot nd^{1+o(1)}$. On the hardness of approximation side, for $p in (1,2)$, assuming the Small Set Expansion Hypothesis and the Exponential Time Hypothesis (ETH), we show that there exists $delta := delta(alpha) > 0$ such that the entrywise $ell_p$-Rank-$k$ problem has no $alpha$-approximation algorithm running in time $2^{k^{delta}}$.
108 - Yifei Jiang , Yi Li , Yiming Sun 2021
In applications such as natural language processing or computer vision, one is given a large $n times d$ matrix $A = (a_{i,j})$ and would like to compute a matrix decomposition, e.g., a low rank approximation, of a function $f(A) = (f(a_{i,j}))$ applied entrywise to $A$. A very important special case is the likelihood function $fleft( A right ) = log{left( left| a_{ij}right| +1right)}$. A natural way to do this would be to simply apply $f$ to each entry of $A$, and then compute the matrix decomposition, but this requires storing all of $A$ as well as multiple passes over its entries. Recent work of Liang et al. shows how to find a rank-$k$ factorization to $f(A)$ for an $n times n$ matrix $A$ using only $n cdot operatorname{poly}(epsilon^{-1}klog n)$ words of memory, with overall error $10|f(A)-[f(A)]_k|_F^2 + operatorname{poly}(epsilon/k) |f(A)|_{1,2}^2$, where $[f(A)]_k$ is the best rank-$k$ approximation to $f(A)$ and $|f(A)|_{1,2}^2$ is the square of the sum of Euclidean lengths of rows of $f(A)$. Their algorithm uses three passes over the entries of $A$. The authors pose the open question of obtaining an algorithm with $n cdot operatorname{poly}(epsilon^{-1}klog n)$ words of memory using only a single pass over the entries of $A$. In this paper we resolve this open question, obtaining the first single-pass algorithm for this problem and for the same class of functions $f$ studied by Liang et al. Moreover, our error is $|f(A)-[f(A)]_k|_F^2 + operatorname{poly}(epsilon/k) |f(A)|_F^2$, where $|f(A)|_F^2$ is the sum of squares of Euclidean lengths of rows of $f(A)$. Thus our error is significantly smaller, as it removes the factor of $10$ and also $|f(A)|_F^2 leq |f(A)|_{1,2}^2$. We also give an algorithm for regression, pointing out an error in previous work, and empirically validate our results.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا