ﻻ يوجد ملخص باللغة العربية
We give the first single-pass streaming algorithm for Column Subset Selection with respect to the entrywise $ell_p$-norm with $1 leq p < 2$. We study the $ell_p$ norm loss since it is often considered more robust to noise than the standard Frobenius norm. Given an input matrix $A in mathbb{R}^{d times n}$ ($n gg d$), our algorithm achieves a multiplicative $k^{frac{1}{p} - frac{1}{2}}text{poly}(log nd)$-approximation to the error with respect to the best possible column subset of size $k$. Furthermore, the space complexity of the streaming algorithm is optimal up to a logarithmic factor. Our streaming algorithm also extends naturally to a 1-round distributed protocol with nearly optimal communication cost. A key ingredient in our algorithms is a reduction to column subset selection in the $ell_{p,2}$-norm, which corresponds to the $p$-norm of the vector of Euclidean norms of each of the columns of $A$. This enables us to leverage strong coreset constructions for the Euclidean norm, which previously had not been applied in this context. We also give the first provable guarantees for greedy column subset selection in the $ell_{1, 2}$ norm, which can be used as an alternative, practical subroutine in our algorithms. Finally, we show that our algorithms give significant practical advantages on real-world data analysis tasks.
A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algori
Given a set (or multiset) S of n numbers and a target number t, the subset sum problem is to decide if there is a subset of S that sums up to t. There are several methods for solving this problem, including exhaustive search, divide-and-conquer metho
In the Subset Sum problem we are given a set of $n$ positive integers $X$ and a target $t$ and are asked whether some subset of $X$ sums to $t$. Natural parameters for this problem that have been studied in the literature are $n$ and $t$ as well as t
We study fair clustering problems as proposed by Chierichetti et al. (NIPS 2017). Here, points have a sensitive attribute and all clusters in the solution are required to be balanced with respect to it (to counteract any form of data-inherent bias).
We present new lower bounds that show that a polynomial number of passes are necessary for solving some fundamental graph problems in the streaming model of computation. For instance, we show that any streaming algorithm that finds a weighted minimum