We give a new construction for a small-space summary satisfying the coreset guarantee of a data set with respect to the $k$-means objective function. The number of points required in an offline construction is in $\tilde{O}(k \epsilon^{-2} \min(d, k\epsilon^{-2}))$, which is minimal among all available constructions. Aside from two constructions with exponential dependence on the dimension, all known coresets are maintained in data streams via the merge-and-reduce framework, which incurs a large space dependency on $\log n$. Instead, our construction crucially relies on Johnson-Lindenstrauss-type embeddings, which, combined with results from online algorithms, give us a new technique for efficiently maintaining coresets in data streams without relying on merge-and-reduce. The final number of points stored by our algorithm in a data stream is in $\tilde{O}(k^2 \epsilon^{-2} \log^2 n \min(d, k\epsilon^{-2}))$.
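For context on the $\log n$ overhead mentioned above, here is a minimal sketch of the classical merge-and-reduce framework in Python. The reducer `reduce_fn` is a placeholder of our own (uniform subsampling, which is not a true coreset construction, and weights are omitted for brevity); a real instantiation would plug in an offline coreset algorithm.

```python
import numpy as np

class MergeReduceStream:
    """Minimal merge-and-reduce sketch: keep at most one coreset per level;
    merging two level-i coresets and reducing yields a level-(i+1) coreset."""

    def __init__(self, m, reduce_fn):
        self.m = m                # points per bucket / per coreset
        self.reduce_fn = reduce_fn
        self.buffer = []          # raw points awaiting a full bucket
        self.levels = {}          # level -> coreset (np.ndarray)

    def insert(self, point):
        self.buffer.append(point)
        if len(self.buffer) == self.m:
            self._carry(0, np.array(self.buffer))
            self.buffer = []

    def _carry(self, level, coreset):
        # Like binary addition: cascade merges up the levels.
        while level in self.levels:
            merged = np.vstack([self.levels.pop(level), coreset])
            coreset = self.reduce_fn(merged, self.m)
            level += 1
        self.levels[level] = coreset

    def summary(self):
        parts = ([np.array(self.buffer)] if self.buffer else [])
        parts += list(self.levels.values())
        return np.vstack(parts)

def uniform_reduce(points, m, rng=np.random.default_rng(0)):
    # Placeholder reducer: uniform subsampling, NOT a true coreset reduction.
    return points[rng.choice(len(points), size=m, replace=False)]
```

After $n$ insertions there are $O(\log n)$ occupied levels, each holding a coreset whose error compounds with the level; this is precisely the $\log n$ space (and accuracy) dependency that the construction above avoids.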
We study fair clustering problems as proposed by Chierichetti et al. (NIPS 2017). Here, points have a sensitive attribute and all clusters in the solution are required to be balanced with respect to it (to counteract any form of data-inherent bias).
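For the two-color case, the balance measure of Chierichetti et al. is simple to evaluate; the sketch below (function name is ours) computes it for a given clustering.

```python
import numpy as np

def balance(labels, colors):
    """Balance in the sense of Chierichetti et al. for a binary sensitive
    attribute: per cluster, min(#red/#blue, #blue/#red); overall, the minimum
    over clusters. A perfectly balanced clustering has balance 1.0."""
    b = 1.0
    for c in np.unique(labels):
        mask = labels == c
        red = int(np.sum(colors[mask] == 0))
        blue = int(np.sum(colors[mask] == 1))
        if red == 0 or blue == 0:
            return 0.0            # a monochromatic cluster has balance 0
        b = min(b, red / blue, blue / red)
    return b
```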
This paper considers $k$-means clustering in the presence of noise. It is known that $k$-means clustering is highly sensitive to noise, and thus noise should be removed to obtain a quality solution. A popular formulation of this problem is called $k$-means with outliers.
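As an illustration of the formulation (not this paper's algorithm), here is a Lloyd-style heuristic in the spirit of the $k$-means-- algorithm of Chawla and Gionis: in each iteration, the $z$ points farthest from their nearest center are set aside as noise before the centers are updated.

```python
import numpy as np

def kmeans_with_outliers(X, k, z, iters=50, rng=np.random.default_rng(0)):
    """Lloyd iterations that discard the z farthest points each round."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        nearest = d.argmin(1)                      # nearest center per point
        dist = d[np.arange(len(X)), nearest]       # squared distance to it
        inliers = np.argsort(dist)[:-z] if z > 0 else np.arange(len(X))
        for j in range(k):
            pts = X[inliers][nearest[inliers] == j]
            if len(pts):
                centers[j] = pts.mean(0)           # update on inliers only
    outliers = np.argsort(dist)[-z:] if z > 0 else np.array([], dtype=int)
    return centers, outliers
```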
Euler $k$-means (EulerK) first maps data onto the unit hyper-sphere surface of an equi-dimensional space via a complex mapping that induces the robust Euler kernel, and then employs the popular $k$-means. Consequently, besides enjoying the virtues of $k$-means, EulerK is also robust to noise and outliers.
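A minimal sketch of this pipeline, assuming the explicit Euler feature map $\phi(x) = \frac{1}{\sqrt{2}} e^{i\alpha\pi x}$ applied elementwise with kernel parameter $\alpha$ (our reading of the complex mapping): representing the complex features by their real and imaginary parts preserves Euclidean distances, so off-the-shelf $k$-means applies directly.

```python
import numpy as np
from sklearn.cluster import KMeans

def euler_map(X, alpha=1.0):
    """Explicit Euler-kernel feature map: x -> exp(i*alpha*pi*x)/sqrt(2),
    elementwise. Every coordinate lands on a unit circle, so points lie on
    the unit hyper-sphere surface of an equi-dimensional complex space.
    Stacking cos and sin gives a real representation with the same
    Euclidean geometry as the complex one."""
    theta = alpha * np.pi * X
    return np.hstack([np.cos(theta), np.sin(theta)]) / np.sqrt(2)

# Usage sketch: map the data, then run standard k-means on the features.
X = np.random.default_rng(0).normal(size=(200, 5))
labels = KMeans(n_clusters=3, n_init=10).fit_predict(euler_map(X, alpha=1.0))
```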
Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show a negative result, namely, that no strongly sublinear-sized coresets exist for logistic regression.
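For context, the generic sensitivity-sampling recipe underlying most coreset constructions (not this paper's specific construction) looks as follows; the sensitivity upper bounds `s` are assumed to be given, and bounding them is where the hard work lies.

```python
import numpy as np

def sensitivity_sample(X, y, s, m, rng=np.random.default_rng(0)):
    """Draw m points with probability proportional to the sensitivity
    upper bounds s_i and reweight by the inverse expected sample count,
    so that the weighted loss is an unbiased estimator of the full loss."""
    p = s / s.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    w = 1.0 / (m * p[idx])
    return X[idx], y[idx], w
```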
We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde{A}}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low-rank approximation (i.e., principal component analysis).
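One concrete instance of such a sketch, with the sketch size $m$ chosen by us for illustration rather than taken from the paper's statement: project the rows of $\mathbf{A}$ onto their top-$m$ right singular directions and cluster the projected rows.

```python
import numpy as np
from sklearn.cluster import KMeans

def svd_sketch(A, m):
    """Project rows of A onto their top-m right singular directions.
    Sketches of this flavor are known to preserve k-rank approximation
    costs (e.g., the k-means cost) up to small multiplicative error for
    m on the order of k/eps; the exact m required is an assumption here."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return A @ Vt[:m].T          # n x m sketch, one row per original point

# Usage sketch: cluster the low-dimensional rows instead of the full matrix.
A = np.random.default_rng(0).normal(size=(500, 100))
labels = KMeans(n_clusters=5, n_init=10).fit_predict(svd_sketch(A, m=20))
```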