Given a data set of size $n$ in $d$-dimensional Euclidean space, the $k$-means problem asks for a set of $k$ points (called centers) so that the sum of the $\ell_2^2$-distances between the data points and the set of $k$ centers is minimized. Recent work on this problem in the locally private setting achieves a constant multiplicative approximation with additive error $\tilde{O}(n^{1/2 + a} \cdot k \cdot \max\{\sqrt{d}, \sqrt{k}\})$ and proves a lower bound of $\Omega(\sqrt{n})$ on the additive error for any solution with a constant number of rounds. In this work we bridge the gap between the exponents of $n$ in the upper and lower bounds on the additive error with two new algorithms. Given any $\alpha > 0$, our first algorithm achieves a multiplicative approximation guarantee which is at most a $(1+\alpha)$ factor greater than that of any non-private $k$-means clustering algorithm, with $k^{\tilde{O}(1/\alpha^2)} \sqrt{dn} \, \mathrm{poly}\log n$ additive error. Given any $c > \sqrt{2}$, our second algorithm achieves $O(k^{1 + \tilde{O}(1/(2c^2-1))} \sqrt{dn} \, \mathrm{poly}\log n)$ additive error with a constant multiplicative approximation. Both algorithms go beyond the $\Omega(n^{1/2 + a})$ factor that occurs in the additive error for arbitrarily small parameters $a$ in previous work, and the second algorithm in particular shows for the first time that it is possible to solve the locally private $k$-means problem in a constant number of rounds with a constant-factor multiplicative approximation and a dependence on $k$ in the additive error that is arbitrarily close to linear.
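For concreteness, the objective behind all of the bounds above can be written as follows; the notation ($X$ for the data set, $C$ for the center set, $\mathrm{OPT}_k$ for the optimal cost, and the $(\beta, \gamma)$ approximation split) is a standard formulation supplied here for reference, not the abstract's own:
\[
  \mathrm{cost}(X, C) = \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert_2^2, \qquad \mathrm{OPT}_k(X) = \min_{|C| = k} \mathrm{cost}(X, C),
\]
and an algorithm with multiplicative factor $\beta$ and additive error $\gamma$ outputs a center set $C$ with $\mathrm{cost}(X, C) \le \beta \cdot \mathrm{OPT}_k(X) + \gamma$.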
We introduce a new $(\epsilon_p, \delta_p)$-differentially private algorithm for the $k$-means clustering problem. Given a dataset in Euclidean space, the $k$-means clustering problem requires one to find $k$ points in that space such that the sum of squared distances between each data point and its nearest center is minimized.
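As a minimal illustration of this objective, a sketch in NumPy (the function name and the vectorized formulation are my choices, not the paper's):

```python
import numpy as np

def kmeans_cost(X, C):
    """k-means objective: sum over the rows of X (n x d) of the squared
    l2-distance to the nearest row of C (k x d)."""
    # Pairwise squared distances, shape (n, k), then each point's minimum.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).sum()
```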
In this note, we describe a simple approach to obtaining a differentially private algorithm for $k$-clustering with nearly the same multiplicative factor as any non-private counterpart, at the cost of a large polynomial additive error. The approach is the …
We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\tilde{\mathbf{A}}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems …
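A minimal sketch of one construction in this spirit (projecting the rows of $\mathbf{A}$ onto its top right singular directions); the dimension choice $m = \lceil k/\epsilon \rceil$ and the helper name are illustrative assumptions here, not necessarily the paper's exact construction:

```python
import numpy as np

def svd_sketch(A, k, eps):
    """Replace A (n x d) by a narrow sketch with m = ceil(k / eps) columns
    by projecting A's rows onto A's top-m right singular vectors.
    A constrained k-rank problem (e.g., k-means) would then be solved on
    the sketch's short rows instead of A's full rows."""
    m = int(np.ceil(k / eps))
    _, _, Vt = np.linalg.svd(A, full_matrices=False)  # rows of Vt: right singular vectors
    return A @ Vt[:m].T  # shape (n, m) when m <= min(n, d)
```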
We initiate the study of hypothesis selection under local differential privacy. Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-local differential privacy, …
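For reference, the guarantee sought in hypothesis selection (in its standard non-private form, with $d_{\mathrm{TV}}$ denoting total variation distance; the constant $C$ and accuracy $\alpha$ are generic placeholders rather than this paper's stated bounds) is to output $\hat{q} \in \mathcal{Q}$ with
\[
  d_{\mathrm{TV}}(p, \hat{q}) \le C \cdot \min_{q \in \mathcal{Q}} d_{\mathrm{TV}}(p, q) + \alpha .
\]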
Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data in high dimension. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data …
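A minimal sketch of the non-private baseline the abstract starts from (PCA via eigendecomposition of the empirical covariance); a private variant would have to perturb this pipeline somewhere, and no particular mechanism from the paper is implied here:

```python
import numpy as np

def pca_top_k(X, k):
    """Top-k principal directions of X (n x d): the eigenvectors of the
    centered empirical covariance with the k largest eigenvalues."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    w, V = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return V[:, ::-1][:, :k]     # d x k, in decreasing-variance order
```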