ﻻ يوجد ملخص باللغة العربية
We initiate the study of hypothesis selection under local differential privacy. Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $mathcal{Q}$, we aim to output, under the constraints of $varepsilon$-local differential privacy, a distribution from $mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. This is a generalization of the classic problem of $k$-wise simple hypothesis testing, which corresponds to when $p in mathcal{Q}$, and we wish to identify $p$. Absent privacy constraints, this problem requires $O(log k)$ samples from $p$, and it was recently shown that the same complexity is achievable under (central) differential privacy. However, the naive approach to this problem under local differential privacy would require $tilde O(k^2)$ samples. We first show that the constraint of local differential privacy incurs an exponential increase in cost: any algorithm for this problem requires at least $Omega(k)$ samples. Second, for the special case of $k$-wise simple hypothesis testing, we provide a non-interactive algorithm which nearly matches this bound, requiring $tilde O(k)$ samples. Finally, we provide sequentially interactive algorithms for the general case, requiring $tilde O(k)$ samples and only $O(log log k)$ rounds of interactivity. Our algorithms are achieved through a reduction to maximum selection with adversarial comparators, a problem of independent interest for which we initiate study in the parallel setting. For this problem, we provide a family of algorithms for each number of allowed rounds of interaction $t$, as well as lower bounds showing that they are near-optimal for every $t$. Notably, our algorithms result in exponential improvements on the round complexity of previous methods.
Given a data set of size $n$ in $d$-dimensional Euclidean space, the $k$-means problem asks for a set of $k$ points (called centers) so that the sum of the $ell_2^2$-distances between points of a given data set of size $n$ and the set of $k$ centers
Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine lea
A centrally differentially private algorithm maps raw data to differentially private outputs. In contrast, a locally differentially private algorithm may only access data through public interaction with data holders, and this interaction must be a di
Being able to efficiently and accurately select the top-$k$ elements with differential privacy is an integral component of various private data analysis tasks. In this paper, we present the oneshot Laplace mechanism, which generalizes the well-known
Differential privacy has emerged as a standard requirement in a variety of applications ranging from the U.S. Census to data collected in commercial devices, initiating an extensive line of research in accurately and privately releasing statistics of