No Arabic abstract
We consider the binary classification problem in a setup that preserves the privacy of the original sample. We provide a privacy mechanism that is locally differentially private and then construct a classifier based on the private sample that is universally consistent in Euclidean spaces. Under stronger assumptions, we establish the minimax rates of convergence of the excess risk and see that they are slower than in the case when the original sample is available.
Privacy-preserving genomic data sharing is prominent to increase the pace of genomic research, and hence to pave the way towards personalized genomic medicine. In this paper, we introduce ($epsilon , T$)-dependent local differential privacy (LDP) for privacy-preserving sharing of correlated data and propose a genomic data sharing mechanism under this privacy definition. We first show that the original definition of LDP is not suitable for genomic data sharing, and then we propose a new mechanism to share genomic data. The proposed mechanism considers the correlations in data during data sharing, eliminates statistically unlikely data values beforehand, and adjusts the probability distributions for each shared data point accordingly. By doing so, we show that we can avoid an attacker from inferring the correct values of the shared data points by utilizing the correlations in the data. By adjusting the probability distributions of the shared states of each data point, we also improve the utility of shared data for the data collector. Furthermore, we develop a greedy algorithm that strategically identifies the processing order of the shared data points with the aim of maximizing the utility of the shared data. Considering the interdependent privacy risks while sharing genomic data, we also analyze the information gain of an attacker about genomes of a donors family members by observing perturbed data of the genome donor and we propose a mechanism to select the privacy budget (i.e., $epsilon$ parameter of LDP) of the donor by also considering privacy preferences of her family members. Our evaluation results on a real-life genomic dataset show the superiority of the proposed mechanism compared to the randomized response mechanism (a widely used technique to achieve LDP).
As massive data are produced from small gadgets, federated learning on mobile devices has become an emerging trend. In the federated setting, Stochastic Gradient Descent (SGD) has been widely used in federated learning for various machine learning models. To prevent privacy leakages from gradients that are calculated on users sensitive data, local differential privacy (LDP) has been considered as a privacy guarantee in federated SGD recently. However, the existing solutions have a dimension dependency problem: the injected noise is substantially proportional to the dimension $d$. In this work, we propose a two-stage framework FedSel for federated SGD under LDP to relieve this problem. Our key idea is that not all dimensions are equally important so that we privately select Top-k dimensions according to their contributions in each iteration of federated SGD. Specifically, we propose three private dimension selection mechanisms and adapt the gradient accumulation technique to stabilize the learning process with noisy updates. We also theoretically analyze privacy, accuracy and time complexity of FedSel, which outperforms the state-of-the-art solutions. Experiments on real-world and synthetic datasets verify the effectiveness and efficiency of our framework.
We prove a general connection between the communication complexity of two-player games and the sample complexity of their multi-player locally private analogues. We use this connection to prove sample complexity lower bounds for locally differentially private protocols as straightforward corollaries of results from communication complexity. In particular, we 1) use a communication lower bound for the hidden layers problem to prove an exponential sample complexity separation between sequentially and fully interactive locally private protocols, and 2) use a communication lower bound for the pointer chasing problem to prove an exponential sample complexity separation between $k$ round and $k+1$ round sequentially interactive locally private protocols, for every $k$.
In this paper we revisit the classical problem of nonparametric regression, but impose local differential privacy constraints. Under such constraints, the raw data $(X_1,Y_1),ldots,(X_n,Y_n)$, taking values in $mathbb{R}^d times mathbb{R}$, cannot be directly observed, and all estimators are functions of the randomised output from a suitable privacy mechanism. The statistician is free to choose the form of the privacy mechanism, and here we add Laplace distributed noise to a discretisation of the location of a feature vector $X_i$ and to the value of its response variable $Y_i$. Based on this randomised data, we design a novel estimator of the regression function, which can be viewed as a privatised version of the well-studied partitioning regression estimator. The main result is that the estimator is strongly universally consistent. Our methods and analysis also give rise to a strongly universally consistent binary classification rule for locally differentially private data.
Differential privacy mechanism design has traditionally been tailored for a scalar-valued query function. Although many mechanisms such as the Laplace and Gaussian mechanisms can be extended to a matrix-valued query function by adding i.i.d. noise to each element of the matrix, this method is often suboptimal as it forfeits an opportunity to exploit the structural characteristics typically associated with matrix analysis. To address this challenge, we propose a novel differential privacy mechanism called the Matrix-Variate Gaussian (MVG) mechanism, which adds a matrix-valued noise drawn from a matrix-variate Gaussian distribution, and we rigorously prove that the MVG mechanism preserves $(epsilon,delta)$-differential privacy. Furthermore, we introduce the concept of directional noise made possible by the design of the MVG mechanism. Directional noise allows the impact of the noise on the utility of the matrix-valued query function to be moderated. Finally, we experimentally demonstrate the performance of our mechanism using three matrix-valued queries on three privacy-sensitive datasets. We find that the MVG mechanism notably outperforms four previous state-of-the-art approaches, and provides comparable utility to the non-private baseline.