ترغب بنشر مسار تعليمي؟ اضغط هنا

Accuracy Gains from Privacy Amplification Through Sampling for Differential Privacy

135   0   0.0 ( 0 )
 نشر من قبل Jingchen Hu
 تاريخ النشر 2021
والبحث باللغة English




اسأل ChatGPT حول البحث

Recent research in differential privacy demonstrated that (sub)sampling can amplify the level of protection. For example, for $epsilon$-differential privacy and simple random sampling with sampling rate $r$, the actual privacy guarantee is approximately $repsilon$, if a value of $epsilon$ is used to protect the output from the sample. In this paper, we study whether this amplification effect can be exploited systematically to improve the accuracy of the privatized estimate. Specifically, assuming the agency has information for the full population, we ask under which circumstances accuracy gains could be expected, if the privatized estimate would be computed on a random sample instead of the full population. We find that accuracy gains can be achieved for certain regimes. However, gains can typically only be expected, if the sensitivity of the output with respect to small changes in the database does not depend too strongly on the size of the database. We only focus on algorithms that achieve differential privacy by adding noise to the final output and illustrate the accuracy implications for two commonly used statistics: the mean and the median. We see our research as a first step towards understanding the conditions required for accuracy gains in practice and we hope that these findings will stimulate further research broadening the scope of differential privacy algorithms and outputs considered.



قيم البحث

اقرأ أيضاً

Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users private preferences or software usage may be monitored via such reports. We study the collection of such s tatistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to a users value. More fundamentally---by building on anonymity of the users reports---we also demonstrate how the privacy cost of our LDP algorithm can actually be much lower when viewed in the central model of differential privacy. We show, via a new and general privacy amplification technique, that any permutation-invariant algorithm satisfying $varepsilon$-local differential privacy will satisfy $(O(varepsilon sqrt{log(1/delta)/n}), delta)$-central differential privacy. By this, we explain how the high noise and $sqrt{n}$ overhead of LDP protocols is a consequence of them being significantly more private in the central model. As a practical corollary, our results imply that several LDP-based industrial deployments may have much lower privacy cost than their advertised $varepsilon$ would indicate---at least if reports are anonymized.
We show that adding differential privacy to Explainable Boosting Machines (EBMs), a recent method for training interpretable ML models, yields state-of-the-art accuracy while protecting privacy. Our experiments on multiple classification and regressi on datasets show that DP-EBM models suffer surprisingly little accuracy loss even with strong differential privacy guarantees. In addition to high accuracy, two other benefits of applying DP to EBMs are: a) trained models provide exact global and local interpretability, which is often important in settings where differential privacy is needed; and b) the models can be edited after training without loss of privacy to correct errors which DP noise may have introduced.
Many commonly used learning algorithms work by iteratively updating an intermediate solution using one or a few data points in each iteration. Analysis of differential privacy for such algorithms often involves ensuring privacy of each step and then reasoning about the cumulative privacy cost of the algorithm. This is enabled by composition theorems for differential privacy that allow releasing of all the intermediate results. In this work, we demonstrate that for contractive iterations, not releasing the intermediate results strongly amplifies the privacy guarantees. We describe several applications of this new analysis technique to solving convex optimization problems via noisy stochastic gradient descent. For example, we demonstrate that a relatively small number of non-private data points from the same distribution can be used to close the gap between private and non-private convex optimization. In addition, we demonstrate that we can achieve guarantees similar to those obtainable using the privacy-amplification-by-sampling technique in several natural settings where that technique cannot be applied.
Traditional differential privacy is independent of the data distribution. However, this is not well-matched with the modern machine learning context, where models are trained on specific data. As a result, achieving meaningful privacy guarantees in M L often excessively reduces accuracy. We propose Bayesian differential privacy (BDP), which takes into account the data distribution to provide more practical privacy guarantees. We also derive a general privacy accounting method under BDP, building upon the well-known moments accountant. Our experiments demonstrate that in-distribution samples in classic machine learning datasets, such as MNIST and CIFAR-10, enjoy significantly stronger privacy guarantees than postulated by DP, while models maintain high classification accuracy.
240 - Weiyan Shi , Aiqi Cui , Evan Li 2021
With the increasing adoption of language models in applications involving sensitive data, it has become crucial to protect these models from leaking private information. Previous work has attempted to tackle this challenge by training RNN-based langu age models with differential privacy guarantees. However, applying classical differential privacy to language models leads to poor model performance as the underlying privacy notion is over-pessimistic and provides undifferentiated protection for all tokens of the data. Given that the private information in natural language is sparse (for example, the bulk of an email might not carry personally identifiable information), we propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data to improve model utility. To realize such a new notion, we develop a corresponding privacy mechanism, Selective-DPSGD, for RNN-based language models. Besides language modeling, we also apply the method to a more concrete application -- dialog systems. Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utilities while remaining safe under various privacy attacks compared to the baselines. The data, code and models are available at https://github.com/wyshi/lm_privacy.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا