Recent research in differential privacy demonstrated that (sub)sampling can amplify the level of protection. For example, for $\epsilon$-differential privacy and simple random sampling with sampling rate $r$, the actual privacy guarantee is approximately $r\epsilon$ if a value of $\epsilon$ is used to protect the output computed from the sample. In this paper, we study whether this amplification effect can be exploited systematically to improve the accuracy of the privatized estimate. Specifically, assuming the agency has information for the full population, we ask under which circumstances accuracy gains could be expected if the privatized estimate were computed on a random sample instead of the full population. We find that accuracy gains can be achieved for certain regimes. However, gains can typically only be expected if the sensitivity of the output with respect to small changes in the database does not depend too strongly on the size of the database. We focus only on algorithms that achieve differential privacy by adding noise to the final output and illustrate the accuracy implications for two commonly used statistics: the mean and the median. We see our research as a first step towards understanding the conditions required for accuracy gains in practice, and we hope that these findings will stimulate further research broadening the scope of differential privacy algorithms and outputs considered.
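The following sketch makes the trade-off concrete for the mean under Laplace output perturbation. The data bounds, population, and sampling rate are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mean(values, lower, upper, epsilon, rng):
    """Release a mean with the Laplace mechanism.

    For n values bounded in [lower, upper], changing one record moves the
    mean by at most (upper - lower) / n, so Laplace noise with scale
    (upper - lower) / (n * epsilon) yields epsilon-differential privacy.
    """
    n = len(values)
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / n
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

# Hypothetical population of bounded values in [0, 100].
population = rng.uniform(0, 100, size=10_000)
epsilon_target = 1.0   # overall privacy budget
r = 0.1                # sampling rate

# Option 1: privatize the mean of the full population with budget epsilon.
full_release = dp_mean(population, 0, 100, epsilon_target, rng)

# Option 2: subsample at rate r; amplification allows spending roughly
# epsilon / r on the sample while still achieving ~epsilon overall.
sample = rng.choice(population, size=int(r * len(population)), replace=False)
sample_release = dp_mean(sample, 0, 100, epsilon_target / r, rng)

# For the mean, the sensitivity scales like 1/n, so the noise scale is
# (upper-lower)/(n*eps) on the full data and
# (upper-lower)/((r*n)*(eps/r)) = (upper-lower)/(n*eps) on the sample:
# the amplification gain is exactly cancelled, matching the observation
# that gains require a sensitivity that does not shrink proportionally
# with the size of the database.
print(full_release, sample_release)
```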
Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the collection of such statistics.
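As a rough illustration of privately collecting such reports, the sketch below uses basic randomized response in the local model; this is only an illustrative mechanism, not the specific scheme studied in the paper, and the population and budget are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), otherwise
    flip it; each report is epsilon-locally differentially private.
    Repeated reports over time consume budget under sequential composition.
    """
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

def estimate_frequency(reports, epsilon):
    """Debias the noisy reports to estimate the true fraction of ones."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return (np.mean(reports) - (1 - p)) / (2 * p - 1)

# Hypothetical round: 50,000 users each report a private binary preference.
true_bits = rng.binomial(1, 0.3, size=50_000)
eps_per_round = 0.5
reports = [randomized_response(b, eps_per_round, rng) for b in true_bits]
print(estimate_frequency(reports, eps_per_round))   # close to 0.3
```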
We show that adding differential privacy to Explainable Boosting Machines (EBMs), a recent method for training interpretable ML models, yields state-of-the-art accuracy while protecting privacy. Our experiments span multiple classification and regression tasks.
Many commonly used learning algorithms work by iteratively updating an intermediate solution using one or a few data points in each iteration. Analysis of differential privacy for such algorithms often involves ensuring privacy of each step and then composing the per-step guarantees over the whole run.
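A minimal sketch of this "privatize each step, then compose" pattern is given below, using per-example gradient clipping and Gaussian noise on a toy problem. The objective, step count, and parameters are hypothetical, and the sketch is not the specific algorithm analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def clip(grad, clip_norm):
    """Bound each per-example gradient so that a single record has limited
    influence on the update (this fixes the per-step sensitivity)."""
    norm = np.linalg.norm(grad)
    return grad * min(1.0, clip_norm / norm)

def private_sgd_step(theta, batch_grads, clip_norm, noise_multiplier, lr, rng):
    """One noisy step: clip per-example gradients, sum, add Gaussian noise
    calibrated to the clipping bound, then average and update."""
    clipped = np.stack([clip(g, clip_norm) for g in batch_grads])
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=theta.shape)
    return theta - lr * noisy_sum / len(batch_grads)

# Hypothetical toy problem: fit theta to minimize mean ||x - theta||^2.
data = rng.normal(3.0, 1.0, size=(1000, 2))
theta = np.zeros(2)
for step in range(200):
    batch = data[rng.choice(len(data), size=64, replace=False)]
    batch_grads = [2 * (theta - x) for x in batch]
    theta = private_sgd_step(theta, batch_grads, clip_norm=1.0,
                             noise_multiplier=1.1, lr=0.05, rng=rng)

# Each step is differentially private on its own; the guarantee for the
# whole run follows by composing the per-step guarantees (e.g., via
# Renyi-DP / moments-accountant composition), further amplified by the
# random subsampling of each batch.
print(theta)
```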
Traditional differential privacy is independent of the data distribution. However, this is not well-matched with the modern machine learning context, where models are trained on specific data. As a result, achieving meaningful privacy guarantees in machine learning remains challenging.
With the increasing adoption of language models in applications involving sensitive data, it has become crucial to protect these models from leaking private information. Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy.