
Learning discrete distributions: user vs item-level privacy

Published by: Yuhan Liu
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy becomes crucial. We study the fundamental problem of learning discrete distributions over $k$ symbols with user-level differential privacy. If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/(\epsilon\alpha))$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/(\epsilon\alpha)$ independent of the number of samples per user $m$. Moreover, we show that any mechanism that only operates on the final aggregate counts requires a user complexity of the same order. We then propose a mechanism such that the number of users scales as $\tilde{\mathcal{O}}(k/(m\alpha^2) + k/(\sqrt{m}\epsilon\alpha))$ and hence the privacy penalty is $\tilde{\Theta}(\sqrt{m})$ times smaller compared to the standard mechanisms in certain settings of interest. We further show that the proposed mechanism is nearly optimal under certain regimes. We also propose general techniques for obtaining lower bounds on restricted differentially private estimators and a lower bound on the total variation between binomial distributions, both of which might be of independent interest.
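For intuition, the baseline referred to above can be sketched in a few lines: pool all users' samples, perturb the aggregate counts with Laplace noise calibrated to the user-level sensitivity, and normalize. The sketch below is a minimal illustration of that standard mechanism only, not the paper's improved estimator; the function name and interface are invented for the example.

import numpy as np

def laplace_user_level_estimate(user_samples, k, epsilon, rng=None):
    # user_samples: list of n lists, each holding one user's m samples from {0, ..., k-1}.
    # Replacing one user's data changes the aggregate count vector by at most 2*m in
    # L1 norm, so Laplace noise of scale 2*m/epsilon per symbol gives user-level epsilon-DP.
    rng = np.random.default_rng() if rng is None else rng
    m = len(user_samples[0])                        # samples per user
    counts = np.zeros(k)
    for samples in user_samples:
        counts += np.bincount(samples, minlength=k)
    noisy = counts + rng.laplace(scale=2.0 * m / epsilon, size=k)
    noisy = np.clip(noisy, 0.0, None)               # project back to nonnegative counts
    total = noisy.sum()
    return noisy / total if total > 0 else np.full(k, 1.0 / k)

Because the noise scale grows linearly in $m$ exactly as the counts do, the $k/(\epsilon\alpha)$ term in the user complexity does not shrink as users contribute more samples; this is the gap the paper's mechanism narrows by a $\tilde{\Theta}(\sqrt{m})$ factor.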


Read also

We propose and analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints. Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution ($m \ge 1$ samples), providing more stringent but more realistic protection against information leaks. We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples. In contrast, when increasing the number of users $n$, the privacy cost decreases at a faster $O(1/n)$ rate. We complement these results with lower bounds showing the worst-case optimality of our algorithm for mean estimation and stochastic convex optimization. Our algorithms rely on novel techniques for private mean estimation in arbitrary dimension with error scaling as the concentration radius $\tau$ of the distribution rather than the entire range. Under uniform convergence, we derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
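To make the $O(1/\sqrt{m})$ behaviour concrete, here is a one-dimensional sketch of the clipping idea described above: summarize each user by the mean of their $m$ samples, clip those means to an interval of half-width $\tau$ around a coarse center, and add Laplace noise sized to that interval. The interface, and the assumption that center and tau are already known, are illustrative only; the paper's algorithms obtain them privately and work in arbitrary dimension.

import numpy as np

def user_level_dp_mean(user_data, epsilon, center, tau, rng=None):
    # user_data: array of shape (n, m), m scalar samples per user.
    # Replacing one user moves the sum of clipped per-user means by at most 2*tau,
    # so Laplace noise of scale 2*tau/(n*epsilon) on the average gives epsilon-DP.
    rng = np.random.default_rng() if rng is None else rng
    n = user_data.shape[0]
    user_means = user_data.mean(axis=1)                   # one summary value per user
    clipped = np.clip(user_means, center - tau, center + tau)
    return clipped.mean() + rng.laplace(scale=2.0 * tau / (n * epsilon))

Choosing $\tau$ near the concentration radius of the per-user means (roughly the population spread divided by $\sqrt{m}$) is what makes the added noise, and hence the privacy cost, fall as $O(1/\sqrt{m})$.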
We consider the problem of estimating sparse discrete distributions under local differential privacy (LDP) and communication constraints. We characterize the sample complexity for sparse estimation under LDP constraints up to a constant factor and the sample complexity under communication constraints up to a logarithmic factor. Our upper bounds under LDP are based on the Hadamard Response, a private coin scheme that requires only one bit of communication per user. Under communication constraints, we propose public coin schemes based on random hashing functions. Our tight lower bounds are based on the recently proposed method of chi-squared contractions.
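As a point of comparison for the local model, the sketch below implements $k$-ary randomized response, a standard LDP frequency-estimation baseline: it is not the Hadamard Response scheme analyzed in that paper (which needs only one bit of communication per user) and it ignores sparsity, but it shows the randomize-then-debias pattern such schemes share.

import numpy as np

def krr_randomize(x, k, epsilon, rng):
    # Keep the true symbol with probability e^eps / (e^eps + k - 1),
    # otherwise report a uniformly random other symbol (epsilon-LDP).
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return x
    other = int(rng.integers(k - 1))
    return other if other < x else other + 1

def krr_estimate(reports, k, epsilon):
    # Debias the empirical frequencies of the randomized reports.
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    q = 1.0 / (np.exp(epsilon) + k - 1)
    freq = np.bincount(np.asarray(reports), minlength=k) / len(reports)
    return (freq - q) / (p - q)

In use, each user would call krr_randomize on their own symbol with a local generator (e.g. np.random.default_rng()), and the server would aggregate the reports with krr_estimate.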
Many commonly used learning algorithms work by iteratively updating an intermediate solution using one or a few data points in each iteration. Analysis of differential privacy for such algorithms often involves ensuring privacy of each step and then reasoning about the cumulative privacy cost of the algorithm. This is enabled by composition theorems for differential privacy that allow the release of all the intermediate results. In this work, we demonstrate that for contractive iterations, not releasing the intermediate results strongly amplifies the privacy guarantees. We describe several applications of this new analysis technique to solving convex optimization problems via noisy stochastic gradient descent. For example, we demonstrate that a relatively small number of non-private data points from the same distribution can be used to close the gap between private and non-private convex optimization. In addition, we demonstrate that we can achieve guarantees similar to those obtainable using the privacy-amplification-by-sampling technique in several natural settings where that technique cannot be applied.
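A minimal sketch of the setting described above, assuming a user-supplied per-example gradient function: noisy SGD on a convex loss where the intermediate iterates stay internal and only the last one is released. The interface is invented for illustration, and the amplified privacy accounting for contractive updates is not reproduced here.

import numpy as np

def noisy_sgd_last_iterate(grad, data, dim, lr=0.1, clip=1.0, sigma=1.0, steps=1000, rng=None):
    # grad(w, x): gradient of the per-example loss at parameters w.
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(dim)
    for _ in range(steps):
        x = data[int(rng.integers(len(data)))]                  # one example per step
        g = grad(w, x)
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))    # bound per-step sensitivity
        w = w - lr * (g + rng.normal(scale=sigma, size=dim))    # Gaussian-noised update
    return w  # intermediate iterates are never released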
We develop theory for using heuristics to solve computationally hard problems in differential privacy. Heuristic approaches have enjoyed tremendous success in machine learning, for which performance can be empirically evaluated. However, privacy guarantees cannot be evaluated empirically, and must be proven --- without making heuristic assumptions. We show that learning problems over broad classes of functions can be solved privately and efficiently, assuming the existence of a non-private oracle for solving the same problem. Our first algorithm yields a privacy guarantee that is contingent on the correctness of the oracle. We then give a reduction which applies to a class of heuristics which we call certifiable, which allows us to convert oracle-dependent privacy guarantees to worst-case privacy guarantees that hold even when the heuristic standing in for the oracle might fail in adversarial ways. Finally, we consider a broad class of functions that includes most classes of simple boolean functions studied in the PAC learning literature, including conjunctions, disjunctions, parities, and discrete halfspaces. We show that there is an efficient algorithm for privately constructing synthetic data for any such class, given a non-private learning oracle. This in particular gives the first oracle-efficient algorithm for privately generating synthetic data for contingency tables. The most intriguing question left open by our work is whether or not every problem that can be solved differentially privately can be privately solved with an oracle-efficient algorithm. While we do not resolve this, we give a barrier result that suggests that any generic oracle-efficient reduction must fall outside of a natural class of algorithms (which includes the algorithms given in this paper).
We design differentially private algorithms for the bandit convex optimization problem in the projection-free setting. This setting is important whenever the decision set has a complex geometry, and access to it is done efficiently only through a linear optimization oracle, hence Euclidean projections are unavailable (e.g. matroid polytope, submodular base polytope). This is the first differentially-private algorithm for projection-free bandit optimization, and in fact our bound of $\widetilde{O}(T^{3/4})$ matches the best known non-private projection-free algorithm (Garber-Kretzu, AISTATS '20) and the best known private algorithm, even for the weaker setting when projections are available (Smith-Thakurta, NeurIPS '13).
