Symmetric Boolean Factor Analysis with Applications to InstaHide

282 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sitan Chen

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sitan Chen - Zhao Song - Runzhou Tao

التعلم الآلي التشفير والأمن بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this work we examine the security of InstaHide, a recently proposed scheme for distributed learning (Huang et al.). A number of recent works have given reconstruction attacks for InstaHide in various regimes by leveraging an intriguing connection to the following matrix factorization problem: given the Gram matrix of a collection of m random k-sparse Boolean vectors in {0,1}^r, recover the vectors (up to the trivial symmetries). Equivalently, this can be thought of as a sparse, symmetric variant of the well-studied problem of Boolean factor analysis, or as an average-case version of the classic problem of recovering a k-uniform hypergraph from its line graph. As previous algorithms either required m to be exponentially large in k or only applied to k = 2, they left open the question of whether InstaHide possesses some form of fine-grained security against reconstruction attacks for moderately large k. In this work, we answer this in the negative by giving a simple O(m^{omega + 1}) time algorithm for the above matrix factorization problem. Our algorithm, based on tensor decomposition, only requires m to be at least quasi-linear in r. We complement this result with a quasipolynomial-time algorithm for a worst-case setting of the problem where the collection of k-sparse vectors is chosen arbitrarily.

قيم البحث

86 - Vitaly Feldman , Audra McMillan , Kunal Talwar 2020

Recent work of Erlingsson, Feldman, Mironov, Raghunathan, Talwar, and Thakurta [EFMRTT19] demonstrates that random shuffling amplifies differential privacy guarantees of locally randomized data. Such amplification implies substantially stronger priva cy guarantees for systems in which data is contributed anonymously [BEMMRLRKTS17] and has lead to significant interest in the shuffle model of privacy [CSUZZ19; EFMRTT19]. We show that random shuffling of $n$ data records that are input to $varepsilon_0$-differentially private local randomizers results in an $(O((1-e^{-varepsilon_0})sqrt{frac{e^{varepsilon_0}log(1/delta)}{n}}), delta)$-differentially private algorithm. This significantly improves over previous work and achieves the asymptotically optimal dependence in $varepsilon_0$. Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees. Importantly, our work also yields an algorithm for deriving tighter bounds on the resulting $varepsilon$ and $delta$ as well as Renyi differential privacy guarantees. We show numerically that our algorithm gets to within a small constant factor of the optimal bound. As a direct corollary of our analysis we derive a simple and nearly optimal algorithm for frequency estimation in the shuffle model of privacy. We also observe that our result implies the first asymptotically optimal privacy analysis of noisy stochastic gradient descent that applies to sampling without replacement.

التعلم الآلي التشفير والأمن بنى وهياكل البيانات والخوارزميات

How to Use Heuristics for Differential Privacy

166 - Seth Neel , Aaron Roth , Zhiwei Steven Wu 2018

We develop theory for using heuristics to solve computationally hard problems in differential privacy. Heuristic approaches have enjoyed tremendous success in machine learning, for which performance can be empirically evaluated. However, privacy guar antees cannot be evaluated empirically, and must be proven --- without making heuristic assumptions. We show that learning problems over broad classes of functions can be solved privately and efficiently, assuming the existence of a non-private oracle for solving the same problem. Our first algorithm yields a privacy guarantee that is contingent on the correctness of the oracle. We then give a reduction which applies to a class of heuristics which we call certifiable, which allows us to convert oracle-dependent privacy guarantees to worst-case privacy guarantee that hold even when the heuristic standing in for the oracle might fail in adversarial ways. Finally, we consider a broad class of functions that includes most classes of simple boolean functions studied in the PAC learning literature, including conjunctions, disjunctions, parities, and discrete halfspaces. We show that there is an efficient algorithm for privately constructing synthetic data for any such class, given a non-private learning oracle. This in particular gives the first oracle-efficient algorithm for privately generating synthetic data for contingency tables. The most intriguing question left open by our work is whether or not every problem that can be solved differentially privately can be privately solved with an oracle-efficient algorithm. While we do not resolve this, we give a barrier result that suggests that any generic oracle-efficient reduction must fall outside of a natural class of algorithms (which includes the algorithms given in this paper).

التعلم الآلي التشفير والأمن بنى وهياكل البيانات والخوارزميات

InstaHide: Instance-hiding Schemes for Private Distributed Learning

110 - Yangsibo Huang , Zhao Song , Kai Li 2020

How can multiple distributed entities collaboratively train a shared deep net on their private data while preserving privacy? This paper introduces InstaHide, a simple encryption of training images, which can be plugged into existing distributed deep learning pipelines. The encryption is efficient and applying it during training has minor effect on test accuracy. InstaHide encrypts each training image with a one-time secret key which consists of mixing a number of randomly chosen images and applying a random pixel-wise mask. Other contributions of this paper include: (a) Using a large public dataset (e.g. ImageNet) for mixing during its encryption, which improves security. (b) Experimental results to show effectiveness in preserving privacy against known attacks with only minor effects on accuracy. (c) Theoretical analysis showing that successfully attacking privacy requires attackers to solve a difficult computational problem. (d) Demonstrating that use of the pixel-wise mask is important for security, since Mixup alone is shown to be insecure to some some efficient attacks. (e) Release of a challenge dataset https://github.com/Hazelsuko07/InstaHide_Challenge Our code is available at https://github.com/Hazelsuko07/InstaHide

التشفير والأمن التعقيد الحسابي بنى وهياكل البيانات والخوارزميات

Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity

263 - Ulfar Erlingsson , Vitaly Feldman , Ilya Mironov 2018

Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users private preferences or software usage may be monitored via such reports. We study the collection of such s tatistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to a users value. More fundamentally---by building on anonymity of the users reports---we also demonstrate how the privacy cost of our LDP algorithm can actually be much lower when viewed in the central model of differential privacy. We show, via a new and general privacy amplification technique, that any permutation-invariant algorithm satisfying $varepsilon$-local differential privacy will satisfy $(O(varepsilon sqrt{log(1/delta)/n}), delta)$-central differential privacy. By this, we explain how the high noise and $sqrt{n}$ overhead of LDP protocols is a consequence of them being significantly more private in the central model. As a practical corollary, our results imply that several LDP-based industrial deployments may have much lower privacy cost than their advertised $varepsilon$ would indicate---at least if reports are anonymized.

التعلم الآلي التشفير والأمن بنى وهياكل البيانات والخوارزميات

Private Stochastic Convex Optimization with Optimal Rates

70 - Raef Bassily , Vitaly Feldman , Kunal Talwar 2019

We study differentially private (DP) algorithms for stochastic convex optimization (SCO). In this problem the goal is to approximately minimize the population loss given i.i.d. samples from a distribution over convex and Lipschitz loss functions. A l ong line of existing work on private convex optimization focuses on the empirical loss and derives asymptotically tight bounds on the excess empirical loss. However a significant gap exists in the known bounds for the population loss. We show that, up to logarithmic factors, the optimal excess population loss for DP algorithms is equal to the larger of the optimal non-private excess population loss, and the optimal excess empirical loss of DP algorithms. This implies that, contrary to intuition based on private ERM, private SCO has asymptotically the same rate of $1/sqrt{n}$ as non-private SCO in the parameter regime most common in practice. The best previous result in this setting gives rate of $1/n^{1/4}$. Our approach builds on existing differentially private algorithms and relies on the analysis of algorithmic stability to ensure generalization.

التعلم الآلي التشفير والأمن بنى وهياكل البيانات والخوارزميات