A Framework for Adversarial Streaming via Differential Privacy and Difference Estimators

48 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Moshe Shechner

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Idan Attias - Edith Cohen - Moshe Shechner

بنى وهياكل البيانات والخوارزميات التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Streaming algorithms are algorithms for processing large data streams, using only a limited amount of memory. Classical streaming algorithms operate under the assumption that the input stream is fixed in advance. Recently, there is a growing interest in studying streaming algorithms that provide provable guarantees even when the input stream is chosen by an adaptive adversary. Such streaming algorithms are said to be {em adversarially-robust}. We propose a novel framework for adversarial streaming that hybrids two recently suggested frameworks by Hassidim et al. (2020) and by Woodruff and Zhou (2021). These recently suggested frameworks rely on very different ideas, each with its own strengths and weaknesses. We combine these two frameworks (in a non-trivial way) into a single hybrid framework that gains from both approaches to obtain superior performances for turnstile streams.

قيم البحث

113 - Avinatan Hassidim , Haim Kaplan , Yishay Mansour 2020

A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algori thms and the notion of differential privacy. This connection allows us to design new adversarially robust streaming algorithms that outperform the current state-of-the-art constructions for many interesting regimes of parameters.

بنى وهياكل البيانات والخوارزميات التعلم الآلي

Graph Streaming Lower Bounds for Parameter Estimation and Property Testing via a Streaming XOR Lemma

83 - Sepehr Assadi , Vishvajeet N 2021

We study space-pass tradeoffs in graph streaming algorithms for parameter estimation and property testing problems such as estimating the size of maximum matchings and maximum cuts, weight of minimum spanning trees, or testing if a graph is connected or cycle-free versus being far from these properties. We develop a new lower bound technique that proves that for many problems of interest, including all the above, obtaining a $(1+epsilon)$-approximation requires either $n^{Omega(1)}$ space or $Omega(1/epsilon)$ passes, even on highly restricted families of graphs such as bounded-degree planar graphs. For multiple of these problems, this bound matches those of existing algorithms and is thus (asymptotically) optimal. Our results considerably strengthen prior lower bounds even for arbitrary graphs: starting from the influential work of [Verbin, Yu; SODA 2011], there has been a plethora of lower bounds for single-pass algorithms for these problems; however, the only multi-pass lower bounds proven very recently in [Assadi, Kol, Saxena, Yu; FOCS 2020] rules out sublinear-space algorithms with exponentially smaller $o(log{(1/epsilon)})$ passes for these problems. One key ingredient of our proofs is a simple streaming XOR Lemma, a generic hardness amplification result, that we prove: informally speaking, if a $p$-pass $s$-space streaming algorithm can only solve a decision problem with advantage $delta > 0$ over random guessing, then it cannot solve XOR of $ell$ independent copies of the problem with advantage much better than $delta^{ell}$. This result can be of independent interest and useful for other streaming lower bounds as well.

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي

Differential Privacy on Finite Computers

56 - Victor Balcer , Salil Vadhan 2017

We consider the problem of designing and analyzing differentially private algorithms that can be implemented on {em discrete} models of computation in {em strict} polynomial time, motivated by known attacks on floating point implementations of real-a rithmetic differentially private algorithms (Mironov, CCS 2012) and the potential for timing attacks on expected polynomial-time algorithms. As a case study, we examine the basic problem of approximating the histogram of a categorical dataset over a possibly large data universe $mathcal{X}$. The classic Laplace Mechanism (Dwork, McSherry, Nissim, Smith, TCC 2006 and J. Privacy & Confidentiality 2017) does not satisfy our requirements, as it is based on real arithmetic, and natural discrete analogues, such as the Geometric Mechanism (Ghosh, Roughgarden, Sundarajan, STOC 2009 and SICOMP 2012), take time at least linear in $|mathcal{X}|$, which can be exponential in the bit length of the input. In this paper, we provide strict polynomial-time discrete algorithms for approximate histograms whose simultaneous accuracy (the maximum error over all bins) matches that of the Laplace Mechanism up to constant factors, while retaining the same (pure) differential privacy guarantee. One of our algorithms produces a sparse histogram as output. Its per-bin accuracy (the error on individual bins) is worse than that of the Laplace Mechanism by a factor of $log|mathcal{X}|$, but we prove a lower bound showing that this is necessary for any algorithm that produces a sparse histogram. A second algorithm avoids this lower bound, and matches the per-bin accuracy of the Laplace Mechanism, by producing a compact and efficiently computable representation of a dense histogram, it is based on an $(n+1)$-wise independent implementation of an appropriately clamped version of the Discrete Geometric Mechanism.

بنى وهياكل البيانات والخوارزميات

Numerical Composition of Differential Privacy

136 - Sivakanth Gopi , Yin Tat Lee , Lukas Wutschitz 2021

We give a fast algorithm to optimally compose privacy guarantees of differentially private (DP) algorithms to arbitrary accuracy. Our method is based on the notion of privacy loss random variables to quantify the privacy loss of DP algorithms. The ru nning time and memory needed for our algorithm to approximate the privacy curve of a DP algorithm composed with itself $k$ times is $tilde{O}(sqrt{k})$. This improves over the best prior method by Koskela et al. (2020) which requires $tilde{Omega}(k^{1.5})$ running time. We demonstrate the utility of our algorithm by accurately computing the privacy loss of DP-SGD algorithm of Abadi et al. (2016) and showing that our algorithm speeds up the privacy computations by a few orders of magnitude compared to prior work, while maintaining similar accuracy.

بنى وهياكل البيانات والخوارزميات التشفير والأمن التعلم الآلي

A Programming Framework for Differential Privacy with Accuracy Concentration Bounds

134 - Elisabet Lobo-Vesga n Chalmers University of Technology 2019

Differential privacy offers a formal framework for reasoning about privacy and accuracy of computations on private data. It also offers a rich set of building blocks for constructing data analyses. When carefully calibrated, these analyses simultaneo usly guarantee privacy of the individuals contributing their data, and accuracy of their results for inferring useful properties about the population. The compositional nature of differential privacy has motivated the design and implementation of several programming languages aimed at helping a data analyst in programming differentially private analyses. However, most of the programming languages for differential privacy proposed so far provide support for reasoning about privacy but not for reasoning about the accuracy of data analyses. To overcome this limitation, in this work we present DPella, a programming framework providing data analysts with support for reasoning about privacy, accuracy and their trade-offs. The distinguishing feature of DPella is a novel component which statically tracks the accuracy of different data analyses. In order to make tighter accuracy estimations, this component leverages taint analysis for automatically inferring statistical independence of the different noise quantities added for guaranteeing privacy. We show the flexibility of our approach by not only implementing classical counting queries (e.g., CDFs) but also by analyzing hierarchical counting queries (like those done by Census Bureaus), where accuracy have different constraints per level and data analysts should figure out the best manner to calibrate privacy to meet the accuracy requirements.

التشفير والأمن قواعد البيانات لغات البرمجة