
Minimax Rates of Estimating Approximate Differential Privacy

Posted by Xiyang Liu
Publication date: 2019
Research field: Informatics engineering
Paper language: English





Differential privacy has become a widely accepted notion of privacy, leading to the introduction and deployment of numerous privatization mechanisms. However, ensuring the privacy guarantee is an error-prone process, both in designing mechanisms and in implementing them. Both types of errors would be greatly reduced if we had a data-driven approach to verifying privacy guarantees from black-box access to a mechanism. We pose this as a property estimation problem and study the fundamental trade-off between the accuracy of the estimated privacy guarantee and the number of samples required. We introduce a novel estimator that uses polynomial approximation of a carefully chosen degree to optimally trade off bias and variance. With $n$ samples, we show that this estimator matches the performance of a straightforward plug-in estimator with $n \ln n$ samples, a phenomenon referred to as effective sample size amplification. The minimax optimality of the proposed estimator is proved by comparing it to a matching fundamental lower bound.
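As a concrete point of reference for the plug-in baseline mentioned in the abstract, here is a minimal sketch (Python; the randomized-response mechanism, sample sizes, and helper name `plugin_delta` are illustrative assumptions, not the paper's estimator). It estimates, from black-box samples of a mechanism run on two neighboring datasets, the smallest $\delta$ at a fixed $\epsilon$ via the empirical hockey-stick divergence:

```python
import numpy as np

def plugin_delta(samples_p, samples_q, alphabet, eps):
    """Plug-in estimate of the smallest delta such that the mechanism is
    (eps, delta)-DP for this pair of neighboring inputs: the empirical
    hockey-stick divergence between the two output distributions."""
    p_hat = np.array([np.mean(samples_p == a) for a in alphabet])
    q_hat = np.array([np.mean(samples_q == a) for a in alphabet])
    # delta >= sum_y max(0, P(y|D) - e^eps * P(y|D')), in both directions
    d1 = np.sum(np.clip(p_hat - np.exp(eps) * q_hat, 0.0, None))
    d2 = np.sum(np.clip(q_hat - np.exp(eps) * p_hat, 0.0, None))
    return max(d1, d2)

# Illustrative mechanism: randomized response on one bit, which is eps0-DP.
rng = np.random.default_rng(0)
eps0, n = 1.0, 100_000
flip = 1.0 / (1.0 + np.exp(eps0))      # flip probability for randomized response
out0 = rng.random(n) < flip            # outputs on dataset D  (true bit 0)
out1 = rng.random(n) >= flip           # outputs on dataset D' (true bit 1)
print(plugin_delta(out0, out1, [False, True], eps=eps0))  # ~0
print(plugin_delta(out0, out1, [False, True], eps=0.5))   # clearly positive
```

The paper's polynomial-approximation estimator is designed to reach the same accuracy as this naive approach with roughly a $\ln n$ factor fewer samples.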




Read also

Yihong Wu, Pengkun Yang (2014)
Consider the problem of estimating the Shannon entropy of a distribution over $k$ elements from $n$ independent samples. We show that the minimax mean-square error is within universal multiplicative constant factors of $$\Big(\frac{k}{n \log k}\Big)^2 + \frac{\log^2 k}{n}$$ if $n$ exceeds a constant factor of $\frac{k}{\log k}$; otherwise there exists no consistent estimator. This refines the recent result of Valiant-Valiant \cite{VV11} that the minimal sample size for consistent entropy estimation scales according to $\Theta(\frac{k}{\log k})$. The apparatus of best polynomial approximation plays a key role in both the construction of optimal estimators and, via a duality argument, the minimax lower bound.
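To see why a sample-size threshold of order $k/\log k$ matters, here is a small illustrative check (Python; the uniform distribution and sample sizes are assumptions, not the paper's experiments) of the plug-in entropy estimator's bias when $n$ is comparable to $k$:

```python
import numpy as np

def plugin_entropy(samples, k):
    """Empirical (plug-in) Shannon entropy, in nats."""
    counts = np.bincount(samples, minlength=k)
    p = counts[counts > 0] / len(samples)
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(1)
k = 10_000                              # alphabet size
for n in [k // 4, k, 10 * k]:           # sample sizes around the threshold
    est = np.mean([plugin_entropy(rng.integers(0, k, n), k)
                   for _ in range(20)])
    print(f"n={n:>6}  plug-in={est:.3f}  true={np.log(k):.3f}")
# The plug-in estimate is badly biased downward until n is well above k;
# polynomial-approximation estimators remove this bias down to n ~ k/log k.
```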
We present a framework for designing differentially private (DP) mechanisms for binary functions via a graph representation of datasets. Datasets are nodes in the graph and any two neighboring datasets are connected by an edge. The true binary function we want to approximate assigns a value (or true color) to a dataset. Randomized DP mechanisms are then equivalent to randomized colorings of the graph. A key notion we use is that of the boundary of the graph. Any two neighboring datasets assigned different true colors belong to the boundary. Under this framework, we show that fixing the mechanism behavior at the boundary induces a unique optimal mechanism. Moreover, if the mechanism is to have a homogeneous behavior at the boundary, we present a closed expression for the optimal mechanism, which is obtained by means of a \emph{pullback} operation on the optimal mechanism of a line graph. For balanced mechanisms, not favoring one binary value over another, the optimal $(\epsilon,\delta)$-DP mechanism takes a particularly simple form, depending only on the minimum distance to the boundary, on $\epsilon$, and on $\delta$.
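The boundary-centric framework lends itself to a quick numerical check. The sketch below (Python; the path graph, the geometric flip-probability profile, and the helper `min_delta` are illustrative assumptions, not the paper's closed-form optimal mechanism) builds a binary mechanism whose behavior depends only on the distance to the boundary and computes the smallest $\delta$ for which it satisfies the $(\epsilon,\delta)$-DP constraints across all neighboring pairs:

```python
import numpy as np

def min_delta(p_out1, eps):
    """Smallest delta making a binary mechanism on a path (eps, delta)-DP.
    p_out1[i] = P(output = 1 | dataset i); neighbors are (i, i+1)."""
    delta = 0.0
    for a, b in zip(p_out1[:-1], p_out1[1:]):
        for pa, pb in [(a, b), (b, a)]:                  # both directions
            for ya, yb in [(pa, pb), (1 - pa, 1 - pb)]:  # both outputs
                delta = max(delta, ya - np.exp(eps) * yb)
    return delta

# Path of 2*D datasets; the true color is 0 on the left half, 1 on the right.
D, eps = 10, 1.0
d = np.arange(1, D + 1)                 # distance to the boundary
flip = 0.5 * np.exp(-eps * (d - 1))     # geometric decay away from the boundary
p_out1 = np.concatenate([flip[::-1], 1 - flip])  # P(output = 1) per dataset
print(min_delta(p_out1, eps))           # ~0: this profile is pure eps-DP
print(min_delta(p_out1, eps / 2))       # a smaller eps forces delta > 0
```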
For a stationary stochastic process $\{X_n\}$ with values in some set $A$, a finite word $w \in A^K$ is called a memory word if the conditional probability of $X_0$ given the past is constant on the cylinder set defined by $X_{-K}^{-1}=w$. It is called a minimal memory word if no proper suffix of $w$ is also a memory word. For example, in a $K$-step Markov process all words of length $K$ are memory words, but not necessarily minimal ones. We consider the problem of determining the lengths of the longest minimal memory words and the shortest memory words of an unknown process $\{X_n\}$ based on sequentially observing the outputs of a single sample $\{\xi_1,\xi_2,\ldots,\xi_n\}$. We give a universal estimator which converges almost surely to the length of the longest minimal memory word and show that no such universal estimator exists for the length of the shortest memory word. The alphabet $A$ may be finite or countable.
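To make the definition concrete, the sketch below (Python; the 1-step binary Markov chain and the frequency-comparison heuristic are illustrative assumptions, not the paper's universal estimator) empirically tests whether a word is a memory word by checking that prepending symbols barely changes the conditional law of the next symbol:

```python
import numpy as np

rng = np.random.default_rng(2)

# A 1-step binary Markov chain: every word of length 1 is a memory word.
P = np.array([[0.9, 0.1], [0.3, 0.7]])   # transition matrix P[x_{t-1}, x_t]
n = 100_000
x = np.zeros(n, dtype=int)
for t in range(1, n):
    x[t] = int(rng.random() < P[x[t - 1], 1])

def cond_next(x, w):
    """Empirical P(X_t = 1 | the preceding len(w) symbols equal w)."""
    K = len(w)
    idx = np.ones(len(x) - K, dtype=bool)
    for j, s in enumerate(w):
        idx &= x[j : len(x) - K + j] == s
    return np.mean(x[K:][idx])

# w is (empirically) a memory word if prepending a symbol barely moves the
# conditional law of the next symbol -- a naive check, for illustration only.
for w in [(0,), (1,)]:
    exts = [round(cond_next(x, (a,) + w), 3) for a in (0, 1)]
    print(w, round(cond_next(x, w), 3), exts)
```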
This paper provides fundamental limits on the sample complexity of estimating dictionaries for tensor data. The specific focus of this work is on $K$th-order tensor data and the case where the underlying dictionary can be expressed in terms of $K$ smaller dictionaries. It is assumed the data are generated by linear combinations of these structured dictionary atoms and observed through white Gaussian noise. This work first provides a general lower bound on the minimax risk of dictionary learning for such tensor data and then adapts the proof techniques for specialized results in the case of sparse and sparse-Gaussian linear combinations. The results suggest the sample complexity of dictionary learning for tensor data can be significantly lower than that for unstructured data: for unstructured data it scales linearly with the product of the dictionary dimensions, whereas for tensor-structured data the bound scales linearly with the sum of the products of the dimensions of the (smaller) component dictionaries. A partial converse is provided for the case of 2nd-order tensor data to show that the bounds in this paper can be tight. This involves developing an algorithm for learning highly-structured dictionaries from noisy tensor data. Finally, numerical experiments highlight the advantages associated with explicitly accounting for tensor data structure during dictionary learning.
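To see where the sum-versus-product scaling comes from, the short sketch below (Python; the dimensions and sparsity level are illustrative assumptions) generates 2nd-order tensor data from a dictionary that is the Kronecker product of two component dictionaries and compares the parameter counts that the two bounds scale with:

```python
import numpy as np

rng = np.random.default_rng(3)

# Component dictionaries for 2nd-order tensor data: D = kron(D1, D2).
m1, p1, m2, p2, s = 8, 16, 8, 16, 5
D1 = rng.standard_normal((m1, p1))
D2 = rng.standard_normal((m2, p2))
D = np.kron(D1, D2)                    # (m1*m2) x (p1*p2) structured dictionary

# One observation: a sparse combination of atoms plus white Gaussian noise.
coef = np.zeros(p1 * p2)
coef[rng.choice(p1 * p2, s, replace=False)] = rng.standard_normal(s)
y = D @ coef + 0.1 * rng.standard_normal(m1 * m2)

# Degrees of freedom that the two minimax bounds scale with:
print("unstructured:     ", (m1 * m2) * (p1 * p2))   # product of dimensions
print("tensor-structured:", m1 * p1 + m2 * p2)       # sum over components
```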
We consider the problem of estimating sparse discrete distributions under local differential privacy (LDP) and communication constraints. We characterize the sample complexity for sparse estimation under LDP constraints up to a constant factor and the sample complexity under communication constraints up to a logarithmic factor. Our upper bounds under LDP are based on the Hadamard Response, a private coin scheme that requires only one bit of communication per user. Under communication constraints, we propose public coin schemes based on random hashing functions. Our tight lower bounds are based on the recently proposed method of chi-squared contractions.
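For concreteness, here is a minimal sketch of a Hadamard Response channel (Python; the power-of-two padding, row indexing, and decoding constants follow one standard presentation of the scheme and should be treated as assumptions; the one-bit-per-user refinement the paper relies on is not shown). Each user maps a symbol $x$ to the index set $C_x$ of a Hadamard-matrix row and releases a sample biased toward $C_x$:

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(4)

k, eps, n = 100, 1.0, 100_000
K = 1 << k.bit_length()                # smallest power of 2 strictly above k
H = hadamard(K)                        # rows 1..k index the k input symbols
e = np.exp(eps)

def respond(x):
    """eps-LDP Hadamard Response: land in C_x = {y : H[x+1, y] = +1} with
    probability e^eps / (e^eps + 1), else in its complement, uniformly."""
    side = 1 if rng.random() < e / (e + 1) else -1
    return rng.choice(np.flatnonzero(H[x + 1] == side))

# Users draw symbols from a sparse true distribution and respond privately.
p_true = np.zeros(k)
p_true[:5] = 0.2
xs = rng.choice(k, n, p=p_true)
ys = np.array([respond(x) for x in xs])

# Decoding: P(y in C_x) = 1/2 + p_x * (e - 1) / (2 * (e + 1)), since outputs
# generated from any other symbol land in C_x with probability exactly 1/2.
in_Cx = np.array([(H[x + 1, ys] == 1).mean() for x in range(k)])
p_hat = (in_Cx - 0.5) * 2 * (e + 1) / (e - 1)
print(np.round(p_hat[:8], 3))          # roughly 0.2 for the 5 heavy symbols
```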