We present a framework for designing differentially private (DP) mechanisms for binary functions via a graph representation of datasets. Datasets are nodes in the graph and any two neighboring datasets are connected by an edge. The true binary function we want to approximate assigns a value (or true color) to each dataset. Randomized DP mechanisms are then equivalent to randomized colorings of the graph. A key notion we use is that of the boundary of the graph: any two neighboring datasets assigned different true colors belong to the boundary. Under this framework, we show that fixing the mechanism behavior at the boundary induces a unique optimal mechanism. Moreover, if the mechanism is to have a homogeneous behavior at the boundary, we present a closed-form expression for the optimal mechanism, which is obtained by means of a \emph{pullback} operation on the optimal mechanism of a line graph. For balanced mechanisms, not favoring one binary value over the other, the optimal $(\epsilon,\delta)$-DP mechanism takes a particularly simple form, depending only on the minimum distance to the boundary, on $\epsilon$, and on $\delta$.
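As a concrete illustration of the graph-theoretic objects above, the following Python sketch (the adjacency-list interface and function names are our own; the abstract does not prescribe an implementation) identifies the boundary and computes each dataset's minimum distance to it via multi-source BFS. Per the abstract, for balanced mechanisms the optimal $(\epsilon,\delta)$-DP output distribution depends on the input only through this distance.

```python
from collections import deque

def boundary_nodes(adj, true_color):
    """Nodes incident to an edge whose endpoints receive different true colors.
    `adj` maps each dataset (node) to the set of its neighboring datasets;
    `true_color` maps each dataset to its true binary value."""
    return {u for u in adj for v in adj[u] if true_color[u] != true_color[v]}

def distance_to_boundary(adj, true_color):
    """Minimum graph distance from every dataset to the boundary (multi-source BFS)."""
    boundary = boundary_nodes(adj, true_color)
    dist = {b: 0 for b in boundary}
    queue = deque(boundary)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

For instance, on a path graph 0-1-2-3 with true colors (0, 0, 1, 1), the boundary is {1, 2} and the end nodes are at distance 1.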
Differential privacy has become a widely accepted notion of privacy, leading to the introduction and deployment of numerous privatization mechanisms. However, ensuring the privacy guarantee is an error-prone process, both in designing mechanisms and in implementing them. Both types of errors would be greatly reduced if we had a data-driven approach to verifying privacy guarantees from black-box access to a mechanism. We pose this as a property estimation problem and study the fundamental trade-off between the accuracy of the estimated privacy guarantees and the number of samples required. We introduce a novel estimator that uses polynomial approximation of a carefully chosen degree to optimally trade off bias and variance. We show that with $n$ samples this estimator achieves the performance of a straightforward plug-in estimator with $n \ln n$ samples, a phenomenon referred to as effective sample size amplification. The minimax optimality of the proposed estimator is established by comparing it to a matching fundamental lower bound.
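For reference, here is a minimal sketch of the straightforward plug-in estimator mentioned above, assuming black-box sampling access to the mechanism on a fixed pair of neighboring datasets. The interface and names are our own, the sketch ignores $\delta$ and unobserved outputs, and the paper's polynomial-approximation estimator is not reproduced.

```python
import math
from collections import Counter

def plugin_epsilon_estimate(mechanism, D, D_prime, n):
    """Naive plug-in estimate of the DP parameter epsilon from black-box access:
    run the mechanism n times on each of two neighboring datasets, form the
    empirical output distributions, and return the largest absolute
    log-probability ratio over outputs observed under both."""
    p = Counter(mechanism(D) for _ in range(n))
    q = Counter(mechanism(D_prime) for _ in range(n))
    common = set(p) & set(q)
    if not common:
        return float("inf")
    return max(abs(math.log((p[o] / n) / (q[o] / n))) for o in common)
```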
Machine learning models are known to memorize the unique properties of individual data points in a training set. This memorization capability can be exploited by several types of attacks to infer information about the training data, most notably membership inference attacks. In this paper, we propose an approach based on information leakage for guaranteeing membership privacy. Specifically, we propose to use a conditional form of the notion of maximal leakage to quantify the information leaked about individual data entries in a dataset, i.e., the entrywise information leakage. We apply our privacy analysis to the Private Aggregation of Teacher Ensembles (PATE) framework for privacy-preserving classification of sensitive data and prove that the entrywise information leakage of its aggregation mechanism is Schur-concave when the injected noise has a log-concave probability density. The Schur-concavity of this leakage implies that increased consensus among teachers in labeling a query reduces its associated privacy cost. Finally, we derive upper bounds on the entrywise information leakage when the aggregation mechanism uses Laplace-distributed noise.
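For context, a minimal sketch of the kind of Laplace-noise aggregation analyzed above (standard PATE noisy-max vote aggregation; the parameter names are our own):

```python
import numpy as np

def pate_laplace_aggregate(teacher_votes, num_classes, scale, rng=None):
    """Noisy-max aggregation as in PATE: count teacher votes per class,
    add independent Laplace(scale) noise to each count, and release the argmax.
    `teacher_votes` is a list of class labels, one per teacher."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    noisy = counts + rng.laplace(loc=0.0, scale=scale, size=num_classes)
    return int(np.argmax(noisy))
```

In this picture, the Schur-concavity result says that, for a fixed number of teachers, a more concentrated vote histogram (stronger consensus) incurs a smaller entrywise leakage for the queried label.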
We propose a new approach for defining and searching for clusters in graphs that represent real technological or transaction networks. In contrast to the standard approach of finding dense parts of a graph, we concentrate on the structure of the edges between clusters, motivated by earlier observations on the structure of networks in ecology and economics and by applications of discrete tomography. Mathematically, we study special colorings and chromatic numbers of graphs.
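As an illustration of the quantity this approach focuses on, namely the edge structure between clusters rather than the density within them, the following sketch computes the inter-cluster edge counts for a candidate partition (the interface is hypothetical; the abstract does not specify the clustering criterion itself):

```python
from collections import defaultdict

def intercluster_edge_counts(edges, cluster_of):
    """Count edges between each unordered pair of distinct clusters.
    `edges` is an iterable of (u, v) pairs; `cluster_of` maps node -> cluster id."""
    counts = defaultdict(int)
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            counts[frozenset((cu, cv))] += 1
    return dict(counts)
```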
Motivated by the analogy between successive interference cancellation and iterative belief propagation on erasure channels, irregular repetition slotted ALOHA (IRSA) strategies have received a lot of attention in the design of medium access control protocols. IRSA schemes have mostly been analyzed in theoretical scenarios with homogeneous sources, where they are shown to substantially improve the system performance compared to classical slotted ALOHA protocols. In this work, we consider generic systems where sources in different importance classes compete for a common channel. We propose a new prioritized IRSA algorithm and derive the probability of correctly resolving collisions for data from each source class. We then use our theoretical analysis to formulate a new optimization problem for selecting the transmission strategies of heterogeneous sources. We optimize both the replication probability and the source rate per class, so that the overall system utility is maximized. We then propose a heuristic algorithm for selecting the transmission strategy, built on intrinsic characteristics of the iterative decoding methods adopted for recovering from collisions. Experimental results validate the accuracy of the theoretical study and show the gain of well-chosen prioritized strategies for transmitting data from heterogeneous classes over shared wireless channels.
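To make the collision-resolution model concrete, here is a minimal single-class IRSA frame simulation with successive interference cancellation (the degree-distribution interface and names are our own; the prioritized, multi-class optimization of the paper is not reproduced):

```python
import random

def _sample_degree(degree_dist, rng):
    """Draw a repetition degree from a list of (degree, probability) pairs."""
    r, acc = rng.random(), 0.0
    for degree, prob in degree_dist:
        acc += prob
        if r <= acc:
            return degree
    return degree_dist[-1][0]  # guard against floating-point round-off

def simulate_irsa_frame(num_users, num_slots, degree_dist, rng=random):
    """One frame of irregular repetition slotted ALOHA with successive
    interference cancellation: each user places its replicas in distinct
    random slots, then singleton slots are peeled iteratively and the
    decoded user's remaining replicas are cancelled. Returns the resolved users."""
    slots = [set() for _ in range(num_slots)]
    for user in range(num_users):
        for s in rng.sample(range(num_slots), _sample_degree(degree_dist, rng)):
            slots[s].add(user)

    resolved, progress = set(), True
    while progress:
        progress = False
        for slot in slots:
            if len(slot) == 1:                    # singleton slot: decode its user
                (user,) = slot
                resolved.add(user)
                for other in slots:               # cancel that user's replicas
                    other.discard(user)
                progress = True
    return resolved
```

For example, `simulate_irsa_frame(50, 100, [(2, 0.5), (3, 0.5)])` returns the set of users resolved in a 100-slot frame when each of 50 users sends 2 or 3 replicas with equal probability.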
Compressive sensing relies on the sparse prior imposed on the signal of interest to solve the ill-posed recovery problem in an under-determined linear system. The objective function used to enforce the sparse prior should be both effective and easy to optimize. Motivated by the entropy concept from information theory, in this paper we propose the generalized Shannon entropy function and Rényi entropy function of the signal as sparsity-promoting regularizers. Both entropy functions are nonconvex and non-separable, and their local minima occur only on the boundaries of the orthants of Euclidean space. Compared to other popular objective functions, minimizing the generalized entropy functions adaptively promotes multiple high-energy coefficients while suppressing the remaining low-energy coefficients. The corresponding optimization problems can be recast as a series of reweighted $l_1$-norm minimization problems and then solved efficiently by adapting FISTA. Sparse signal recovery experiments on both simulated and real data show that the proposed entropy-function minimization approaches outperform other popular approaches and achieve state-of-the-art performance.
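As a sketch of the reweighted $l_1$ strategy described above: the entropy-derived weight update is not given in the abstract, so the rule below uses the standard $1/(|x_i|+\epsilon)$ placeholder; the inner solver is plain FISTA with soft-thresholding.

```python
import numpy as np

def reweighted_l1_fista(A, y, lam, outer_iters=5, inner_iters=100, eps=1e-3):
    """Generic reweighted-l1 recovery: each outer round solves a weighted
    l1-regularized least-squares problem with FISTA (soft-thresholding plus
    Nesterov momentum), then updates the weights from the current estimate.
    The weight rule 1/(|x|+eps) is a standard placeholder, not the
    entropy-derived weights of the paper."""
    m, n = A.shape
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(n)
    w = np.ones(n)
    for _ in range(outer_iters):
        z, t = x.copy(), 1.0
        for _ in range(inner_iters):
            grad = A.T @ (A @ z - y)
            u = z - grad / L
            x_new = np.sign(u) * np.maximum(np.abs(u) - lam * w / L, 0.0)
            t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
            z = x_new + ((t - 1) / t_new) * (x_new - x)
            x, t = x_new, t_new
        w = 1.0 / (np.abs(x) + eps)        # reweighting step (placeholder rule)
    return x
```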