No Arabic abstract
Given two random variables $X$ and $Y$, an operational approach is undertaken to quantify the ``leakage of information from $X$ to $Y$. The resulting measure $mathcal{L}(X !! to !! Y)$ is called emph{maximal leakage}, and is defined as the multiplicative increase, upon observing $Y$, of the probability of correctly guessing a randomized function of $X$, maximized over all such randomized functions. A closed-form expression for $mathcal{L}(X !! to !! Y)$ is given for discrete $X$ and $Y$, and it is subsequently generalized to handle a large class of random variables. The resulting properties are shown to be consistent with an axiomatic view of a leakage measure, and the definition is shown to be robust to variations in the setup. Moreover, a variant of the Shannon cipher system is studied, in which performance of an encryption scheme is measured using maximal leakage. A single-letter characterization of the optimal limit of (normalized) maximal leakage is derived and asymptotically-optimal encryption schemes are demonstrated. Furthermore, the sample complexity of estimating maximal leakage from data is characterized up to subpolynomial factors. Finally, the emph{guessing} framework used to define maximal leakage is used to give operational interpretations of commonly used leakage measures, such as Shannon capacity, maximal correlation, and local differential privacy.
Multivariate information decompositions hold promise to yield insight into complex systems, and stand out for their ability to identify synergistic phenomena. However, the adoption of these approaches has been hindered by there being multiple possible decompositions, and no precise guidance for preferring one over the others. At the heart of this disagreement lies the absence of a clear operational interpretation of what synergistic information is. Here we fill this gap by proposing a new information decomposition based on a novel operationalisation of informational synergy, which leverages recent developments in the literature of data privacy. Our decomposition is defined for any number of information sources, and its atoms can be calculated using elementary optimisation techniques. The decomposition provides a natural coarse-graining that scales gracefully with the systems size, and is applicable in a wide range of scenarios of practical interest.
A communication setup is considered where a transmitter wishes to convey a message to a receiver and simultaneously estimate the state of that receiver through a common waveform. The state is estimated at the transmitter by means of generalized feedback, i.e., a strictly causal channel output, and the known waveform. The scenario at hand is motivated by joint radar and communication, which aims to co-design radar sensing and communication over shared spectrum and hardware. For the case of memoryless single receiver channels with i.i.d. time-varying state sequences, we fully characterize the capacity-distortion tradeoff, defined as the largest achievable rate below which a message can be conveyed reliably while satisfying some distortion constraints on state sensing. We propose a numerical method to compute the optimal input that achieves the capacity-distortion tradeoff. Then, we address memoryless state-dependent broadcast channels (BCs). For physically degraded BCs with i.i.d. time-varying state sequences, we characterize the capacity-distortion tradeoff region as a rather straightforward extension of single receiver channels. For general BCs, we provide inner and outer bounds on the capacity-distortion region, as well as a sufficient condition when this capacity-distortion region is equal to the product of the capacity region and the set of achievable distortions. A number of illustrative examples demonstrates that the optimal co-design schemes outperform conventional schemes that split the resources between sensing and communication.
Machine learning models are known to memorize the unique properties of individual data points in a training set. This memorization capability can be exploited by several types of attacks to infer information about the training data, most notably, membership inference attacks. In this paper, we propose an approach based on information leakage for guaranteeing membership privacy. Specifically, we propose to use a conditional form of the notion of maximal leakage to quantify the information leaking about individual data entries in a dataset, i.e., the entrywise information leakage. We apply our privacy analysis to the Private Aggregation of Teacher Ensembles (PATE) framework for privacy-preserving classification of sensitive data and prove that the entrywise information leakage of its aggregation mechanism is Schur-concave when the injected noise has a log-concave probability density. The Schur-concavity of this leakage implies that increased consensus among teachers in labeling a query reduces its associated privacy cost. Finally, we derive upper bounds on the entrywise information leakage when the aggregation mechanism uses Laplace distributed noise.
This work constructs a discrete random variable that, when conditioned upon, ensures information stability of quasi-images. Using this construction, a new methodology is derived to obtain information theoretic necessary conditions directly from operational requirements. In particular, this methodology is used to derive new necessary conditions for keyed authentication over discrete memoryless channels and to establish the capacity region of the wiretap channel, subject to finite leakage and finite error, under two different secrecy metrics. These examples establish the usefulness of the proposed methodology.
A new method to measure nonlinear dependence between two variables is described using mutual information to analyze the separate linear and nonlinear components of dependence. This technique, which gives an exact value for the proportion of linear dependence, is then compared with another common test for linearity, the Brock, Dechert and Scheinkman (BDS) test.