No Arabic abstract
We propose to extend the concept of private information retrieval by allowing for distortion in the retrieval process and relaxing the perfect privacy requirement at the same time. In particular, we study the tradeoff between download rate, distortion, and user privacy leakage, and show that in the limit of large file sizes this trade-off can be captured via a novel information-theoretical formulation for datasets with a known distribution. Moreover, for scenarios where the statistics of the dataset is unknown, we propose a new deep learning framework by leveraging a generative adversarial network approach, which allows the user to learn efficient schemes from the data itself, minimizing the download cost. We evaluate the performance of the scheme on a synthetic Gaussian dataset as well as on both the MNIST and CIFAR-10 datasets. For the MNIST dataset, the data-driven approach significantly outperforms a non-learning based scheme which combines source coding with multiple file download, while the CIFAR-10 performance is notably better.
Sequences play an important role in many engineering applications and systems. Searching sequences with desired properties has long been an interesting but also challenging research topic. This article proposes a novel method, called HpGAN, to search desired sequences algorithmically using generative adversarial networks (GAN). HpGAN is based on the idea of zero-sum game to train a generative model, which can generate sequences with characteristics similar to the training sequences. In HpGAN, we design the Hopfield network as an encoder to avoid the limitations of GAN in generating discrete data. Compared with traditional sequence construction by algebraic tools, HpGAN is particularly suitable for intractable problems with complex objectives which prevent mathematical analysis. We demonstrate the search capabilities of HpGAN in two applications: 1) HpGAN successfully found many different mutually orthogonal complementary code sets (MOCCS) and optimal odd-length Z-complementary pairs (OB-ZCPs) which are not part of the training set. In the literature, both MOCSSs and OB-ZCPs have found wide applications in wireless communications. 2) HpGAN found new sequences which achieve four-times increase of signal-to-interference ratio--benchmarked against the well-known Legendre sequence--of a mismatched filter (MMF) estimator in pulse compression radar systems. These sequences outperform those found by AlphaSeq.
A new computational private information retrieval (PIR) scheme based on random linear codes is presented. A matrix of messages from a McEliece scheme is used to query the server with carefully chosen errors. The server responds with the sum of the scalar multiple of the rows of the query matrix and the files. The user recovers the desired file by erasure decoding the response. Contrary to code-based cryptographic systems, the scheme presented here enables to use truly random codes, not only codes disguised as such. Further, we show the relation to the so-called error subspace search problem and quotient error search problem, which we assume to be difficult, and show that the scheme is secure against attacks based on solving these problems.
We investigate the use of parametrized families of information-theoretic measures to generalize the loss functions of generative adversarial networks (GANs) with the objective of improving performance. A new generator loss function, called least $k$th-order GAN (L$k$GAN), is first introduced, generalizing the least squares GANs (LSGANs) by using a $k$th order absolute error distortion measure with $k geq 1$ (which recovers the LSGAN loss function when $k=2$). It is shown that minimizing this generalized loss function under an (unconstrained) optimal discriminator is equivalent to minimizing the $k$th-order Pearson-Vajda divergence. Another novel GAN generator loss function is next proposed in terms of R{e}nyi cross-entropy functionals with order $alpha >0$, $alpha eq 1$. It is demonstrated that this R{e}nyi-centric generalized loss function, which provably reduces to the original GAN loss function as $alphato1$, preserves the equilibrium point satisfied by the original GAN based on the Jensen-R{e}nyi divergence, a natural extension of the Jensen-Shannon divergence. Experimental results indicate that the proposed loss functions, applied to the MNIST and CelebA datasets, under both DCGAN and StyleGAN architectures, confer performance benefits by virtue of the extra degrees of freedom provided by the parameters $k$ and $alpha$, respectively. More specifically, experiments show improvements with regard to the quality of the generated images as measured by the Frechet Inception Distance (FID) score and training stability. While it was applied to GANs in this study, the proposed approach is generic and can be used in other applications of information theory to deep learning, e.g., the issues of fairness or privacy in artificial intelligence.
Most data is automatically collected and only ever seen by algorithms. Yet, data compressors preserve perceptual fidelity rather than just the information needed by algorithms performing downstream tasks. In this paper, we characterize the bit-rate required to ensure high performance on all predictive tasks that are invariant under a set of transformations, such as data augmentations. Based on our theory, we design unsupervised objectives for training neural compressors. Using these objectives, we train a generic image compressor that achieves substantial rate savings (more than $1000times$ on ImageNet) compared to JPEG on 8 datasets, without decreasing downstream classification performance.
Generative adversarial nets (GANs) have been widely studied during the recent development of deep learning and unsupervised learning. With an adversarial training mechanism, GAN manages to train a generative model to fit the underlying unknown real data distribution under the guidance of the discriminative model estimating whether a data instance is real or generated. Such a framework is originally proposed for fitting continuous data distribution such as images, thus it is not straightforward to be directly applied to information retrieval scenarios where the data is mostly discrete, such as IDs, text and graphs. In this tutorial, we focus on discussing the GAN techniques and the variants on discrete data fitting in various information retrieval scenarios. (i) We introduce the fundamentals of GAN framework and its theoretic properties; (ii) we carefully study the promising solutions to extend GAN onto discrete data generation; (iii) we introduce IRGAN, the fundamental GAN framework of fitting single ID data distribution and the direct application on information retrieval; (iv) we further discuss the task of sequential discrete data generation tasks, e.g., text generation, and the corresponding GAN solutions; (v) we present the most recent work on graph/network data fitting with node embedding techniques by GANs. Meanwhile, we also introduce the relevant open-source platforms such as IRGAN and Texygen to help audience conduct research experiments on GANs in information retrieval. Finally, we conclude this tutorial with a comprehensive summarization and a prospect of further research directions for GANs in information retrieval.