No Arabic abstract
The problem of determining the best achievable performance of arbitrary lossless compression algorithms is examined, when correlated side information is available at both the encoder and decoder. For arbitrary source-side information pairs, the conditional information density is shown to provide a sharp asymptotic lower bound for the description lengths achieved by an arbitrary sequence of compressors. This implies that, for ergodic source-side information pairs, the conditional entropy rate is the best achievable asymptotic lower bound to the rate, not just in expectation but with probability one. Under appropriate mixing conditions, a central limit theorem and a law of the iterated logarithm are proved, describing the inevitable fluctuations of the second-order asymptotically best possible rate. An idealised version of Lempel-Ziv coding with side information is shown to be universally first- and second-order asymptotically optimal, under the same conditions. These results are in part based on a new almost-sure invariance principle for the conditional information density, which may be of independent interest.
This paper provides an extensive study of the behavior of the best achievable rate (and other related fundamental limits) in variable-length lossless compression. In the non-asymptotic regime, the fundamental limits of fixed-to-variable lossless compression with and without prefix constraints are shown to be tightly coupled. Several precise, quantitative bounds are derived, connecting the distribution of the optimal codelengths to the source information spectrum, and an exact analysis of the best achievable rate for arbitrary sources is given. Fine asymptotic results are proved for arbitrary (not necessarily prefix) compressors on general mixing sources. Non-asymptotic, explicit Gaussian approximation bounds are established for the best achievable rate on Markov sources. The source dispersion and the source varentropy rate are defined and characterized. Together with the entropy rate, the varentropy rate serves to tightly approximate the fundamental non-asymptotic limits of fixed-to-variable compression for all but very small blocklengths.
We consider the problem of Private Information Retrieval with Private Side Information (PIR-PSI), wherein a user wants to retrieve a file from replication based non-colluding databases by using the prior knowledge of a subset of the files stored on the databases. The PIR-PSI framework ensures that the privacy of the demand and the side information are jointly preserved, thereby finding potential applications when multiple files have to be downloaded spread across different time-instants. Although the capacity of the PIR-PSI setting is known, we observe that the underlying capacity achieving code construction uses Maximum Distance Separable (MDS) codes thereby contributing to high computational complexity when retrieving the demand. Pointing at this drawback of MDS-based PIR-PSI codes, we propose XOR-based PIR-PSI codes for a simple yet non-trivial setting of two non-colluding databases and two side information files at the user. While our codes offer substantial reduction in complexity when compared to MDS based codes, the code-rate marginally falls short of the capacity of the PIR-PSI setting. Nevertheless, we show that our code-rate is strictly higher than that of XOR-based codes for PIR with no side information, thereby implying that our codes can be useful when downloading multiple files in a sequential manner, instead of applying XOR-based PIR codes on each file.
Many information sources are not just sequences of distinguishable symbols but rather have invariances governed by alternative counting paradigms such as permutations, combinations, and partitions. We consider an entire classification of these invariances called the twelvefold way in enumerative combinatorics and develop a method to characterize lossless compression limits. Explicit computations for all twelve settings are carried out for i.i.d. uniform and Bernoulli distributions. Comparisons among settings provide quantitative insight.
This letter investigates a new class of index coding problems. One sender broadcasts packets to multiple users, each desiring a subset, by exploiting prior knowledge of linear combinations of packets. We refer to this class of problems as index coding with coded side-information. Our aim is to characterize the minimum index code length that the sender needs to transmit to simultaneously satisfy all user requests. We show that the optimal binary vector index code length is equal to the minimum rank (minrank) of a matrix whose elements consist of the sets of desired packet indices and side- information encoding matrices. This is the natural extension of matrix minrank in the presence of coded side information. Using the derived expression, we propose a greedy randomized algorithm to minimize the rank of the derived matrix.
Index coding is a source coding problem in which a broadcaster seeks to meet the different demands of several users, each of whom is assumed to have some prior information on the data held by the sender. If the sender knows its clients requests and their side-information sets, then the number of packet transmissions required to satisfy all users demands can be greatly reduced if the data is encoded before sending. The collection of side-information indices as well as the indices of the requested data is described as an instance of the index coding with side-information (ICSI) problem. The encoding function is called the index code of the instance, and the number of transmissions employed by the code is referred to as its length. The main ICSI problem is to determine the optimal length of an index code for and instance. As this number is hard to compute, bounds approximating it are sought, as are algorithms to compute efficient index codes. Two interesting generalizations of the problem that have appeared in the literature are the subject of this work. The first of these is the case of index coding with coded side information, in which linear combinations of the source data are both requested by and held as users side-information. The second is the introduction of error-correction in the problem, in which the broadcast channel is subject to noise. In this paper we characterize the optimal length of a scalar or vector linear index code with coded side information (ICCSI) over a finite field in terms of a generalized min-rank and give bounds on this number based on constructions of random codes for an arbitrary instance. We furthermore consider the length of an optimal error correcting code for an instance of the ICCSI problem and obtain bounds on this number, both for the Hamming metric and for rank-metric errors. We describe decoding algorithms for both categories of errors.