No Arabic abstract
Universal fixed-to-variable lossless source coding for memoryless sources is studied in the finite blocklength and higher-order asymptotics regimes. Optimal third-order coding rates are derived for general fixed-to-variable codes and for prefix codes. It is shown that the non-prefix Type Size code, in which codeword lengths are chosen in ascending order of type class size, achieves the optimal third-order rate and outperforms classical Two-Stage codes. Converse results are proved making use of a result on the distribution of the empirical entropy and Laplaces approximation. Finally, the fixed-to-variable coding problem without a prefix constraint is shown to be essentially the same as the universal guessing problem.
The distributed source coding problem is considered when the sensors, or encoders, are under Byzantine attack; that is, an unknown number of sensors have been reprogrammed by a malicious intruder to undermine the reconstruction at the fusion center. Three different forms of the problem are considered. The first is a variable-rate setup, in which the decoder adaptively chooses the rates at which the sensors transmit. An explicit characterization of the variable-rate minimum achievable sum rate is stated, given by the maximum entropy over the set of distributions indistinguishable from the true source distribution by the decoder. In addition, two forms of the fixed-rate problem are considered, one with deterministic coding and one with randomized coding. The achievable rate regions are given for both these problems, with a larger region achievable using randomized coding, though both are suboptimal compared to variable-rate coding.
The problem of determining the best achievable performance of arbitrary lossless compression algorithms is examined, when correlated side information is available at both the encoder and decoder. For arbitrary source-side information pairs, the conditional information density is shown to provide a sharp asymptotic lower bound for the description lengths achieved by an arbitrary sequence of compressors. This implies that, for ergodic source-side information pairs, the conditional entropy rate is the best achievable asymptotic lower bound to the rate, not just in expectation but with probability one. Under appropriate mixing conditions, a central limit theorem and a law of the iterated logarithm are proved, describing the inevitable fluctuations of the second-order asymptotically best possible rate. An idealised version of Lempel-Ziv coding with side information is shown to be universally first- and second-order asymptotically optimal, under the same conditions. These results are in part based on a new almost-sure invariance principle for the conditional information density, which may be of independent interest.
Integer-Forcing (IF) is a new framework, based on compute-and-forward, for decoding multiple integer linear combinations from the output of a Gaussian multiple-input multiple-output channel. This work applies the IF approach to arrive at a new low-complexity scheme, IF source coding, for distributed lossy compression of correlated Gaussian sources under a minimum mean squared error distortion measure. All encoders use the same nested lattice codebook. Each encoder quantizes its observation using the fine lattice as a quantizer and reduces the result modulo the coarse lattice, which plays the role of binning. Rather than directly recovering the individual quantized signals, the decoder first recovers a full-rank set of judiciously chosen integer linear combinations of the quantized signals, and then inverts it. In general, the linear combinations have smaller average powers than the original signals. This allows to increase the density of the coarse lattice, which in turn translates to smaller compression rates. We also propose and analyze a one-shot version of IF source coding, that is simple enough to potentially lead to a new design principle for analog-to-digital converters that can exploit spatial correlations between the sampled signals.
Integer-forcing source coding has been proposed as a low-complexity method for compression of distributed correlated Gaussian sources. In this scheme, each encoder quantizes its observation using the same fine lattice and reduces the result modulo a coarse lattice. Rather than directly recovering the individual quantized signals, the decoder first recovers a full-rank set of judiciously chosen integer linear combinations of the quantized signals, and then inverts it. It has been observed that the method works very well for most but not all source covariance matrices. The present work quantifies the measure of bad covariance matrices by studying the probability that integer-forcing source coding fails as a function of the allocated rate, %in excess of the %Berger-Tung benchmark, where the probability is with respect to a random orthonormal transformation that is applied to the sources prior to quantization. For the important case where the signals to be compressed correspond to the antenna inputs of relays in an i.i.d. Rayleigh fading environment, this orthonormal transformation can be viewed as being performed by nature. Hence, the results provide performance guarantees for distributed source coding via integer forcing in this scenario.
We show how universal codes can be used for solving some of the most important statistical problems for time series. By definition, a universal code (or a universal lossless data compressor) can compress any sequence generated by a stationary and ergodic source asymptotically to the Shannon entropy, which, in turn, is the best achievable ratio for lossless data compressors. We consider finite-alphabet and real-valued time series and the following problems: estimation of the limiting probabilities for finite-alphabet time series and estimation of the density for real-valued time series, the on-line prediction, regression, classification (or problems with side information) for both types of the time series and the following problems of hypothesis testing: goodness-of-fit testing, or identity testing, and testing of serial independence. It is important to note that all problems are considered in the framework of classical mathematical statistics and, on the other hand, everyday methods of data compression (or archivers) can be used as a tool for the estimation and testing. It turns out, that quite often the suggested methods and tests are more powerful than known ones when they are applied in practice.