No Arabic abstract
In prefix coding over an infinite alphabet, methods that consider specific distributions generally consider those that decline more quickly than a power law (e.g., Golomb coding). Particular power-law distributions, however, model many random variables encountered in practice. For such random variables, compression performance is judged via estimates of expected bits per input symbol. This correspondence introduces a family of prefix codes with an eye towards near-optimal coding of known distributions. Compression performance is precisely estimated for well-known probability distributions using these codes and using previously known prefix codes. One application of these near-optimal codes is an improved representation of rational numbers.
The Kraft inequality gives a necessary and sufficient condition for the existence of a single channel prefix-free code. However, the multichannel Kraft inequality does not imply the existence of a multichannel prefix-free code in general. It is natural to ask whatever there exists an efficient decision procedure for the existence of multichannel prefix-free codes. In this paper, we tackle the two-channel case of the above problem by relating it to a constrained rectangle packing problem. Although a general rectangle packing problem is NP-complete, the extra imposed constraints allow us to propose an algorithm which can solve the problem efficiently.
Let $P = {p(i)}$ be a measure of strictly positive probabilities on the set of nonnegative integers. Although the countable number of inputs prevents usage of the Huffman algorithm, there are nontrivial $P$ for which known methods find a source code that is optimal in the sense of minimizing expected codeword length. For some applications, however, a source code should instead minimize one of a family of nonlinear objective functions, $beta$-exponential means, those of the form $log_a sum_i p(i) a^{n(i)}$, where $n(i)$ is the length of the $i$th codeword and $a$ is a positive constant. Applications of such minimizations include a problem of maximizing the chance of message receipt in single-shot communications ($a<1$) and a problem of minimizing the chance of buffer overflow in a queueing system ($a>1$). This paper introduces methods for finding codes optimal for such exponential means. One method applies to geometric distributions, while another applies to distributions with lighter tails. The latter algorithm is applied to Poisson distributions. Both are extended to minimizing maximum pointwise redundancy.
We consider the problem of estimating Shannons entropy $H$ from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian non-parametric statistics and machine learning. Here we show that it also provides a natural family of priors for Bayesian entropy estimation, due to the fact that moments of the induced posterior distribution over $H$ can be computed analytically. We derive formulas for the posterior mean (Bayes least squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over $H$, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous mixing measures such that the resulting mixture of Pitman-Yor processes produces an approximately flat prior over $H$. We show that the resulting Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of distributions. We explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data.
Motivated by recently derived fundamental limits on total (transmit + decoding) power for coded communication with VLSI decoders, this paper investigates the scaling behavior of the minimum total power needed to communicate over AWGN channels as the target bit-error-probability tends to zero. We focus on regular-LDPC codes and iterative message-passing decoders. We analyze scaling behavior under two VLSI complexity models of decoding. One model abstracts power consumed in processing elements (node model), and another abstracts power consumed in wires which connect the processing elements (wire model). We prove that a coding strategy using regular-LDPC codes with Gallager-B decoding achieves order-optimal scaling of total power under the node model. However, we also prove that regular-LDPC codes and iterative message-passing decoders cannot meet existing fundamental limits on total power under the wire model. Further, if the transmit energy-per-bit is bounded, total power grows at a rate that is worse than uncoded transmission. Complementing our theoretical results, we develop detailed physical models of decoding implementations using post-layout circuit simulations. Our theoretical and numerical results show that approaching fundamental limits on total power requires increasing the complexity of both the code design and the corresponding decoding algorithm as communication distance is increased or error-probability is lowered.
We prove that, for the binary erasure channel (BEC), the polar-coding paradigm gives rise to codes that not only approach the Shannon limit but do so under the best possible scaling of their block length as a~function of the gap to capacity. This result exhibits the first known family of binary codes that attain both optimal scaling and quasi-linear complexity of encoding and decoding. Our proof is based on the construction and analysis of binary polar codes with large kernels. When communicating reliably at rates within $varepsilon > 0$ of capacity, the code length $n$ often scales as $O(1/varepsilon^{mu})$, where the constant $mu$ is called the scaling exponent. It is known that the optimal scaling exponent is $mu=2$, and it is achieved by random linear codes. The scaling exponent of conventional polar codes (based on the $2times 2$ kernel) on the BEC is $mu=3.63$. This falls far short of the optimal scaling guaranteed by random codes. Our main contribution is a rigorous proof of the following result: for the BEC, there exist $elltimesell$ binary kernels, such that polar codes constructed from these kernels achieve scaling exponent $mu(ell)$ that tends to the optimal value of $2$ as $ell$ grows. We furthermore characterize precisely how large $ell$ needs to be as a function of the gap between $mu(ell)$ and $2$. The resulting binary codes maintain the recursive structure of conventional polar codes, and thereby achieve construction complexity $O(n)$ and encoding/decoding complexity $O(nlog n)$.