No Arabic abstract
Let $P = {p(i)}$ be a measure of strictly positive probabilities on the set of nonnegative integers. Although the countable number of inputs prevents usage of the Huffman algorithm, there are nontrivial $P$ for which known methods find a source code that is optimal in the sense of minimizing expected codeword length. For some applications, however, a source code should instead minimize one of a family of nonlinear objective functions, $beta$-exponential means, those of the form $log_a sum_i p(i) a^{n(i)}$, where $n(i)$ is the length of the $i$th codeword and $a$ is a positive constant. Applications of such minimizations include a problem of maximizing the chance of message receipt in single-shot communications ($a<1$) and a problem of minimizing the chance of buffer overflow in a queueing system ($a>1$). This paper introduces methods for finding codes optimal for such exponential means. One method applies to geometric distributions, while another applies to distributions with lighter tails. The latter algorithm is applied to Poisson distributions. Both are extended to minimizing maximum pointwise redundancy.
A quasi-Gray code of dimension $n$ and length $ell$ over an alphabet $Sigma$ is a sequence of distinct words $w_1,w_2,dots,w_ell$ from $Sigma^n$ such that any two consecutive words differ in at most $c$ coordinates, for some fixed constant $c>0$. In this paper we are interested in the read and write complexity of quasi-Gray codes in the bit-probe model, where we measure the number of symbols read and written in order to transform any word $w_i$ into its successor $w_{i+1}$. We present construction of quasi-Gray codes of dimension $n$ and length $3^n$ over the ternary alphabet ${0,1,2}$ with worst-case read complexity $O(log n)$ and write complexity $2$. This generalizes to arbitrary odd-size alphabets. For the binary alphabet, we present quasi-Gray codes of dimension $n$ and length at least $2^n - 20n$ with worst-case read complexity $6+log n$ and write complexity $2$. This complements a recent result by Raskin [Raskin 17] who shows that any quasi-Gray code over binary alphabet of length $2^n$ has read complexity $Omega(n)$. Our results significantly improve on previously known constructions and for the odd-size alphabets we break the $Omega(n)$ worst-case barrier for space-optimal (non-redundant) quasi-Gray codes with constant number of writes. We obtain our results via a novel application of algebraic tools together with the principles of catalytic computation [Buhrman et al. 14, Ben-Or and Cleve 92, Barrington 89, Coppersmith and Grossman 75].
Huffman coding finds an optimal prefix code for a given probability mass function. Consider situations in which one wishes to find an optimal code with the restriction that all codewords have lengths that lie in a user-specified set of lengths (or, equivalently, no codewords have lengths that lie in a complementary set). This paper introduces a polynomial-time dynamic programming algorithm that finds optimal codes for this reserved-length prefix coding problem. This has applications to quickly encoding and decoding lossless codes. In addition, one modification of the approach solves any quasiarithmetic prefix coding problem, while another finds optimal codes restricted to the set of codes with g codeword lengths for user-specified g (e.g., g=2).
Locally recoverable (LRC) codes have recently been a focus point of research in coding theory due to their theoretical appeal and applications in distributed storage systems. In an LRC code, any erased symbol of a codeword can be recovered by accessing only a small number of other symbols. For LRC codes over a small alphabet (such as binary), the optimal rate-distance trade-off is unknown. We present several new combinatorial bounds on LRC codes including the locality-aware sphere packing and Plotkin bounds. We also develop an approach to linear programming (LP) bounds on LRC codes. The resulting LP bound gives better estimates in examples than the other upper bounds known in the literature. Further, we provide the tightest known upper bound on the rate of linear LRC codes with a given relative distance, an improvement over the previous best known bounds.
An $(n, M)$ vector code $mathcal{C} subseteq mathbb{F}^n$ is a collection of $M$ codewords where $n$ elements (from the field $mathbb{F}$) in each of the codewords are referred to as code blocks. Assuming that $mathbb{F} cong mathbb{B}^{ell}$, the code blocks are treated as $ell$-length vectors over the base field $mathbb{B}$. Equivalently, the code is said to have the sub-packetization level $ell$. This paper addresses the problem of constructing MDS vector codes which enable exact reconstruction of each code block by downloading small amount of information from the remaining code blocks. The repair bandwidth of a code measures the information flow from the remaining code blocks during the reconstruction of a single code block. This problem naturally arises in the context of distributed storage systems as the node repair problem [4]. Assuming that $M = |mathbb{B}|^{kell}$, the repair bandwidth of an MDS vector code is lower bounded by $big(frac{n - 1}{n - k}big)cdot ell$ symbols (over the base field $mathbb{B}$) which is also referred to as the cut-set bound [4]. For all values of $n$ and $k$, the MDS vector codes that attain the cut-set bound with the sub-packetization level $ell = (n-k)^{lceil{{n}/{(n-k)}}rceil}$ are known in the literature [23, 35]. This paper presents a construction for MDS vector codes which simultaneously ensures both small repair bandwidth and small sub-packetization level. The obtained codes have the smallest possible sub-packetization level $ell = O(n - k)$ for an MDS vector code and the repair bandwidth which is at most twice the cut-set bound. The paper then generalizes this code construction so that the repair bandwidth of the obtained codes approach the cut-set bound at the cost of increased sub-packetization level. The constructions presented in this paper give MDS vector codes which are linear over the base field $mathbb{B}$.
A Maximum Distance Separable code over an alphabet $F$ is defined via an encoding function $C:F^k rightarrow F^n$ that allows to retrieve a message $m in F^k$ from the codeword $C(m)$ even after erasing any $n-k$ of its symbols. The minimum possible alphabet size of general (non-linear) MDS codes for given parameters $n$ and $k$ is unknown and forms one of the central open problems in coding theory. The paper initiates the study of the alphabet size of codes in a generalized setting where the coding scheme is required to handle a pre-specified subset of all possible erasure patterns, naturally represented by an $n$-vertex $k$-uniform hypergraph. We relate the minimum possible alphabet size of such codes to the strong chromatic number of the hypergraph and analyze the tightness of the obtained bounds for both the linear and non-linear settings. We further consider variations of the problem which allow a small probability of decoding error.