ترغب بنشر مسار تعليمي؟ اضغط هنا

Consider the set of source distributions within a fixed maximum relative entropy with respect to a given nominal distribution. Lossless source coding over this relative entropy ball can be approached in more than one way. A problem previously conside red is finding a minimax average length source code. The minimizing players are the codeword lengths --- real numbers for arithmetic codes, integers for prefix codes --- while the maximizing players are the uncertain source distributions. Another traditional minimizing objective is the first one considered here, maximum (average) redundancy. This problem reduces to an extension of an exponential Huffman objective treated in the literature but heretofore without direct practical application. In addition to these, this paper examines the related problem of maximal minimax pointwise redundancy and the problem considered by Gawrychowski and Gagie, which, for a sufficiently small relative entropy ball, is equivalent to minimax redundancy. One can consider both Shannon-like coding based on optimal real number (ideal) codeword lengths and a Huffman-like optimal prefix coding.
104 - Michael B. Baer 2011
We present new lower and upper bounds for the compression rate of binary prefix codes optimized over memoryless sources according to two related exponential codeword length objectives. The objectives explored here are exponential-average length and e xponential-average redundancy. The first of these relates to various problems involving queueing, uncertainty, and lossless communications, and it can be reduced to the second, which has properties more amenable to analysis. These bounds, some of which are tight, are in terms of a form of entropy and/or the probability of an input symbol, improving on recently discovered bounds of similar form. We also observe properties of optimal codes over the exponential-average redundancy utility.
43 - Michael B. Baer 2009
A method is presented for constructing a Tunstall code that is linear time in the number of output items. This is an improvement on the state of the art for non-Bernoulli sources, including Markov sources, which require a (suboptimal) generalization of Tunstalls algorithm proposed by Savari and analytically examined by Tabus and Rissanen. In general, if n is the total number of output leaves across all Tunstall trees, s is the number of trees (states), and D is the number of leaves of each internal node, then this method takes O((1+(log s)/D) n) time and O(n) space.
71 - Michael B. Baer 2007
Huffman coding finds an optimal prefix code for a given probability mass function. Consider situations in which one wishes to find an optimal code with the restriction that all codewords have lengths that lie in a user-specified set of lengths (or, e quivalently, no codewords have lengths that lie in a complementary set). This paper introduces a polynomial-time dynamic programming algorithm that finds optimal codes for this reserved-length prefix coding problem. This has applications to quickly encoding and decoding lossless codes. In addition, one modification of the approach solves any quasiarithmetic prefix coding problem, while another finds optimal codes restricted to the set of codes with g codeword lengths for user-specified g (e.g., g=2).
155 - Michael B. Baer 2007
This paper presents new lower and upper bounds for the compression rate of binary prefix codes optimized over memoryless sources according to various nonlinear codeword length objectives. Like the most well-known redundancy bounds for minimum average redundancy coding - Huffman coding - these are in terms of a form of entropy and/or the probability of an input symbol, often the most probable one. The bounds here, some of which are tight, improve on known bounds of the form L in [H,H+1), where H is some form of entropy in bits (or, in the case of redundancy objectives, 0) and L is the length objective, also in bits. The objectives explored here include exponential-average length, maximum pointwise redundancy, and exponential-average pointwise redundancy (also called dth exponential redundancy). The first of these relates to various problems involving queueing, uncertainty, and lossless communications; the second relates to problems involving Shannon coding and universal modeling. For these two objectives we also explore the related problem of the necessary and sufficient conditions for the shortest codeword of a code being a specific length.
93 - Michael B. Baer 2007
Efficient optimal prefix coding has long been accomplished via the Huffman algorithm. However, there is still room for improvement and exploration regarding variants of the Huffman problem. Length-limited Huffman coding, useful for many practical app lications, is one such variant, in which codes are restricted to the set of codes in which none of the $n$ codewords is longer than a given length, $l_{max}$. Binary length-limited coding can be done in $O(n l_{max})$ time and O(n) space via the widely used Package-Merge algorithm. In this paper the Package-Merge approach is generalized without increasing complexity in order to introduce a minimum codeword length, $l_{min}$, to allow for objective functions other than the minimization of expected codeword length, and to be applicable to both binary and nonbinary codes; nonbinary codes were previously addressed using a slower dynamic programming approach. These extensions have various applications -- including faster decompression -- and can be used to solve the problem of finding an optimal code with limited fringe, that is, finding the best code among codes with a maximum difference between the longest and shortest codewords. The previously proposed method for solving this problem was nonpolynomial time, whereas solving this using the novel algorithm requires only $O(n (l_{max}- l_{min})^2)$ time and O(n) space.
102 - Michael B. Baer 2007
Let $P = {p(i)}$ be a measure of strictly positive probabilities on the set of nonnegative integers. Although the countable number of inputs prevents usage of the Huffman algorithm, there are nontrivial $P$ for which known methods find a source code that is optimal in the sense of minimizing expected codeword length. For some applications, however, a source code should instead minimize one of a family of nonlinear objective functions, $beta$-exponential means, those of the form $log_a sum_i p(i) a^{n(i)}$, where $n(i)$ is the length of the $i$th codeword and $a$ is a positive constant. Applications of such minimizations include a problem of maximizing the chance of message receipt in single-shot communications ($a<1$) and a problem of minimizing the chance of buffer overflow in a queueing system ($a>1$). This paper introduces methods for finding codes optimal for such exponential means. One method applies to geometric distributions, while another applies to distributions with lighter tails. The latter algorithm is applied to Poisson distributions. Both are extended to minimizing maximum pointwise redundancy.
82 - Michael B. Baer 2006
In prefix coding over an infinite alphabet, methods that consider specific distributions generally consider those that decline more quickly than a power law (e.g., Golomb coding). Particular power-law distributions, however, model many random variabl es encountered in practice. For such random variables, compression performance is judged via estimates of expected bits per input symbol. This correspondence introduces a family of prefix codes with an eye towards near-optimal coding of known distributions. Compression performance is precisely estimated for well-known probability distributions using these codes and using previously known prefix codes. One application of these near-optimal codes is an improved representation of rational numbers.
59 - Michael B. Baer 2006
The decision tree is one of the most fundamental programming abstractions. A commonly used type of decision tree is the alphabetic binary tree, which uses (without loss of generality) ``less than versus greater than or equal to tests in order to dete rmine one of $n$ outcome events. The process of finding an optimal alphabetic binary tree for a known probability distribution on outcome events usually has the underlying assumption that the cost (time) per decision is uniform and thus independent of the outcome of the decision. This assumption, however, is incorrect in the case of software to be optimized for a given microprocessor, e.g., in compiling switch statements or in fine-tuning program bottlenecks. The operation of the microprocessor generally means that the cost for the more likely decision outcome can or will be less -- often far less -- than the less likely decision outcome. Here we formulate a variety of $O(n^3)$-time $O(n^2)$-space dynamic programming algorithms to solve such optimal binary decision tree problems, optimizing for the behavior of processors with predictive branch capabilities, both static and dynamic. In the static case, we use existing results to arrive at entropy-based performance bounds. Solutions to this formulation are often faster in practice than ``optimal decision trees as formulated in the literature, and, for small problems, are easily worth the extra complexity in finding the better solution. This can be applied in fast implementation of decoding Huffman codes.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا