Compression of data streams down to their information content


Abstract in English

According to Kolmogorov complexity, every finite binary string is compressible to a shortest code -- its information content -- from which it is effectively recoverable. We investigate the extent to which this holds for infinite binary sequences (streams). We devise a new coding method which uniformly codes every stream $X$ into an algorithmically random stream $Y$, in such a way that the first $n$ bits of $X$ are recoverable from the first $I(Xupharpoonright_n)$ bits of $Y$, where $I$ is any partial computable information content measure which is defined on all prefixes of $X$, and where $Xupharpoonright_n$ is the initial segment of $X$ of length $n$. As a consequence, if $g$ is any computable upper bound on the initial segment prefix-free complexity of $X$, then $X$ is computable from an algorithmically random $Y$ with oracle-use at most $g$. Alternatively (making no use of such a computable bound $g$) one can achieve an oracle-use bounded above by $K(Xupharpoonright_n)+log n$. This provides a strong analogue of Shannons source coding theorem for algorithmic information theory.

Download