No Arabic abstract
Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this paper, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length k. An exact duplication inserts a copy of a substring of length k of the sequence immediately after that substring, e.g., ACGT to ACGACGT, where k = 3, while a noisy duplication inserts a copy suffering from substitution noise, e.g., ACGT to ACGATGT. Specifically, we design codes that can correct any number of exact duplication and one noisy duplication errors, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only.
Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting two types of errors, namely, short tandem duplication and substitution errors. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one substitution error. Because a substituted symbol can be duplicated many times (as part of substrings of various lengths), a single substitution can affect an unbounded substring of the retrieved word. However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error-correction possible. We construct a code for correcting the aforementioned errors and provide lower bounds for its rate. Compared to optimal codes correcting only duplication errors, numerical results show that the asymptotic cost of protecting against an additional substitution is only 0.003 bits/symbol when the alphabet has size 4, an important case corresponding to data storage in DNA.
We consider network coding for networks experiencing worst-case bit-flip errors, and argue that this is a reasonable model for highly dynamic wireless network transmissions. We demonstrate that in this setup prior network error-correcting schemes can be arbitrarily far from achieving the optimal network throughput. We propose a new metric for errors under this model. Using this metric, we prove a new Hamming-type upper bound on the network capacity. We also show a commensurate lower bound based on GV-type codes that can be used for error-correction. The codes used to attain the lower bound are non-coherent (do not require prior knowledge of network topology). The end-to-end nature of our design enables our codes to be overlaid on classical distributed random linear network codes. Further, we free internal nodes from having to implement potentially computationally intensive link-by-link error-correction.
Motivated by mutation processes occurring in in-vivo DNA-storage applications, a channel that mutates stored strings by duplicating substrings as well as substituting symbols is studied. Two models of such a channel are considered: one in which the substitutions occur only within the duplicated substrings, and one in which the location of substitutions is unrestricted. Both error-detecting and error-correcting codes are constructed, which can handle correctly any number of tandem duplications of a fixed length $k$, and at most a single substitution occurring at any time during the mutation process.
The subject of this paper is transmission over a general class of binary-input memoryless symmetric channels using error correcting codes based on sparse graphs, namely low-density generator-matrix and low-density parity-check codes. The optimal (or ideal) decoder based on the posterior measure over the code bits, and its relationship to the sub-optimal belief propagation decoder, are investigated. We consider the correlation (or covariance) between two codebits, averaged over the noise realizations, as a function of the graph distance, for the optimal decoder. Our main result is that this correlation decays exponentially fast for fixed general low-density generator-matrix codes and high enough noise parameter, and also for fixed general low-density parity-check codes and low enough noise parameter. This has many consequences. Appropriate performance curves - called GEXIT functions - of the belief propagation and optimal decoders match in high/low noise regimes. This means that in high/low noise regimes the performance curves of the optimal decoder can be computed by density evolution. Another interpretation is that the replica predictions of spin-glass theory are exact. Our methods are rather general and use cluster expansions first developed in the context of mathematical statistical mechanics.
The concept of asymmetric entanglement-assisted quantum error-correcting code (asymmetric EAQECC) is introduced in this article. Codes of this type take advantage of the asymmetry in quantum errors since phase-shift errors are more probable than qudit-flip errors. Moreover, they use pre-shared entanglement between encoder and decoder to simplify the theory of quantum error correction and increase the communication capacity. Thus, asymmetric EAQECCs can be constructed from any pair of classical linear codes over an arbitrary field. Their parameters are described and a Gilbert-Varshamov bound is presented. Explicit parameters of asymmetric EAQECCs from BCH codes are computed and examples exceeding the introduced Gilbert-Varshamov bound are shown.