Single-Error Detection and Correction for Duplication and Substitution Channels

Added by Yonatan Yehezkeally

Publication date 2019

Field: Informatics Engineering

Research language: English

Authors Yuanyuan Tang - Yonatan Yehezkeally - Moshe Schwartz

Information Theory

Motivated by mutation processes occurring in in-vivo DNA-storage applications, a channel that mutates stored strings by duplicating substrings as well as substituting symbols is studied. Two models of such a channel are considered: one in which the substitutions occur only within the duplicated substrings, and one in which the location of substitutions is unrestricted. Both error-detecting and error-correcting codes are constructed, which can handle correctly any number of tandem duplications of a fixed length $k$, and at most a single substitution occurring at any time during the mutation process.

Error-correcting Codes for Short Tandem Duplication and Substitution Errors

143 - Yuanyuan Tang, Farzad Farnoud 2020

Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting two types of errors, namely, short tandem duplication and substitution errors. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one substitution error. Because a substituted symbol can be duplicated many times (as part of substrings of various lengths), a single substitution can affect an unbounded substring of the retrieved word. However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error-correction possible. We construct a code for correcting the aforementioned errors and provide lower bounds for its rate. Compared to optimal codes correcting only duplication errors, numerical results show that the asymptotic cost of protecting against an additional substitution is only 0.003 bits/symbol when the alphabet has size 4, an important case corresponding to data storage in DNA.

Information Theory

Error-correcting Codes for Noisy Duplication Channels

192 - Yuanyuan Tang, Farzad Farnoud 2020

Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this paper, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length k. An exact duplication inserts a copy of a substring of length k of the sequence immediately after that substring, e.g., ACGT to ACGACGT, where k = 3, while a noisy duplication inserts a copy suffering from substitution noise, e.g., ACGT to ACGATGT. Specifically, we design codes that can correct any number of exact duplications and one noisy duplication, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only.
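
The exact and noisy tandem duplications described in this abstract, and the idea of a duplication root, can be illustrated with a minimal Python sketch. The function names, the 0-based indexing, and the greedy root computation are assumptions made here for illustration; they are not taken from the paper, and the sketch is not the authors' code construction.

```python
def tandem_duplicate(s: str, i: int, k: int) -> str:
    """Exact duplication: insert a copy of s[i:i+k] right after that substring."""
    if i < 0 or i + k > len(s):
        raise ValueError("substring of length k must lie inside s")
    return s[:i + k] + s[i:i + k] + s[i + k:]


def noisy_tandem_duplicate(s: str, i: int, k: int, j: int, symbol: str) -> str:
    """Noisy duplication: the inserted copy differs from s[i:i+k] at offset j,
    so it is at Hamming distance 1 from the original substring."""
    if i < 0 or i + k > len(s) or not 0 <= j < k:
        raise ValueError("indices out of range")
    copy = list(s[i:i + k])
    if copy[j] == symbol:
        raise ValueError("substituted symbol must differ from the original")
    copy[j] = symbol
    return s[:i + k] + "".join(copy) + s[i + k:]


def duplication_root(s: str, k: int) -> str:
    """Illustrative root: repeatedly delete one copy of any adjacent
    repeated block of length k until no such repeat remains."""
    changed = True
    while changed:
        changed = False
        for i in range(len(s) - 2 * k + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                s = s[:i + k] + s[i + 2 * k:]  # drop the second copy
                changed = True
                break
    return s


exact = tandem_duplicate("ACGT", 0, 3)                # "ACGACGT"
noisy = noisy_tandem_duplicate("ACGT", 0, 3, 1, "T")  # "ACGATGT"
print(exact, duplication_root(exact, 3))              # root is "ACGT" again
print(noisy, duplication_root(noisy, 3))              # noisy copy is not removed
```

In this sketch the root of an exactly duplicated string is the original string, whereas a noisy duplication leaves extra symbols in the root, which is the kind of error pattern the abstract says the codes are designed to handle.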

Information Theory

Linear hash-functions and their applications to error detection and correction

155 - Boris Ryabko 2020

We describe and explore so-called linear hash functions and show how they can be used to build error detection and correction codes. The method can be applied to different types of errors (for example, burst errors). When the method is applied to a model where the number of distorted letters is limited, the obtained estimate of its performance is slightly better than the known Varshamov-Gilbert bound. We also describe a random code whose performance is close to the same bound, but whose construction is much simpler. In some cases the obtained methods are simpler and more flexible than the known ones. In particular, the complexity of the obtained error detection code is close to that of the well-known CRC code, but the proposed code, unlike CRC, can detect with certainty errors whose number does not exceed a predetermined limit.

Information Theory

Joint Crosstalk-Avoidance and Error-Correction Coding for Parallel Data Buses

101 - Urs Niesen, Shrinivas Kudekar 2016

Decreasing transistor sizes and lower voltage swings cause two distinct problems for communication in integrated circuits. First, decreasing inter-wire spacing increases interline capacitive coupling, which adversely affects transmission energy and delay. Second, lower voltage swings render the transmission susceptible to various noise sources. Coding can be used to address both these problems. So-called crosstalk-avoidance codes mitigate capacitive coupling, and traditional error-correction codes introduce resilience against channel errors. Unfortunately, crosstalk-avoidance and error-correction codes cannot be combined in a straightforward manner. On the one hand, crosstalk-avoidance encoding followed by error-correction encoding destroys the crosstalk-avoidance property. On the other hand, error-correction encoding followed by crosstalk-avoidance encoding causes the crosstalk-avoidance decoder to fail in the presence of errors. Existing approaches circumvent this difficulty by using additional bus wires to protect the parities generated from the output of the error-correction encoder, and are therefore inefficient. In this work we propose a novel joint crosstalk-avoidance and error-correction coding and decoding scheme that provides higher bus transmission rates compared to existing approaches. Our joint approach carefully embeds the parities such that the crosstalk-avoidance property is preserved. We analyze the rate and minimum distance of the proposed scheme. We also provide a density evolution analysis and predict iterative decoding thresholds for reliable communication under random bus erasures. This density evolution analysis is nonstandard, since the crosstalk-avoidance constraints are inherently nonlinear.

Information Theory

Algorithms for reconstruction over single and multiple deletion channels

72 - Sundara Rajan Srinivasavaradhan, Michelle Du, Suhas Diggavi and Christina Fragouli 2020

Recent advances in DNA sequencing technology and DNA storage systems have rekindled the interest in deletion channels. Multiple recent works have looked at variants of sequence reconstruction over a single and over multiple deletion channels, a notoriously difficult problem due to its highly combinatorial nature. Although works in theoretical computer science have provided algorithms which guarantee perfect reconstruction with multiple independent observations from the deletion channel, they are only applicable in the large blocklength regime and, more restrictively, when the number of observations is also large. Indeed, with only a few observations, perfect reconstruction of the input sequence may not even be possible in most cases. In such situations, computing the maximum likelihood (ML) and maximum a posteriori (MAP) estimates for the deletion channels are natural questions, and to the best of our knowledge these have remained open. In this work, we take steps to answer the two aforementioned questions. Specifically: 1. We show that solving for the ML estimate over the single deletion channel (which can be cast as a discrete optimization problem) is equivalent to solving its relaxation, a continuous optimization problem; 2. We exactly compute the symbolwise posterior distributions (under some assumptions on the priors) for both the single as well as multiple deletion channels. As part of our contributions, we also introduce tools to visualize and analyze error events, which we believe could be useful in other related problems concerning deletion channels.

Information Theory
