Single-Error Detection and Correction for Duplication and Substitution Channels

Published by Yonatan Yehezkeally
Publication date: 2019
Research field: Informatics Engineering
Language: English





Motivated by mutation processes occurring in in-vivo DNA-storage applications, a channel that mutates stored strings by duplicating substrings as well as substituting symbols is studied. Two models of such a channel are considered: one in which the substitutions occur only within the duplicated substrings, and one in which the location of substitutions is unrestricted. Both error-detecting and error-correcting codes are constructed that correctly handle any number of tandem duplications of a fixed length $k$, together with at most a single substitution occurring at any time during the mutation process.
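As an illustration only, the unrestricted channel model can be simulated in a few lines of Python. This is a minimal sketch, not the paper's construction: the function names `tandem_duplicate`, `substitute`, and `mutate`, the DNA alphabet, and the uniform random choice of positions are all assumptions made for the example. The abstract allows at most one substitution; the sketch always applies exactly one.

```python
import random

def tandem_duplicate(s: str, i: int, k: int) -> str:
    """Insert a copy of s[i:i+k] immediately after that substring."""
    if not 0 <= i <= len(s) - k:
        raise ValueError("duplication window out of range")
    return s[:i + k] + s[i:i + k] + s[i + k:]

def substitute(s: str, pos: int, symbol: str) -> str:
    """Replace the symbol at index pos with the given symbol."""
    return s[:pos] + symbol + s[pos + 1:]

def mutate(s: str, k: int, n_dups: int, alphabet: str = "ACGT", rng=None) -> str:
    """Hypothetical simulator: n_dups tandem duplications of length k,
    plus exactly one substitution at a random point in the process
    (location unrestricted). Requires len(s) >= k."""
    rng = rng or random.Random()
    sub_step = rng.randrange(n_dups + 1)  # when the substitution happens
    for step in range(n_dups + 1):
        if step == sub_step:
            pos = rng.randrange(len(s))
            others = [a for a in alphabet if a != s[pos]]
            s = substitute(s, pos, rng.choice(others))
        if step < n_dups:
            s = tandem_duplicate(s, rng.randrange(len(s) - k + 1), k)
    return s
```

Each duplication lengthens the string by exactly $k$ symbols, so the output of `mutate(s, k, n_dups)` always has length `len(s) + k * n_dups`, while the substitution leaves the length unchanged.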




Read also

Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting two types of errors, namely, short tandem duplication and substitution errors. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one substitution error. Because a substituted symbol can be duplicated many times (as part of substrings of various lengths), a single substitution can affect an unbounded substring of the retrieved word. However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error-correction possible. We construct a code for correcting the aforementioned errors and provide lower bounds for its rate. Compared to optimal codes correcting only duplication errors, numerical results show that the asymptotic cost of protecting against an additional substitution is only 0.003 bits/symbol when the alphabet has size 4, an important case corresponding to data storage in DNA.
Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this paper, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length k. An exact duplication inserts a copy of a substring of length k of the sequence immediately after that substring, e.g., ACGT to ACGACGT, where k = 3, while a noisy duplication inserts a copy suffering from substitution noise, e.g., ACGT to ACGATGT. Specifically, we design codes that can correct any number of exact duplications and one noisy duplication, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only.
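The notion of a duplication root used above can be made concrete with a small Python sketch. It assumes the usual definition (the string obtained by deleting one copy of every adjacent repeat of length k until none remains) and is not taken from the paper; the function name `duplication_root` is hypothetical.

```python
def duplication_root(s: str, k: int) -> str:
    """Repeatedly delete one copy of any adjacent repeat
    s[i:i+k] == s[i+k:i+2k] until no such repeat remains."""
    i = 0
    while i + 2 * k <= len(s):
        if s[i:i + k] == s[i + k:i + 2 * k]:
            s = s[:i + k] + s[i + 2 * k:]  # drop the second copy
            i = max(i - k, 0)              # conservative rescan after a deletion
        else:
            i += 1
    return s

# The abstract's examples with k = 3:
print(duplication_root("ACGACGT", 3))  # exact duplication of ACG
print(duplication_root("ACGATGT", 3))  # noisy duplication (copy ATG)
```

With the abstract's examples, the exact duplication ACGACGT reduces back to ACGT, whereas the noisy duplication ACGATGT contains no exact repeat of length 3 and is left unchanged. This illustrates why a noisy duplication shows up as an error in the root that the code must handle separately.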
Boris Ryabko, 2020
We describe and explore so-called linear hash functions and show how they can be used to build error detection and correction codes. The method can be applied to different types of errors (for example, burst errors). When the method is applied to a model where the number of distorted letters is limited, the obtained estimate of its performance is slightly better than the known Varshamov-Gilbert bound. We also describe a random code whose performance is close to the same bound, but whose construction is much simpler. In some cases the obtained methods are simpler and more flexible than the known ones. In particular, the complexity of the obtained error detection code is close to that of the well-known CRC code, but the proposed code, unlike CRC, can detect with certainty errors whose number does not exceed a predetermined limit.
Decreasing transistor sizes and lower voltage swings cause two distinct problems for communication in integrated circuits. First, decreasing inter-wire spacing increases interline capacitive coupling, which adversely affects transmission energy and delay. Second, lower voltage swings render the transmission susceptible to various noise sources. Coding can be used to address both these problems. So-called crosstalk-avoidance codes mitigate capacitive coupling, and traditional error-correction codes introduce resilience against channel errors. Unfortunately, crosstalk-avoidance and error-correction codes cannot be combined in a straightforward manner. On the one hand, crosstalk-avoidance encoding followed by error-correction encoding destroys the crosstalk-avoidance property. On the other hand, error-correction encoding followed by crosstalk-avoidance encoding causes the crosstalk-avoidance decoder to fail in the presence of errors. Existing approaches circumvent this difficulty by using additional bus wires to protect the parities generated from the output of the error-correction encoder, and are therefore inefficient. In this work we propose a novel joint crosstalk-avoidance and error-correction coding and decoding scheme that provides higher bus transmission rates compared to existing approaches. Our joint approach carefully embeds the parities such that the crosstalk-avoidance property is preserved. We analyze the rate and minimum distance of the proposed scheme. We also provide a density evolution analysis and predict iterative decoding thresholds for reliable communication under random bus erasures. This density evolution analysis is nonstandard, since the crosstalk-avoidance constraints are inherently nonlinear.
Recent advances in DNA sequencing technology and DNA storage systems have rekindled the interest in deletion channels. Multiple recent works have looked at variants of sequence reconstruction over a single and over multiple deletion channels, a notoriously difficult problem due to its highly combinatorial nature. Although works in theoretical computer science have provided algorithms which guarantee perfect reconstruction with multiple independent observations from the deletion channel, they are only applicable in the large blocklength regime and, more restrictively, when the number of observations is also large. Indeed, with only a few observations, perfect reconstruction of the input sequence may not even be possible in most cases. In such situations, maximum likelihood (ML) and maximum a posteriori (MAP) estimates for the deletion channels are natural questions that arise, and these have remained open to the best of our knowledge. In this work, we take steps to answer the two aforementioned questions. Specifically: 1. We show that solving for the ML estimate over the single deletion channel (which can be cast as a discrete optimization problem) is equivalent to solving its relaxation, a continuous optimization problem; 2. We exactly compute the symbolwise posterior distributions (under some assumptions on the priors) for both the single as well as multiple deletion channels. As part of our contributions, we also introduce tools to visualize and analyze error events, which we believe could be useful in other related problems concerning deletion channels.
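To make the ML objective concrete, here is a minimal Python sketch of the likelihood of one observation under a deletion channel. It assumes the standard model in which each symbol is deleted independently with probability `delta`; under that assumption the likelihood of observing `y` from input `x` is the number of ways `y` embeds in `x` as a subsequence, times `delta**(n-m) * (1-delta)**m`. The function names are hypothetical and this is not the paper's algorithm.

```python
def count_embeddings(x: str, y: str) -> int:
    """Number of distinct index sets at which y occurs as a subsequence of x."""
    ways = [1] + [0] * len(y)  # ways[j]: embeddings of y[:j] in the prefix of x seen so far
    for cx in x:
        # iterate j downwards so each symbol of x is used at most once per embedding
        for j in range(len(y), 0, -1):
            if y[j - 1] == cx:
                ways[j] += ways[j - 1]
    return ways[len(y)]

def deletion_likelihood(x: str, y: str, delta: float) -> float:
    """P(y | x) for an i.i.d. deletion channel with deletion probability delta."""
    n, m = len(x), len(y)
    if m > n:
        return 0.0  # a deletion channel cannot lengthen the input
    return count_embeddings(x, y) * delta ** (n - m) * (1 - delta) ** m
```

For a fixed input length, the ML estimate maximizes `count_embeddings(x, y)` over candidate inputs `x`, which is the discrete optimization problem the abstract refers to.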