
On Coding for an Abstracted Nanopore Channel for DNA Storage

Published by Mary Wootters
Publication date: 2021
Research field: Information Engineering
Research language: English





In the emerging field of DNA storage, data is encoded as DNA sequences and stored. The data is read out again by sequencing the stored DNA. Nanopore sequencing is a new sequencing technology that has many advantages over other methods; in particular, it is cheap, portable, and can support longer reads. While several practical coding schemes have been developed for DNA storage with nanopore sequencing, the theory is not well understood. Towards that end, we study a highly abstracted (deterministic) version of the nanopore sequencer, which highlights key features that make its analysis difficult. We develop methods and theory to understand the capacity of our abstracted model, and we propose efficient coding schemes and algorithms.
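To make "highly abstracted (deterministic) nanopore sequencer" concrete, here is a minimal sketch of one common way such a reader can be modeled. It assumes, as an illustration and not as the paper's exact definition, that the pore sees a window of `ell` consecutive bases at a time, maps each window to a signal level through a function `f`, and that runs of identical consecutive levels are merged because the reader cannot tell how long the strand dwelled on one level. The names `nanopore_read`, `gc_count` (a hypothetical level function) and `zero_error_rate` are introduced here for illustration. Because the channel is deterministic, the largest error-free code of length `n` keeps one input per distinct output, so brute-force counting of distinct outputs gives the best achievable rate for tiny `n`.

```python
from itertools import groupby, product
from math import log2
from typing import Callable, Hashable, List


def nanopore_read(x: str, ell: int, f: Callable[[str], Hashable],
                  collapse: bool = True) -> List[Hashable]:
    """Deterministic read of strand x under a hypothetical sliding-window model."""
    # The pore "sees" ell consecutive bases at a time (a k-mer window).
    windows = [x[i:i + ell] for i in range(len(x) - ell + 1)]
    # Each window produces one signal level through the function f.
    levels = [f(w) for w in windows]
    if collapse:
        # Merge runs of identical levels: the reader only observes that the
        # level changed, not how many windows produced the same level.
        levels = [level for level, _ in groupby(levels)]
    return levels


def gc_count(window: str) -> int:
    """Hypothetical level function: number of G/C bases in the window."""
    return sum(base in "GC" for base in window)


def zero_error_rate(n: int, ell: int, f: Callable[[str], Hashable],
                    alphabet: str = "ACGT", collapse: bool = True) -> float:
    """Best rate (bits per base) of an error-free code of length n.

    For a deterministic channel, a uniquely decodable code can keep exactly
    one input per distinct output, so its maximum size is the number of
    distinct outputs. Brute force: only feasible for very small n.
    """
    outputs = {tuple(nanopore_read("".join(x), ell, f, collapse))
               for x in product(alphabet, repeat=n)}
    return log2(len(outputs)) / n
```

For example, `nanopore_read("ACGTAC", 3, gc_count)` and `nanopore_read("CGTACA", 3, gc_count)` both return `[2, 1]`, so these two strands are indistinguishable under this model; a code for the channel must avoid using both.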




Read also

In a distributed storage system, code symbols are dispersed across space, in nodes or storage units, as opposed to across time. In settings such as a large data center, an important consideration is the efficient repair of a failed node. Efficient repair calls for erasure codes that, in the face of node failure, minimize the amount of repair data transferred over the network, the amount of data accessed at a helper node, and the number of helper nodes contacted. Coding theory has evolved to handle these challenges by introducing two new classes of erasure codes, namely regenerating codes and locally recoverable codes, as well as by devising novel ways to repair the ubiquitous Reed-Solomon code. This survey provides an overview of the efforts in this direction over the past decade.
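The idea of contacting fewer helper nodes can be illustrated with a toy locally recoverable layout. This is a minimal sketch, assuming the simplest possible construction: data symbols are split into local groups of `r`, and each group gets one XOR parity, so a single lost symbol is rebuilt from the `r` other symbols in its group rather than from `k` symbols of the whole stripe. It is not a regenerating code or any construction from the survey; practical LRCs also add global parities to tolerate more failures. The function names are hypothetical.

```python
from functools import reduce
from operator import xor
from typing import List, Optional


def encode_lrc(data: List[int], r: int) -> List[List[int]]:
    """Split data into local groups of size r and append one XOR parity per group."""
    groups = [data[i:i + r] for i in range(0, len(data), r)]
    return [g + [reduce(xor, g, 0)] for g in groups]


def repair(group: List[Optional[int]]) -> List[int]:
    """Rebuild the single missing symbol (marked None) of one local group."""
    missing = [i for i, s in enumerate(group) if s is None]
    # One XOR parity can repair exactly one erasure inside its group.
    assert len(missing) == 1, "local parity repairs exactly one erasure"
    # The helpers are only the surviving members of this group (r reads).
    value = reduce(xor, (s for s in group if s is not None), 0)
    return [value if s is None else s for s in group]
```

With `r = 3`, the data `[5, 9, 12, 7, 3, 14]` is stored as `[[5, 9, 12, 0], [7, 3, 14, 10]]`, and losing the symbol `9` costs three reads (`5`, `12`, `0`) instead of reading the whole stripe.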
Sequencing a DNA strand, as part of the read process in DNA storage, produces multiple noisy copies, which can be combined to produce a better estimate of the original strand; this is called trace reconstruction. The error rate can be reduced further by introducing redundancy in the written sequence; this is called coded trace reconstruction. In this paper, we model the DNA storage channel as an insertion-deletion-substitution (IDS) channel and design both encoding schemes and low-complexity decoding algorithms for coded trace reconstruction. We introduce Trellis BMA, a new reconstruction algorithm whose complexity is linear in the number of traces, and compare its performance to previous algorithms. Our results show that it reduces the error rate on both simulated and experimental data. The performance comparisons in this paper are based on a new dataset of traces that will be publicly released with the paper. Our hope is that this dataset will enable research progress by allowing objective comparisons between candidate algorithms.
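A rough sketch of the setting: a simple i.i.d. IDS channel that generates noisy traces, and the classic bitwise majority alignment (BMA) baseline that combines them. This is plain BMA, which targets deletions, not the Trellis BMA algorithm introduced in the paper; the channel parameters (`p_ins`, `p_del`, `p_sub`) and the rule of at most one insertion before each base are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import List


def ids_channel(x: str, p_ins: float, p_del: float, p_sub: float,
                rng: random.Random, alphabet: str = "ACGT") -> str:
    """One noisy trace of x through a simple i.i.d. insertion-deletion-substitution channel."""
    out = []
    for base in x:
        if rng.random() < p_ins:          # insert a random base before this one
            out.append(rng.choice(alphabet))
        u = rng.random()
        if u < p_del:                     # delete this base
            continue
        if u < p_del + p_sub:             # substitute it with a different base
            out.append(rng.choice([b for b in alphabet if b != base]))
        else:                             # transmit it unchanged
            out.append(base)
    return "".join(out)


def bma(traces: List[str], n: int) -> str:
    """Plain bitwise majority alignment over several traces (not Trellis BMA)."""
    ptr = [0] * len(traces)
    est = []
    for _ in range(n):
        current = [t[p] for t, p in zip(traces, ptr) if p < len(t)]
        if not current:                   # every trace is exhausted
            break
        vote = Counter(current).most_common(1)[0][0]
        est.append(vote)
        # Traces that agree with the vote advance; the others are assumed to
        # have lost this symbol to a deletion and keep their pointer.
        ptr = [p + 1 if p < len(t) and t[p] == vote else p
               for t, p in zip(traces, ptr)]
    return "".join(est)
```

For instance, `bma(["ACGTAC", "ACGTAC", "AGTAC"], 6)` recovers `"ACGTAC"`: at the second position the third trace shows `G` while the majority shows `C`, so that trace is treated as having a deletion and waits. Insertions and substitutions are handled poorly by this baseline, which is part of what motivates more refined decoders.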
In order to accommodate the ever-growing data from various, possibly independent, sources and the dynamic nature of data usage rates in practical applications, modern cloud data storage systems are required to be scalable, flexible, and heterogeneous. The recent rise of blockchain technology is also moving various information systems towards decentralization to achieve high privacy at low cost. While codes with hierarchical locality have been intensively studied in the context of centralized cloud storage due to their effectiveness in reducing the average reading time, those for decentralized storage networks (DSNs) have not yet been discussed. In this paper, we propose a joint coding scheme where each node receives extra protection through cooperation with nodes in its neighborhood in a heterogeneous DSN with any given topology. This work extends and subsumes our prior work on coding for centralized cloud storage. In particular, our proposed construction not only preserves desirable properties such as scalability and flexibility, which are critical in dynamic networks, but also adapts to arbitrary topologies, a property that is essential in DSNs but has been overlooked in existing works.
In large-scale distributed storage systems (DSS) deployed in cloud computing, correlated failures resulting in the simultaneous failure (or unavailability) of blocks of nodes are common. In such scenarios, the stored data or the content of a failed node can only be reconstructed from the live nodes belonging to the available blocks. To analyze the resilience of the system against such block failures, this work introduces the framework of Block Failure Resilient (BFR) codes, wherein the data (e.g., a file in a DSS) can be decoded by reading out the same number of codeword symbols (nodes) from each of a subset of available blocks of the underlying codeword. Further, repairable BFR codes are introduced, wherein any codeword symbol in a failed block can be repaired by contacting a subset of the remaining blocks in the system. File size bounds for repairable BFR codes are derived, the trade-off between per-node storage and repair bandwidth is analyzed, and the corresponding minimum storage regenerating (BFR-MSR) and minimum bandwidth regenerating (BFR-MBR) points are derived. Explicit codes achieving the two operating points for a special case of parameters are constructed, wherein the underlying regenerating codewords are distributed to BFR codeword symbols according to combinatorial designs. Finally, BFR locally repairable codes (BFR-LRC) are introduced; an upper bound on their resilience is derived, and optimal code constructions are provided by a concatenation of Gabidulin and MDS codes. The repair efficiency of BFR-LRC is further studied via the use of BFR-MSR/MBR codes as local codes. Code constructions achieving optimal resilience for BFR-MSR/MBR-LRCs are provided for certain parameter regimes. Overall, this work introduces the framework of block failures, along with optimal code constructions and the study of architecture-aware coding for distributed storage systems.
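The decoding property "read the same number of symbols from each of a subset of available blocks" can be seen in its simplest form with an MDS code. This is a minimal sketch, assuming a systematic Reed-Solomon code over the prime field GF(257) and a hypothetical layout of 12 nodes in 4 blocks of 3; it does not construct the paper's BFR, BFR-MSR/MBR or BFR-LRC codes, it only shows the baseline that any `k` symbols of an MDS codeword determine the data. The helper names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

P = 257  # prime field size (assumption; any prime larger than n works)


def interpolate_at(points: Sequence[Tuple[int, int]], x0: int) -> int:
    """Evaluate at x0 the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x0 - xm) % P
                den = den * (xj - xm) % P
        # Division in GF(P) via Fermat's little theorem: den^(P-2) = den^(-1).
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total


def encode(msg: List[int], n: int) -> List[int]:
    """Systematic Reed-Solomon: msg holds the values at x = 1..k; return values at x = 1..n."""
    points = list(zip(range(1, len(msg) + 1), msg))
    return [interpolate_at(points, x) for x in range(1, n + 1)]


def decode(reads: Dict[int, int], k: int) -> List[int]:
    """Recover the k message symbols from any k reads {node position: value}."""
    points = [(pos + 1, val) for pos, val in list(reads.items())[:k]]
    return [interpolate_at(points, x) for x in range(1, k + 1)]


if __name__ == "__main__":
    k, n, block_size = 6, 12, 3  # hypothetical layout: 4 blocks of 3 nodes
    blocks = [range(b * block_size, (b + 1) * block_size) for b in range(n // block_size)]
    msg = [10, 20, 30, 40, 50, 60]
    codeword = encode(msg, n)
    # Any 2 of the 4 blocks may fail; read all 3 nodes of each surviving block.
    for alive in combinations(range(len(blocks)), 2):
        reads = {pos: codeword[pos] for b in alive for pos in blocks[b]}
        assert decode(reads, k) == msg
    print("recovered from every pair of surviving blocks")
```

Reading an equal share (3 symbols) from each of any 2 surviving blocks suffices here simply because the code is MDS; the BFR framework refines this baseline by additionally optimizing repair bandwidth and locality.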
We construct a joint coordination-channel polar coding scheme for strong coordination of actions between two agents $\mathsf{X}$ and $\mathsf{Y}$, which communicate over a discrete memoryless channel (DMC) such that the joint distribution of actions follows a prescribed probability distribution. We show that polar codes are able to achieve our previously established inner bound to the strong noisy coordination capacity region and thus provide a constructive alternative to a random coding proof. Our polar coding scheme also offers a constructive solution to a channel simulation problem where a DMC and shared randomness are together employed to simulate another DMC. In particular, our proposed solution is able to utilize the randomness of the DMC to reduce the amount of local randomness required to generate the sequence of actions at agent $\mathsf{Y}$. By leveraging our earlier random coding results for this problem, we conclude that the proposed joint coordination-channel coding scheme strictly outperforms a separate scheme in terms of achievable communication rate for the same amount of randomness injected into both systems.
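As background for the polar-coding construction, the sketch below shows only the standard building blocks: the polar transform $x = u F^{\otimes n}$ over GF(2) with kernel $F = \begin{bmatrix}1&0\\1&1\end{bmatrix}$ (without bit reversal), and a textbook choice of reliable versus frozen positions using Bhattacharyya parameters on a binary erasure channel. The BEC is an assumed stand-in for a general DMC, and none of this implements the joint coordination-channel scheme itself; the function names are hypothetical.

```python
from typing import List


def polar_transform(u: List[int]) -> List[int]:
    """Compute x = u F^{(x)n} over GF(2), F = [[1, 0], [1, 1]], without bit reversal."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    top, bottom = u[:half], u[half:]
    # Kronecker structure [[G, 0], [G, G]] gives x = [(top XOR bottom) G, bottom G].
    return (polar_transform([a ^ b for a, b in zip(top, bottom)])
            + polar_transform(bottom))


def bec_bhattacharyya(n: int, eps: float) -> List[float]:
    """Bhattacharyya parameters of the 2**n synthetic channels of a BEC(eps),
    listed in successive-cancellation decoding order u_0, u_1, ..."""
    z = [eps]
    for _ in range(n):
        nxt = []
        for v in z:
            nxt.append(2 * v - v * v)  # degraded ("minus") channel
            nxt.append(v * v)          # upgraded ("plus") channel
        z = nxt
    return z


def information_set(n: int, k: int, eps: float) -> List[int]:
    """Indices of the k most reliable synthetic channels; the rest are frozen."""
    z = bec_bhattacharyya(n, eps)
    return sorted(sorted(range(len(z)), key=lambda i: z[i])[:k])
```

For a BEC with erasure probability 0.5 and block length 4, `bec_bhattacharyya(2, 0.5)` gives `[0.9375, 0.5625, 0.4375, 0.0625]`, so a rate-1/2 code carries information on positions `[2, 3]` and freezes positions 0 and 1. Since $F^2 = I$ over GF(2), applying `polar_transform` twice returns the original vector.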