
Partial DNA Assembly: A Rate-Distortion Perspective

Posted by Govinda Kamath
Publication date: 2016
Research field: Information Engineering
Paper language: English





Earlier formulations of the DNA assembly problem were all in the context of perfect assembly; i.e., given a set of reads from a long genome sequence, is it possible to perfectly reconstruct the original sequence? In practice, however, the read data is very often not rich enough to permit unambiguous reconstruction of the original sequence. While a natural generalization of the perfect assembly formulation to these cases would be to consider a rate-distortion framework, partial assemblies are usually represented in terms of an assembly graph, which makes defining a distortion measure challenging. In this work, we introduce a distortion function for assembly graphs that can be understood as the logarithm of the number of Eulerian cycles in the assembly graph, each of which corresponds to a candidate assembly that could have generated the observed reads. We also introduce an algorithm for the construction of an assembly graph and analyze its performance on real genomes.
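The distortion measure above can be illustrated with the BEST theorem, which counts the Eulerian circuits of a connected, balanced digraph as $t_w(G)\prod_v(\deg^+(v)-1)!$, where $t_w(G)$ is the number of spanning arborescences oriented toward any fixed vertex $w$ (obtained here from the directed matrix-tree theorem). The following is a minimal sketch under stated assumptions, not the authors' graph-construction algorithm or their exact distortion definition: it treats parallel edges as distinguishable, builds a toy de Bruijn graph from the error-free cyclic k-mers of a hypothetical genome, and takes $\log_2$ of the circuit count. The helper names `count_eulerian_circuits`, `cyclic_kmers`, `de_bruijn_edges`, and `assembly_distortion` are hypothetical.

```python
from fractions import Fraction
from math import factorial, log2


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def count_eulerian_circuits(edges):
    """Count Eulerian circuits of a directed multigraph via the BEST theorem.

    Assumption: parallel edges are treated as distinguishable.
    Returns 0 if the graph is unbalanced or not connected.
    """
    nodes = sorted({x for edge in edges for x in edge})
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    out_deg, in_deg = [0] * n, [0] * n
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        out_deg[i] += 1
        in_deg[j] += 1
        adj[i][j] += 1
    if n == 0 or out_deg != in_deg:
        return 0
    # Out-degree Laplacian L = D_out - A. Deleting the row and column of the
    # root (node 0) yields the number of arborescences oriented toward it.
    laplacian = [[(out_deg[i] if i == j else 0) - adj[i][j] for j in range(n)]
                 for i in range(n)]
    minor = [row[1:] for row in laplacian[1:]]
    arborescences = int(det(minor))
    product = 1
    for d in out_deg:
        product *= factorial(d - 1)
    return arborescences * product


def cyclic_kmers(genome, k):
    """All k-mers of a circular genome (hypothetical: error-free, one read per k-mer)."""
    extended = genome + genome[:k - 1]
    return [extended[i:i + k] for i in range(len(genome))]


def de_bruijn_edges(kmers):
    """Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    return [(kmer[:-1], kmer[1:]) for kmer in kmers]


def assembly_distortion(edges):
    """log2 of the number of Eulerian circuits (candidate assemblies)."""
    count = count_eulerian_circuits(edges)
    if count == 0:
        raise ValueError("graph has no Eulerian circuit")
    return log2(count)


if __name__ == "__main__":
    edges = de_bruijn_edges(cyclic_kmers("ACAGAT", 2))
    print(count_eulerian_circuits(edges))  # 2
    print(assembly_distortion(edges))      # 1.0
```

For the hypothetical circular genome ACAGAT with k = 2, the repeated A turns the graph into three loops through one node, so two candidate assemblies (ACAGAT and ACATAG) explain the same k-mers, giving 1 bit of distortion.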


Read also

Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. A standard approach to speed up this task is to compute sketches of the DNA reads (typically via hashing-based techniques) that allow the efficient computation of pairwise alignment scores. We propose a rate-distortion framework to study the problem of computing sketches that achieve the optimal tradeoff between sketch size and alignment estimation distortion. We consider the simple setting of i.i.d. error-free sources of length $n$ and introduce a new sketching algorithm called locational hashing. While standard approaches in the literature based on min-hashes require $B = (1/D) \cdot O(\log n)$ bits to achieve a distortion $D$, our proposed approach only requires $B = \log^2(1/D) \cdot O(1)$ bits. This can lead to significant computational savings in pairwise alignment estimation.
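The min-hash baseline mentioned above can be sketched as follows. This shows only the standard min-hash idea (estimating the Jaccard similarity of two reads' k-mer sets from how often per-hash minima agree, as a proxy for alignment similarity); it is not the locational hashing scheme proposed in the paper, which the abstract does not specify. The k-mer length, the number of hash functions, the use of seeded BLAKE2b as the hash family, and the example reads are all assumptions.

```python
import hashlib


def kmer_set(seq, k):
    """Set of all length-k substrings (k-mers) of seq."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(seed, item):
    """64-bit hash of item; each seed plays the role of a separate hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k, num_hashes):
    """For each hash function, keep the minimum hash value over the read's k-mers."""
    kmers = kmer_set(seq, k)
    return [min(seeded_hash(seed, x) for x in kmers) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """The fraction of agreeing minima is an unbiased estimate of the Jaccard index."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(seq_a, seq_b, k):
    """Exact Jaccard index of the two k-mer sets, for comparison."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    read_a = "ACGTTGCATGCCATAGGCTTACGATCGATTGCA"
    read_b = "ACGTTGCATGCCATTGGCTTACGATCGATTGCA"  # one substitution
    sa = minhash_sketch(read_a, 8, 256)
    sb = minhash_sketch(read_b, 8, 256)
    print(exact_jaccard(read_a, read_b, 8), estimate_jaccard(sa, sb))
```

Each sketch has a fixed size (here 256 hash values) regardless of read length, which is what makes pairwise comparison cheap; the estimation error shrinks roughly as one over the square root of the number of hash functions.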
This paper takes a rate-distortion approach to understanding the information-theoretic laws governing cache-aided communications systems. Specifically, we characterise the optimal tradeoffs between the delivery rate, cache capacity and reconstruction distortions for a single-user problem and some special cases of a two-user problem. Our analysis considers discrete memoryless sources, expected- and excess-distortion constraints, and separable and f-separable distortion functions. We also establish a strong converse for separable-distortion functions, and we show that los
A rate-distortion problem motivated by the consideration of semantic information is formulated and solved. The starting point is to model an information source as a pair consisting of an intrinsic state, which is not observable and corresponds to the semantic aspect of the source, and an extrinsic observation, which is subject to lossy source coding. The proposed rate-distortion problem seeks a description of the information source, via encoding the extrinsic observation, under two distortion constraints: one for the intrinsic state and the other for the extrinsic observation. The corresponding state-observation rate-distortion function is obtained, and case studies of Gaussian intrinsic state estimation and binary intrinsic state classification are presented.
The rate-distortion dimension (RDD) of an analog stationary process is studied as a measure of complexity that captures the amount of information contained in the process. It is shown that the RDD of a process, defined as two times the asymptotic ratio of its rate-distortion function $R(D)$ to $\log \frac{1}{D}$ as the distortion $D$ approaches zero, is equal to its information dimension (ID). This generalizes an earlier result by Kawabata and Dembo and provides an operational approach to evaluate the ID of a process, which previously was shown to be closely related to the effective dimension of the underlying process and also to the fundamental limits of compressed sensing. The relation between RDD and ID is illustrated for a piecewise constant process.
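As a sanity check of the RDD definition, consider a memoryless Gaussian source with variance $\sigma^2$, whose rate-distortion function under mean-squared error is the standard $R(D) = \frac{1}{2}\log\frac{\sigma^2}{D}$ for $0 < D \le \sigma^2$. This worked example is chosen here for illustration and is not taken from the paper; the limit recovers 1, the information dimension of an absolutely continuous scalar distribution.

```latex
\[
\mathrm{RDD}
= \lim_{D \to 0} \frac{2R(D)}{\log \frac{1}{D}}
= \lim_{D \to 0} \frac{\log \sigma^2 + \log \frac{1}{D}}{\log \frac{1}{D}}
= 1 .
\]
```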
The rate-distortion-perception function (RDPF; Blau and Michaeli, 2019) has emerged as a useful tool for thinking about realism and distortion of reconstructions in lossy compression. Unlike the rate-distortion function, however, it is unknown whether encoders and decoders exist that achieve the rate suggested by the RDPF. Building on results by Li and El Gamal (2018), we show that the RDPF can indeed be achieved using stochastic, variable-length codes. For this class of codes, we also prove that the RDPF lower-bounds the achievable rate