Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. A standard approach to speed up this task is to compute sketches of the DNA reads (typically via hashing-based techniques) that allow the efficient computation of pairwise alignment scores. We propose a rate-distortion framework to study the problem of computing sketches that achieve the optimal tradeoff between sketch size and alignment estimation distortion. We consider the simple setting of i.i.d. error-free sources of length $n$ and introduce a new sketching algorithm called locational hashing. While standard approaches in the literature based on min-hashes require $B = (1/D) \cdot O\left(\log n\right)$ bits to achieve a distortion $D$, our proposed approach only requires $B = \log^2(1/D) \cdot O(1)$ bits. This can lead to significant computational savings in pairwise alignment estimation.
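To make the baseline concrete, the following is a minimal sketch of a min-hash style read sketch, together with a rough comparison of the two bit budgets quoted above. It is an illustration under stated assumptions, not the paper's method: the function names, the k-mer length `k`, the number of hashes, and the choice of BLAKE2b as the hash are all hypothetical, the constants hidden in the $O(\cdot)$ terms are set to 1, and logarithms are taken base 2. Locational hashing itself is not implemented here, since the abstract does not specify it.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer; the seed selects the hash function."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(seq, k=8, num_hashes=64):
    """Keep the minimum hash value under each of num_hashes seeded hashes."""
    ks = kmers(seq, k)
    return [min(_hash(x, seed) for x in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of matching minima, an estimate of k-mer set Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def bits_minhash(n, D):
    """(1/D) * log2(n), with the hidden constant assumed to be 1."""
    return (1 / D) * math.log2(n)


def bits_locational(D):
    """log2(1/D)^2, with the hidden constant assumed to be 1."""
    return math.log2(1 / D) ** 2
```

Under these assumptions, `bits_minhash(1024, 0.25)` evaluates to 40.0 while `bits_locational(0.25)` evaluates to 4.0. The absolute numbers mean little because the constants are unknown; the point is only that the first expression grows with both $1/D$ and $\log n$, whereas the second depends on $D$ alone and only polylogarithmically.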
Earlier formulations of the DNA assembly problem were all in the context of perfect assembly; i.e., given a set of reads from a long genome sequence, is it possible to perfectly reconstruct the original sequence? In practice, however, it is very ofte
This paper takes a rate-distortion approach to understanding the information-theoretic laws governing cache-aided communications systems. Specifically, we characterise the optimal tradeoffs between the delivery rate, cache capacity and reconstruction
A rate-distortion problem motivated by the consideration of semantic information is formulated and solved. The starting point is to model an information source as a pair consisting of an intrinsic state which is not observable, corresponding to the s
The rate-distortion dimension (RDD) of an analog stationary process is studied as a measure of complexity that captures the amount of information contained in the process. It is shown that the RDD of a process, defined as two times the asymptotic rat
The rate-distortion-perception function (RDPF; Blau and Michaeli, 2019) has emerged as a useful tool for thinking about realism and distortion of reconstructions in lossy compression. Unlike the rate-distortion function, however, it is unknown whethe