No Arabic abstract
The inversion distance, that is the distance between two unichromosomal genomes with the same content allowing only
The computation of genomic distances has been a very active field of computational comparative genomics over the last 25 years. Substantial results include the polynomial-time computability of the inversion distance by Hannenhalli and Pevzner in 1995 and the introduction of the double-cut and join (DCJ) distance by Yancopoulos et al. in 2005. Both results, however, rely on the assumption that the genomes under comparison contain the same set of unique markers (syntenic genomic regions, sometimes also referred to as genes). In 2015, Shao, Lin and Moret relax this condition by allowing for duplicate markers in the analysis. This generalized version of the genomic distance problem is NP-hard, and they give an ILP solution that is efficient enough to be applied to real-world datasets. A restriction of their approach is that it can be applied only to balanced genomes, that have equal numbers of duplicates of any marker. Therefore it still needs a delicate preprocessing of the input data in which excessive copies of unbalanced markers have to be removed. In this paper we present an algorithm solving the genomic distance problem for natural genomes, in which any marker may occur an arbitrary number of times. Our method is based on a new graph data structure, the multi-relational diagram, that allows an elegant extension of the ILP by Shao, Lin and Moret to count runs of markers that are under- or over-represented in one genome with respect to the other and need to be inserted or deleted, respectively. With this extension, previous restrictions on the genome configurations are lifted, for the first time enabling an uncompromising rearrangement analysis. Any marker sequence can directly be used for the distance calculation. The evaluation of our approach shows that it can be used to analyze genomes with up to a few ten thousand markers, which we demonstrate on simulated and real data.
Tree comparison metrics have proven to be an invaluable aide in the reconstruction and analysis of phylogenetic (evolutionary) trees. The path-length distance between trees is a particularly attractive measure as it reflects differences in tree shape as well as differences between branch lengths. The distance equals the sum, over all pairs of taxa, of the squared differences between the lengths of the unique path connecting them in each tree. We describe an $O(n log n)$ time for computing this distance, making extensive use of tree decomposition techniques introduced by Brodal et al. (2004).
With this paper we bring about a discussion on the computing potential of complex optical networks and provide experimental demonstration that an optical fiber network can be used as an analog processor to calculate matrix inversion. A 3x3 matrix is inverted as a proof-of-concept demonstration using a fiber network containing three nodes and operating at telecomm wavelength. For an NxN matrix, the overall solving time (including setting time of the matrix elements and calculation time of inversion) scales as O(N^2), whereas matrix inversion by most advanced computer algorithms requires ~O(N^2.37) computational time. For well-conditioned matrices, the error of the inversion performed optically is found to be less than 3%, limited by the accuracy of measurement equipment.
Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any $delta > 0$, given streaming access to an array of length $n$ provides a $(1+delta)$-multiplicative approximation to the emph{distance to monotonicity} ($n$ minus the length of the LIS), and uses only $O((log^2 n)/delta)$ space. The previous best known approximation using polylogarithmic space was a multiplicative 2-factor. Our algorithm can be used to estimate the length of the LIS to within an additive $delta n$ for any $delta >0$ while previous algorithms could only achieve additive error $n(1/2-o(1))$. Our algorithm is very simple, being just 3 lines of pseudocode, and has a small update time. It is essentially a polylogarithmic space approximate implementation of a classic dynamic program that computes the LIS. We also give a streaming algorithm for approximating $LCS(x,y)$, the length of the longest common subsequence between strings $x$ and $y$, each of length $n$. Our algorithm works in the asymmetric setting (inspired by cite{AKO10}), in which we have random access to $y$ and streaming access to $x$, and runs in small space provided that no single symbol appears very often in $y$. More precisely, it gives an additive-$delta n$ approximation to $LCS(x,y)$ (and hence also to $E(x,y) = n-LCS(x,y)$, the edit distance between $x$ and $y$ when insertions and deletions, but not substitutions, are allowed), with space complexity $O(k(log^2 n)/delta)$, where $k$ is the maximum number of times any one symbol appears in $y$.
A bond of a graph $G$ is an inclusion-wise minimal disconnecting set of $G$, i.e., bonds are cut-sets that determine cuts $[S,Vsetminus S]$ of $G$ such that $G[S]$ and $G[Vsetminus S]$ are both connected. Given $s,tin V(G)$, an $st$-bond of $G$ is a bond whose removal disconnects $s$ and $t$. Contrasting with the large number of studies related to maximum cuts, there are very few results regarding the largest bond of general graphs. In this paper, we aim to reduce this gap on the complexity of computing the largest bond and the largest $st$-bond of a graph. Although cuts and bonds are similar, we remark that computing the largest bond of a graph tends to be harder than computing its maximum cut. We show that {sc Largest Bond} remains NP-hard even for planar bipartite graphs, and it does not admit a constant-factor approximation algorithm, unless $P = NP$. We also show that {sc Largest Bond} and {sc Largest $st$-Bond} on graphs of clique-width $w$ cannot be solved in time $f(w)times n^{o(w)}$ unless the Exponential Time Hypothesis fails, but they can be solved in time $f(w)times n^{O(w)}$. In addition, we show that both problems are fixed-parameter tractable when parameterized by the size of the solution, but they do not admit polynomial kernels unless NP $subseteq$ coNP/poly.