ترغب بنشر مسار تعليمي؟ اضغط هنا

A Linear-Time Algorithm for the Common Refinement of Rooted Phylogenetic Trees on a Common Leaf Set

93   0   0.0 ( 0 )
 نشر من قبل David Schaller
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

The problem of finding a common refinement of a set of rooted trees with common leaf set $L$ appears naturally in mathematical phylogenetics whenever poorly resolved information on the same taxa from different sources is to be reconciled. This constitutes a special case of the well-studied supertree problem, where the leaf sets of the input trees may differ. Algorithms that solve the rooted tree compatibility problem are of course applicable to this special case. However, they require sophisticated auxiliary data structures and have a running time of at least $O(k|L|log^2(k|L|))$ for $k$ input trees. Here, we show that the problem can be solved in $O(k|L|)$ time using a simple bottom-up algorithm called LinCR. An implementation of LinCR in Python is freely available at https://github.com/david-schaller/tralda.



قيم البحث

اقرأ أيضاً

We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length $n$. While a simple quadratic algorithm has been known for the problem for more than 40 years, no faster algorithm has been found despite an ex tensive effort. The lack of progress on the problem has recently been explained by Abboud, Backurs, and Vassilevska Williams [FOCS15] and Bringmann and Kunnemann [FOCS15] who proved that there is no subquadratic algorithm unless the Strong Exponential Time Hypothesis fails. This has led the community to look for subquadratic approximation algorithms for the problem. Yet, unlike the edit distance problem for which a constant-factor approximation in almost-linear time is known, very little progress has been made on LCS, making it a notoriously difficult problem also in the realm of approximation. For the general setting, only a naive $O(n^{varepsilon/2})$-approximation algorithm with running time $tilde{O}(n^{2-varepsilon})$ has been known, for any constant $0 < varepsilon le 1$. Recently, a breakthrough result by Hajiaghayi, Seddighin, Seddighin, and Sun [SODA19] provided a linear-time algorithm that yields a $O(n^{0.497956})$-approximation in expectation; improving upon the naive $O(sqrt{n})$-approximation for the first time. In this paper, we provide an algorithm that in time $O(n^{2-varepsilon})$ computes an $tilde{O}(n^{2varepsilon/5})$-approximation with high probability, for any $0 < varepsilon le 1$. Our result (1) gives an $tilde{O}(n^{0.4})$-approximation in linear time, improving upon the bound of Hajiaghayi, Seddighin, Seddighin, and Sun, (2) provides an algorithm whose approximation scales with any subquadratic running time $O(n^{2-varepsilon})$, improving upon the naive bound of $O(n^{varepsilon/2})$ for any $varepsilon$, and (3) instead of only in expectation, succeeds with high probability.
120 - Xu Yang , Hong Qiao , 2014
We propose a weighted common subgraph (WCS) matching algorithm to find the most similar subgraphs in two labeled weighted graphs. WCS matching, as a natural generalization of the equal-sized graph matching or subgraph matching, finds wide application s in many computer vision and machine learning tasks. In this paper, the WCS matching is first formulated as a combinatorial optimization problem over the set of partial permutation matrices. Then it is approximately solved by a recently proposed combinatorial optimization framework - Graduated NonConvexity and Concavity Procedure (GNCCP). Experimental comparisons on both synthetic graphs and real world images validate its robustness against noise level, problem size, outlier number, and edge density.
In the Directed Feedback Vertex Set (DFVS) problem, the input is a directed graph $D$ on $n$ vertices and $m$ edges, and an integer $k$. The objective is to determine whether there exists a set of at most $k$ vertices intersecting every directed cycl e of $D$. Whether or not DFVS admits a fixed parameter tractable (FPT) algorithm was considered the most important open problem in parameterized complexity until Chen, Liu, Lu, OSullivan and Razgon [JACM 2008] answered the question in the affirmative. They gave an algorithm for the problem with running time $O(k!4^kk^4nm)$. Since then, no faster algorithm for the problem has been found. In this paper, we give an algorithm for DFVS with running time $O(k!4^kk^5(n+m))$. Our algorithm is the first algorithm for DFVS with linear dependence on input size. Furthermore, the asymptotic dependence of the running time of our algorithm on the parameter $k$ matches up to a factor $k$ the algorithm of Chen, Liu, Lu, OSullivan and Razgon. On the way to designing our algorithm for DFVS, we give a general methodology to shave off a factor of $n$ from iterative-compression based algorithms for a few other well-studied covering problems in parameterized complexity. We demonstrate the applicability of this technique by speeding up by a factor of $n$, the current best FPT algorithms for Multicut [STOC 2011, SICOMP 2014] and Directed Subset Feedback Vertex Set [ICALP 2012, TALG 2014].
175 - Jin Cao , Dewei Zhong 2020
Finding the common subsequences of $L$ multiple strings has many applications in the area of bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) for $L$ stri ngs is NP-hard, e.g., the computational complexity is exponential in $L$. In this paper, we develop a randomized algorithm, referred to as {em Random-MCS}, for finding a random instance of Maximal Common Subsequence ($MCS$) of multiple strings. A common subsequence is {em maximal} if inserting any character into the subsequence no longer yields a common subsequence. A special case of MCS is LCS where the length is the longest. We show the complexity of our algorithm is linear in $L$, and therefore is suitable for large $L$. Furthermore, we study the occurrence probability for a single instance of MCS and demonstrate via both theoretical and experimental studies that the longest subsequence from multiple runs of {em Random-MCS} often yields a solution to $LCS$.
Given a graph $G=(V,E)$ and an integer $k ge 1$, a $k$-hop dominating set $D$ of $G$ is a subset of $V$, such that, for every vertex $v in V$, there exists a node $u in D$ whose hop-distance from $v$ is at most $k$. A $k$-hop dominating set of minimu m cardinality is called a minimum $k$-hop dominating set. In this paper, we present linear-time algorithms that find a minimum $k$-hop dominating set in unicyclic and cactus graphs. To achieve this, we show that the $k$-dominating set problem on unicycle graph reduces to the piercing circular arcs problem, and show a linear-time algorithm for piercing sorted circular arcs, which improves the best known $O(nlog n)$-time algorithm.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا