
Complexity of modification problems for best match graphs

 Added by David Schaller
 Publication date 2020
Research language: English





Best match graphs (BMGs) are vertex-colored directed graphs that were introduced to model the relationships of genes (vertices) from different species (colors) given an underlying evolutionary tree that is assumed to be unknown. In real-life applications, BMGs are estimated from sequence similarity data. Measurement noise and approximation errors usually result in empirically determined graphs that in general violate characteristic properties of BMGs. The arc modification problems for BMGs aim at correcting such violations and thus provide a means to improve the initial estimates of best match data. We show here that the arc deletion, arc completion and arc editing problems for BMGs are NP-complete and that they can be formulated and solved as integer linear programs. To this end, we provide a novel characterization of BMGs in terms of triples (binary trees on three leaves) and a characterization of BMGs with two colors in terms of forbidden subgraphs.
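To make the definition concrete, here is a minimal sketch of how a BMG can be derived when the tree is known. It assumes the usual definition of a best match: a leaf y is a best match of x if y has a different color and no other leaf of y's color has a lower last common ancestor with x. The function name `best_match_graph` and the `parent`/`color` dictionaries are illustrative choices, not taken from the paper.

```python
from collections import defaultdict

def best_match_graph(parent, color):
    """Compute the arcs of the BMG explained by a leaf-colored rooted tree.

    parent: dict mapping every node to its parent (the root maps to None).
    color:  dict mapping every leaf to its color (inner nodes are absent).
    Returns a set of arcs (x, y) meaning "y is a best match of x".
    """
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    def leaves_below(node):
        # A node is a leaf exactly when it carries a color.
        if node in color:
            return [node]
        return [leaf for child in children[node] for leaf in leaves_below(child)]

    arcs = set()
    for x in color:
        found = {color[x]}          # colors already matched (own color is skipped)
        node = parent[x]
        while node is not None:     # walk from x towards the root
            below = leaves_below(node)
            new_colors = {color[y] for y in below} - found
            # The lowest ancestor at which a color first appears
            # determines all best matches of that color.
            for y in below:
                if color[y] in new_colors:
                    arcs.add((x, y))
            found |= new_colors
            node = parent[node]
    return arcs


# Hypothetical example: ((a1, b1), b2) with colors A, B, B.
parent = {"r": None, "u": "r", "b2": "r", "a1": "u", "b1": "u"}
color = {"a1": "A", "b1": "B", "b2": "B"}
arcs = best_match_graph(parent, color)
```

In this example `a1` points only to `b1`, because `b2` is separated from `a1` by a higher ancestor, while both `b1` and `b2` point to `a1`, the only leaf of color A. The modification problems in the abstract ask the reverse question: given a graph that is not of this form, how few arcs must be changed to obtain one that is.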




Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics and can be approximated with the help of similarity measures between gene sequences, albeit not without errors. The corresponding graph editing problem can be used as a means of error correction. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Since BMGs have a characterization in terms of consistency of a certain set of rooted triples, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho's supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing.
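For readers unfamiliar with the supertree algorithm mentioned above, the following is a minimal sketch of the classical top-down recursion for rooted triples (often called BUILD). It is not the editing heuristic from the paper: it simply reports inconsistency by returning `None` when the auxiliary graph is connected. The function name `build` and the encoding of a triple ab|c as the tuple `(a, b, c)` are assumptions made for illustration.

```python
def build(leaves, triples):
    """Return a nested-tuple tree displaying all triples, or None if none exists.

    leaves:  iterable of leaf labels.
    triples: iterable of (a, b, c), read as "a and b are closer to each
             other than either is to c" (the rooted triple ab|c).
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Keep only triples whose three leaves all lie in the current leaf set.
    relevant = [t for t in triples if set(t) <= leaves]

    # Auxiliary graph: join a and b for every relevant triple ab|c.
    neighbours = {v: set() for v in leaves}
    for a, b, _ in relevant:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Connected components by depth-first search.
    components, seen = [], set()
    for v in leaves:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(neighbours[u] - comp)
        seen |= comp
        components.append(comp)

    # A connected graph on more than one leaf means the triples conflict.
    if len(components) == 1:
        return None

    subtrees = []
    for comp in components:
        sub = build(comp, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


consistent = build("abc", [("a", "b", "c")])
conflicting = build("abc", [("a", "b", "c"), ("a", "c", "b")])
```

Here `consistent` is a nested tuple grouping `a` and `b` apart from `c` (the order of children depends on set iteration), while `conflicting` is `None`. The recursive algorithms described in the abstract are similar in shape but, as the abstract notes, rely on a set partitioning step, which is where community detection methods come in.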
Best match graphs (BMGs) are vertex-colored digraphs that naturally arise in mathematical phylogenetics to formalize the notion of evolutionary closest genes w.r.t. an a priori unknown phylogenetic tree. BMGs are explained by unique least resolved trees. We prove that the property of a rooted, leaf-colored tree to be least resolved for some BMG is preserved by the contraction of inner edges. For the special case of two-colored BMGs, this leads to a characterization of the least resolved trees (LRTs) of binary-explainable trees and a simple, polynomial-time algorithm for the minimum cardinality completion of the arc set of a BMG to reach a BMG that can be explained by a binary tree.
Genome-scale orthology assignments are usually based on reciprocal best matches. In the absence of horizontal gene transfer (HGT), every pair of orthologs forms a reciprocal best match. Incorrect orthology assignments therefore are always false positives in the reciprocal best match graph. We consider duplication/loss scenarios and characterize unambiguous false-positive (u-fp) orthology assignments, that is, edges in the best match graphs (BMGs) that cannot correspond to orthologs for any gene tree that explains the BMG. Moreover, we provide a polynomial-time algorithm to identify all u-fp orthology assignments in a BMG. Simulations show that at least $75\%$ of all incorrect orthology assignments can be detected in this manner. All results rely only on the structure of the BMGs and not on any a priori knowledge about underlying gene or species trees.
The even cycle problem for both undirected and directed graphs has been the topic of intense research in the last decade. In this paper, we study the computational complexity of \emph{cycle length modularity problems}. Roughly speaking, in a cycle length modularity problem, given an input (undirected or directed) graph, one has to determine whether the graph has a cycle $C$ of a specific length (or one of several different lengths), modulo a fixed integer. We denote the two families (one for undirected graphs and one for directed graphs) of problems by $(S,m)\hbox{-}{\rm UC}$ and $(S,m)\hbox{-}{\rm DC}$, where $m \in \mathcal{N}$ and $S \subseteq \{0,1, \ldots, m-1\}$. $(S,m)\hbox{-}{\rm UC}$ (respectively, $(S,m)\hbox{-}{\rm DC}$) is defined as follows: Given an undirected (respectively, directed) graph $G$, is there a cycle in $G$ whose length, modulo $m$, is a member of $S$? In this paper, we fully classify (i.e., as either polynomial-time solvable or as ${\rm NP}$-complete) each problem $(S,m)\hbox{-}{\rm UC}$ such that $0 \in S$ and each problem $(S,m)\hbox{-}{\rm DC}$ such that $0 \notin S$. We also give a sufficient condition on $S$ and $m$ for the following problem to be polynomial-time computable: $(S,m)\hbox{-}{\rm UC}$ such that $0 \notin S$.
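The problem statement for the directed family can be illustrated with a brute-force check. The sketch below enumerates simple directed cycles by backtracking, so it takes exponential time in the worst case and says nothing about the complexity classification itself. It assumes that "cycle" means a simple cycle and that vertices are mutually comparable; the name `has_cycle_mod` is hypothetical.

```python
def has_cycle_mod(adj, S, m):
    """Decide by exhaustive search whether a directed graph has a simple
    cycle whose length modulo m lies in S.

    adj: dict mapping each vertex to an iterable of its out-neighbours.
    """
    def extend(start, current, visited, length):
        for nxt in adj.get(current, ()):
            if nxt == start:
                # Closing the cycle adds one more arc.
                if (length + 1) % m in S:
                    return True
            elif nxt not in visited and nxt > start:
                # Only visit vertices larger than start, so each cycle is
                # explored from its smallest vertex exactly once.
                visited.add(nxt)
                if extend(start, nxt, visited, length + 1):
                    return True
                visited.remove(nxt)
        return False

    return any(extend(v, v, {v}, 0) for v in sorted(adj))


# Directed triangle: its only cycle has length 3.
triangle = {0: [1], 1: [2], 2: [0]}
length_divisible_by_3 = has_cycle_mod(triangle, {0}, 3)
has_even_cycle = has_cycle_mod(triangle, {0}, 2)
has_odd_cycle = has_cycle_mod(triangle, {1}, 2)
```

The undirected family would need an additional rule so that walking an edge back and forth is not counted as a cycle of length 2; that case is left out of this sketch.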
Best match graphs (BMGs) are a key intermediate in graph-based orthology detection and contain a large amount of information on the gene tree. We provide a near-cubic algorithm to determine whether a BMG is binary-explainable, i.e., whether it can be explained by a fully resolved gene tree and, if so, to construct such a tree. Moreover, we show that all such binary trees are refinements of the unique binary-resolvable tree (BRT), which in general is a substantial refinement of the also unique least resolved tree of a BMG. Finally, we show that the problem of editing an arbitrary vertex-colored graph to a binary-explainable BMG is NP-complete and provide an integer linear program formulation for this task.