ترغب بنشر مسار تعليمي؟ اضغط هنا

The Geometry of the space of Discrete Coalescent Trees

85   0   0.0 ( 0 )
 نشر من قبل Alex Gavryushkin
 تاريخ النشر 2021
والبحث باللغة English




اسأل ChatGPT حول البحث

Computational inference of dated evolutionary histories relies upon various hypotheses about RNA, DNA, and protein sequence mutation rates. Using mutation rates to infer these dated histories is referred to as molecular clock assumption. Coalescent theory is a popular class of evolutionary models that implements the molecular clock hypothesis to facilitate computational inference of dated phylogenies. Cancer and virus evolution are two areas where these methods are particularly important. Methodologically, phylogenetic inference methods require a tree space over which the inference is performed, and geometry of this space plays an important role in statistical and computational aspects of tree inference algorithms. It has recently been shown that molecular clock, and hence coalescent, trees possess a unique geometry, different from that of classical phylogenetic tree spaces which do not model mutation rates. Here we introduce and study a space of discrete coalescent trees, that is, we assume that time is discrete, which is inevitable in many computational formalisations. We establish several geometrical properties of the space and show how these properties impact various algorithms used in phylogenetic analyses. Our tree space is a discretisation of a known time tree space, called t-space, and hence our results can be used to approximate solutions to various open problems in t-space. Our tree space is also a generalisation of another known trees space, called the ranked nearest neighbour interchange space, hence our advances in this paper imply new and generalise existing results about ranked trees.



قيم البحث

اقرأ أيضاً

Measures of tree balance play an important role in the analysis of phylogenetic trees. One of the oldest and most popular indices in this regard is the Colless index for rooted bifurcating trees, introduced by Colless (1982). While many of its statis tical properties under different probabilistic models for phylogenetic trees have already been established, little is known about its minimum value and the trees that achieve it. In this manuscript, we fill this gap in the literature. To begin with, we derive both recursive and closed expressions for the minimum Colless index of a tree with $n$ leaves. Surprisingly, these expressions show a connection between the minimum Colless index and the so-called Blancmange curve, a fractal curve. We then fully characterize the tree shapes that achieve this minimum value and we introduce both an algorithm to generate them and a recurrence to count them. After focusing on two extremal classes of trees with minimum Colless index (the maximally balanced trees and the greedy from the bottom trees), we conclude by showing that all trees with minimum Colless index also have minimum Sackin index, another popular balance index.
Several implicit methods to infer Horizontal Gene Transfer (HGT) focus on pairs of genes that have diverged only after the divergence of the two species in which the genes reside. This situation defines the edge set of a graph, the later-divergence-t ime (LDT) graph, whose vertices correspond to genes colored by their species. We investigate these graphs in the setting of relaxed scenarios, i.e., evolutionary scenarios that encompass all commonly used variants of duplication-transfer-loss scenarios in the literature. We characterize LDT graphs as a subclass of properly vertex-colored cographs, and provide a polynomial-time recognition algorithm as well as an algorithm to construct a relaxed scenario that explains a given LDT. An edge in an LDT graph implies that the two corresponding genes are separated by at least one HGT event. The converse is not true, however. We show that the complete xenology relation is described by an rs-Fitch graph, i.e., a complete multipartite graph satisfying constraints on the vertex coloring. This class of vertex-colored graphs is also recognizable in polynomial time. We finally address the question how much information about all HGT events is contained in LDT graphs with the help of simulations of evolutionary scenarios with a wide range of duplication, loss, and HGT events. In particular, we show that a simple greedy graph editing scheme can be used to efficiently detect HGT events that are implicitly contained in LDT graphs.
Genome-scale orthology assignments are usually based on reciprocal best matches. In the absence of horizontal gene transfer (HGT), every pair of orthologs forms a reciprocal best match. Incorrect orthology assignments therefore are always false posit ives in the reciprocal best match graph. We consider duplication/loss scenarios and characterize unambiguous false-positive (u-fp) orthology assignments, that is, edges in the best match graphs (BMGs) that cannot correspond to orthologs for any gene tree that explains the BMG. Moreover, we provide a polynomial-time algorithm to identify all u-fp orthology assignments in a BMG. Simulations show that at least $75%$ of all incorrect orthology assignments can be detected in this manner. All results rely only on the structure of the BMGs and not on any a priori knowledge about underlying gene or species trees.
We introduce a new Wright-Fisher type model for seed banks incorporating simultaneous switching, which is motivated by recent work on microbial dormancy. We show that the simultaneous switching mechanism leads to a new jump-diffusion limit for the sc aled frequency processes, extending the classical Wright-Fisher and seed bank diffusion limits. We further establish a new dual coalescent structure with multiple activation and deactivation events of lineages. While this seems reminiscent of multiple merger events in general exchangeable coalescents, it actually leads to an entirely new class of coalescent processes with unique qualitative and quantitative behaviour. To illustrate this, we provide a novel kind of condition for coming down from infinity for these coalescents using recent results of Griffiths.
108 - Joseph Heled 2011
We show how to analytically derive the average sequence dissimilarity (ASD) within and between species under a simplified multi-species coalescent setup.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا