We present a computational model to reconstruct trees of ancestors for animals with sexual reproduction. Through a recursive algorithm combined with a random number generator, it is possible to reproduce the number of ancestors for each generation and use it to constrain the maximum number of ancestors in the following generation. This new model makes it possible to consider the reproductive preferences of particular species and to combine several trees to simulate the behavior of a population. It is also possible to obtain an analytical description by treating the simulation as a theoretical stochastic process. Such a process can be generalized so that an algorithm associated with it can simulate other, similar stochastic processes. The simulation is based on a previously presented theoretical model.
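As an illustration only, the idea of letting the ancestor count of one generation cap the count of the next can be sketched as below. The sharing rule is an assumption made for this sketch, not the model of the paper: each of the `2 * current` parent slots coincides with an already counted ancestor with a hypothetical probability `share_prob`, and at least two distinct parents are always kept. The function name `ancestor_counts` is likewise hypothetical.

```python
import random


def ancestor_counts(generations, share_prob, rng, current=1):
    """Return the number of distinct ancestors per generation.

    Assumption (not from the abstract): every parent slot is shared with an
    already counted ancestor with probability share_prob.
    """
    if generations == 0:
        return [current]
    # The current generation caps the next one: two parents per individual.
    max_next = 2 * current
    distinct = 0
    for _ in range(max_next):
        # Sexual reproduction needs at least two distinct parents.
        if distinct < 2 or rng.random() >= share_prob:
            distinct += 1
    # Recurse one generation further back in time.
    return [current] + ancestor_counts(generations - 1, share_prob, rng, distinct)


if __name__ == "__main__":
    print(ancestor_counts(10, 0.3, random.Random(42)))
```

With `share_prob = 0` the sketch reduces to the doubling sequence 1, 2, 4, 8, ..., and any positive value keeps each generation at or below twice the previous one.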
Given a gene tree and a species tree, ancestral configurations represent the combinatorially distinct sets of gene lineages that can reach a given node of the species tree. They have been introduced as a data structure for use in the recursive computation of the conditional probability under the multispecies coalescent model of a gene tree topology given a species tree, the cost of this computation being affected by the number of ancestral configurations of the gene tree in the species tree. For matching gene trees and species trees, we obtain enumerative results on ancestral configurations. We study ancestral configurations in balanced and unbalanced families of trees determined by a given seed tree, showing that for seed trees with more than one taxon, the number of ancestral configurations increases for both families exponentially in the number of taxa $n$. For fixed $n$, the maximal number of ancestral configurations tabulated at the species tree root node and the largest number of labeled histories possible for a labeled topology occur for trees with precisely the same unlabeled shape. For ancestral configurations at the root, the maximum increases with $k_0^n$, where $k_0 \approx 1.5028$ is a quadratic recurrence constant. Under a uniform distribution over the set of labeled trees of given size, the mean number of root ancestral configurations grows with $\sqrt{3/2}(4/3)^n$ and the variance with approximately $1.4048(1.8215)^n$. The results provide a contribution to the combinatorial study of gene trees and species trees.
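The value $k_0 \approx 1.5028$ can be checked numerically under one assumption that the abstract does not state: that $k_0$ is the constant attached to the quadratic recurrence $x_{m+1} = x_m^2 + 1$ with $x_0 = 1$, defined as the limit of $x_m^{1/2^m}$. The function name below is hypothetical and the sketch is a numerical check, not the derivation used in the paper.

```python
import math


def quadratic_recurrence_constant(steps=10):
    """Approximate lim x_m ** (1 / 2**m) for x_{m+1} = x_m**2 + 1, x_0 = 1.

    Assumption: this is the recurrence behind the constant k_0.
    """
    x = 1
    for _ in range(steps):
        x = x * x + 1  # exact integer arithmetic, no overflow in Python
    # math.log accepts arbitrarily large integers.
    return math.exp(math.log(x) / 2 ** steps)


if __name__ == "__main__":
    k0 = quadratic_recurrence_constant()
    print(k0, k0 > 4 / 3)
```

Under this assumption the base $k_0$ of the maximum exceeds the base $4/3$ of the mean, so the maximal number of root configurations outgrows the mean number exponentially in $n$.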
The Minimal Ancestral Deviation (MAD) method is a recently introduced procedure for estimating the root of a phylogenetic tree, based only on the shape and branch lengths of the tree. The method is loosely derived from the midpoint rooting method, but, unlike its predecessor, makes use of all pairs of OTUs when positioning the root. In this note we establish properties of this method and then describe a fast and memory efficient algorithm. As a proof of principle, we use our algorithm to determine the MAD roots for simulated phylogenies with up to 100,000 OTUs. The calculations take a few minutes on a standard laptop.
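For orientation, the predecessor mentioned above, midpoint rooting, can be sketched in a few lines. This is not the MAD algorithm: it uses only the single most distant pair of OTUs, whereas MAD uses all pairs. The tree representation (a dictionary of weighted adjacency dictionaries), the function names, and the quadratic search over leaf pairs are assumptions chosen for brevity.

```python
def path_between(adj, src, dst):
    """Return the path src -> dst as a list of (node, distance from src)."""
    stack = [(src, None, [(src, 0.0)])]
    while stack:
        node, parent, path = stack.pop()
        if node == dst:
            return path
        for nbr, weight in adj[node].items():
            if nbr != parent:
                stack.append((nbr, node, path + [(nbr, path[-1][1] + weight)]))
    raise ValueError("nodes are not connected")


def midpoint_root(adj):
    """Return (u, v, offset): the root lies on edge u-v, offset away from u."""
    leaves = [v for v, nbrs in adj.items() if len(nbrs) == 1]
    # Longest leaf-to-leaf path; O(n^2) pairs, fine for a small illustration.
    longest = max(
        (path_between(adj, a, b) for i, a in enumerate(leaves) for b in leaves[i + 1:]),
        key=lambda path: path[-1][1],
    )
    half = longest[-1][1] / 2
    for (u, du), (v, dv) in zip(longest, longest[1:]):
        if du <= half <= dv:
            return u, v, half - du


# Hypothetical unrooted tree with four OTUs (A, B, C, D) and inner nodes x, y.
example_tree = {
    "A": {"x": 1.0}, "B": {"x": 2.0},
    "x": {"A": 1.0, "B": 2.0, "y": 1.0},
    "y": {"x": 1.0, "C": 5.0, "D": 1.0},
    "C": {"y": 5.0}, "D": {"y": 1.0},
}
```

In `example_tree` the most distant pair is B and C at distance 8, so the midpoint falls on the edge between `y` and `C`, one unit from `y`.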
2-colored best match graphs (2-BMGs) form a subclass of sink-free bi-transitive graphs that appears in phylogenetic combinatorics. There, 2-BMGs describe evolutionarily most closely related genes between a pair of species. They are explained by a unique least resolved tree (LRT). Introducing the concept of support vertices, we derive an $O(|V|+|E|\log^2|V|)$-time algorithm to recognize a 2-BMG and to construct its LRT. The approach can be extended to also recognize binary-explainable 2-BMGs with the same complexity. An empirical comparison emphasizes the efficiency of the new algorithm.
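To make the object concrete, the sketch below builds the best match graph of a given rooted, leaf-colored tree; it is the forward construction, not the recognition algorithm of the paper. It assumes the usual definition that $y$ is a best match of $x$ when $y$ has the other color and no leaf of that color has a lower last common ancestor with $x$. The input encoding (a child-to-parent map and a leaf-to-color map with exactly two colors) and the function names are hypothetical.

```python
def ancestors(parent, v):
    """Return v followed by all of its ancestors up to the root."""
    chain = [v]
    while v in parent:
        v = parent[v]
        chain.append(v)
    return chain


def best_match_graph(parent, color):
    """Return the set of arcs (x, y) such that y is a best match of x."""
    # Leaves below each node (a leaf counts as below itself).
    below = {}
    for leaf in color:
        for node in ancestors(parent, leaf):
            below.setdefault(node, set()).add(leaf)
    arcs = set()
    for x in color:
        # The lowest ancestor with a leaf of the other color gives the best matches.
        for node in ancestors(parent, x):
            matches = {y for y in below[node] if color[y] != color[x]}
            if matches:
                arcs |= {(x, y) for y in matches}
                break
    return arcs


# Hypothetical tree ((a1, b1)u, b2)r with one red and two blue genes.
example_parent = {"a1": "u", "b1": "u", "u": "r", "b2": "r"}
example_color = {"a1": "red", "b1": "blue", "b2": "blue"}
```

For the example, `b1` and `a1` are reciprocal best matches, while `b2` points to `a1` without being matched back; every vertex has an outgoing arc, which is the sink-free property.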
Significant phylogenetic codivergence between plant or animal hosts ($H$) and their symbionts or parasites ($P$) indicates the importance of their interactions on evolutionary time scales. However, valid and realistic methods to test for codivergence are not fully developed. One of the systems where possible codivergence has been of interest involves the large subfamily of temperate grasses (Pooideae) and their endophytic fungi (epichloae). These widespread symbioses often help protect host plants from herbivory and stresses, and affect species diversity and food web structures. Here we introduce the MRCALink (most-recent-common-ancestor link) method and use it to investigate the possibility of grass-epichloe codivergence. MRCALink applied to ultrametric $H$ and $P$ trees identifies all corresponding nodes for pairwise comparisons of MRCA ages. The result is compared to the space of random $H$ and $P$ tree pairs estimated by a Monte Carlo method. Compared to tree reconciliation, the method is less dependent on tree topologies (which often can be misleading), and it crucially improves on phylogeny-independent methods such as \texttt{ParaFit} or the Mantel test by eliminating an extreme (but previously unrecognized) distortion of node-pair sampling. Analysis of 26 grass species-epichloe species symbioses did not reject random association of $H$ and $P$ MRCA ages. However, when five obvious host jumps were removed, the analysis significantly rejected random association and supported grass-endophyte codivergence. Interestingly, early cladogenesis events in the Pooideae corresponded to early cladogenesis events in epichloae, suggesting concomitant origins of this grass subfamily and its remarkable group of symbionts. We also applied our method to the well-known gopher-louse data set.
Maximum likelihood estimators are used extensively to estimate unknown parameters of stochastic trait evolution models on phylogenetic trees. Although the MLE has been proven to converge to the true value in the independent-sample case, we cannot appeal to this result because trait values of different species are correlated due to shared evolutionary history. In this paper, we consider a $2$-state symmetric model for a single binary trait and investigate the theoretical properties of the MLE for the transition rate in the large-tree limit. Here, the large-tree limit is a theoretical scenario where the number of taxa increases to infinity and we can observe the trait values for all species. Specifically, we prove that the MLE converges to the true value under some regularity conditions. These conditions ensure that the tree shape is not too irregular, and hold for many practical scenarios such as trees with bounded edges, trees generated from the Yule (pure birth) process, and trees generated from the coalescent point process. Our result also provides an upper bound for the distance between the MLE and the true value.
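A minimal sketch of the estimator under discussion is given below: the likelihood of a binary trait on a fixed tree is computed with the standard pruning recursion and maximized over a grid of candidate rates. Several choices are assumptions made for illustration: the rate is the rate of leaving each state, so the probability of ending in the same state after time $t$ is $\tfrac{1}{2}(1 + e^{-2\mu t})$; the root state is uniform; the tree is encoded as nested lists of `(child, branch_length)` pairs with integer leaves; and the grid search stands in for a proper optimizer. All function names are hypothetical.

```python
import math


def p_same(rate, t):
    """Probability of the same state at both ends of a branch of length t."""
    return 0.5 * (1.0 + math.exp(-2.0 * rate * t))


def partial(node, rate):
    """Return (L0, L1): likelihood of the data below node given its state."""
    if isinstance(node, int):  # leaf with observed state 0 or 1
        return (1.0, 0.0) if node == 0 else (0.0, 1.0)
    l0 = l1 = 1.0
    for child, t in node:
        c0, c1 = partial(child, rate)
        same = p_same(rate, t)
        l0 *= same * c0 + (1.0 - same) * c1
        l1 *= (1.0 - same) * c0 + same * c1
    return l0, l1


def log_likelihood(tree, rate):
    """Log-likelihood with a uniform distribution on the root state."""
    l0, l1 = partial(tree, rate)
    return math.log(0.5 * l0 + 0.5 * l1)


def mle_rate(tree, grid):
    """Grid-search stand-in for the maximum likelihood estimate of the rate."""
    return max(grid, key=lambda rate: log_likelihood(tree, rate))


# Hypothetical four-taxon tree: ((0, 0), (0, 1)) with unit branch lengths.
example_tree = [([(0, 1.0), (0, 1.0)], 1.0), ([(0, 1.0), (1, 1.0)], 1.0)]
example_grid = [0.01 * k for k in range(1, 301)]
```

Calling `mle_rate(example_tree, example_grid)` returns the grid point with the highest likelihood; the correlation between leaves enters through the shared internal branches handled in `partial`.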