Stochastic models of evolution (Markov random fields on trivalent trees) generally assume that different characters (different runs of the stochastic process) are independent and identically distributed. In this paper we take the first steps towards addressing dependent characters. Specifically, we show that, under certain technical assumptions regarding the evolution of individual characters, we can detect any significant, history-independent correlation between any pair of multistate characters. For the special case of the Cavender-Farris-Neyman (CFN) model on two states with symmetric transition matrices, our analysis needs milder assumptions. To perform the analysis, we need to prove a new concentration result for multistate random variables of a Markov random field on arbitrary trivalent trees: we show that the random variable counting the number of leaves in any particular subset of states has variance that is subquadratic in the number of leaves.
The puzzle presented by the famous stumps of Gilboa, New York, finds a solution in the discovery of two fossil specimens that allow the entire structure of these early trees to be reconstructed.
We consider a biased random walk $X_n$ on a Galton-Watson tree with leaves in the sub-ballistic regime. We prove that there exists an explicit constant $\gamma = \gamma(\beta) \in (0,1)$, depending on the bias $\beta$, such that $X_n$ is of order $n^{\gamma}$. Denoting $\Delta_n$ the hitting time of level $n$, we prove that $\Delta_n/n^{1/\gamma}$ is tight. Moreover we show that $\Delta_n/n^{1/\gamma}$ does not converge in law (at least for large values of $\beta$). We prove that along the sequences $n_{\lambda}(k)=\lfloor \lambda \beta^{\gamma k}\rfloor$, $\Delta_n/n^{1/\gamma}$ converges to certain infinitely divisible laws. Key tools for the proof are the classical Harris decomposition for Galton-Watson trees, a new variant of regeneration times and the careful analysis of triangular arrays of i.i.d. heavy-tailed random variables.
In a recent paper, Klaere et al. modeled the impact of substitutions on arbitrary branches of a phylogenetic tree on an alignment site by the so-called One Step Mutation (OSM) matrix. By utilizing the concept of the OSM matrix for the four-state nucleotide alphabet, Nguyen et al. presented an efficient procedure to compute the minimal number of substitutions needed to translate one alignment site into another. The present paper delivers a proof for this computation. Moreover, we provide several mathematical insights into the generalization of the OSM matrix to multistate alphabets. The construction of the OSM matrix is only possible if the matrices representing the substitution types acting on the character states and the identity matrix form a commutative group with respect to matrix multiplication. We illustrate a means to establish such a group for the twenty-state amino acid alphabet and critically discuss its biological usefulness.
Applying a method to reconstruct a phylogenetic tree from random data provides a way to detect whether that method has an inherent bias towards certain tree 'shapes'. For maximum parsimony, applied to a sequence of random 2-state data, each possible binary phylogenetic tree has exactly the same distribution for its parsimony score. Despite this pleasing and slightly surprising symmetry, some binary phylogenetic trees are more likely than others to be a most parsimonious (MP) tree for a sequence of $k$ such characters, as we show. For $k=2$, and unrooted binary trees on six taxa, any tree with a caterpillar shape has a higher chance of being an MP tree than any tree with a symmetric shape. On the other hand, if we take any two binary trees, on any number of taxa, we prove that this bias between the two trees vanishes as the number of characters grows. However, again there is a twist: MP trees on six taxa are more likely to have certain shapes than a uniform distribution on binary phylogenetic trees predicts, and this difference does not appear to dissipate as $k$ grows.
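As an illustration of the parsimony score discussed in the abstract above, the following is a minimal sketch of Fitch's algorithm for a single character on a binary tree. The nested-tuple tree encoding, the taxon names `a` to `f`, and the function name `fitch_score` are assumptions made for this example and do not come from the paper. Each unrooted tree is rooted on an arbitrary edge, which does not change the Fitch score.

```python
def fitch_score(tree, states):
    """Parsimony score of one character on a rooted binary tree.

    tree   : a leaf name (str) or a pair (left_subtree, right_subtree)
    states : dict mapping each leaf name to its character state
    """
    def visit(node):
        # A leaf contributes its observed state and no changes.
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # The children can agree on a state: no extra change needed.
            return common, left_cost + right_cost
        # The children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Hypothetical six-taxon trees (illustrative encodings only).
caterpillar = ((((("a", "b"), "c"), "d"), "e"), "f")
symmetric = (("a", "b"), (("c", "d"), ("e", "f")))

# One 2-state character; the same data can score differently on the two shapes.
character = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 0, "f": 0}
caterpillar_score = fitch_score(caterpillar, character)
symmetric_score = fitch_score(symmetric, character)
```

Repeating this computation over random 2-state characters and all trees on six taxa would be one way to explore the shape bias empirically; that simulation is not part of the sketch.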
We present an efficient and flexible method for computing likelihoods of phenotypic traits on a phylogeny. The method does not resort to Monte Carlo computation but instead blends Felsenstein's discrete character pruning algorithm with methods for numerical quadrature. It is not limited to Gaussian models and adapts readily to model uncertainty in the observed trait values. We demonstrate the framework by developing efficient algorithms for likelihood calculation and ancestral state reconstruction under Wright's threshold model, applying our methods to a dataset of trait data for extrafloral nectaries (EFNs) across a phylogeny of 839 Fabales species.
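For readers unfamiliar with the discrete pruning algorithm that the abstract above builds on, here is a minimal sketch of it for a two-state symmetric model. It covers only the discrete building block, not the quadrature-based method of the paper. The tree encoding, the function names, the unit substitution rate in `cfn_transition`, and the uniform root distribution are all assumptions made for this example.

```python
import math


def cfn_transition(t):
    """2x2 transition matrix of a symmetric two-state model.

    Assumes the rate convention P(same state) = 1/2 + 1/2 * exp(-2t).
    """
    same = 0.5 + 0.5 * math.exp(-2.0 * t)
    return [[same, 1.0 - same], [1.0 - same, same]]


def partial_likelihoods(node, leaf_states):
    """Likelihood of the data below `node`, conditional on each state at `node`.

    node : a leaf name (str) or a list of (child, branch_length) pairs
    """
    if isinstance(node, str):
        # A leaf is compatible only with its observed state.
        return [1.0 if s == leaf_states[node] else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, t in node:
        p = cfn_transition(t)
        child_l = partial_likelihoods(child, leaf_states)
        for s in (0, 1):
            # Sum over the child's state, then multiply across children.
            result[s] *= sum(p[s][x] * child_l[x] for x in (0, 1))
    return result


def site_likelihood(tree, leaf_states):
    """Likelihood of one site, assuming a uniform distribution at the root."""
    root = partial_likelihoods(tree, leaf_states)
    return 0.5 * root[0] + 0.5 * root[1]


# Hypothetical three-leaf tree with illustrative branch lengths.
tree = [("a", 0.1), ([("b", 0.2), ("c", 0.3)], 0.05)]
likelihood = site_likelihood(tree, {"a": 0, "b": 0, "c": 1})
```

For a continuous trait, the sum over child states would become an integral; the abstract indicates that the paper handles this with numerical quadrature, which this sketch does not attempt.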