In phylogenetics it is of interest for rate matrix sets to satisfy closure under matrix multiplication as this makes finding the set of corresponding transition matrices possible without having to compute matrix exponentials. It is also advantageous to have a small number of free parameters as this, in applications, will result in a reduction of computation time. We explore a method of building a rate matrix set from a rooted tree structure by assigning rates to internal tree nodes and states to the leaves, then defining the rate of change between two states as the rate assigned to the most recent common ancestor of those two states. We investigate the properties of these matrix sets from both a linear algebra and a graph theory perspective and show that any rate matrix set generated this way is closed under matrix multiplication. The consequences of setting two rates assigned to internal tree nodes to be equal are then considered. This methodology could be used to develop parameterised models of amino acid substitution which have a small number of parameters but convey biological meaning.
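The construction described above can be illustrated with a minimal sketch. The tree encoding (an internal node as a `(label, children)` tuple, a leaf as an integer state), the function names, and the example rates are assumptions made here for illustration; they are not taken from the paper. The check at the end only verifies the structural pattern of the product (zero row sums, and off-diagonal entries that depend only on the most recent common ancestor); it makes no statement about the signs of the entries.

```python
def leaves(tree):
    """Return the leaf states below a node; a leaf is any non-tuple value."""
    if not isinstance(tree, tuple):
        return [tree]
    _, children = tree
    return [s for child in children for s in leaves(child)]


def mrca_labels(tree, out=None):
    """Map each ordered pair of distinct states to the label of their MRCA."""
    if out is None:
        out = {}
    if not isinstance(tree, tuple):
        return out
    label, children = tree
    groups = [leaves(c) for c in children]
    # Two states in different child subtrees have this node as their MRCA.
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            if i != j:
                for s in gi:
                    for t in gj:
                        out[(s, t)] = label
    for c in children:
        mrca_labels(c, out)
    return out


def rate_matrix(tree, rates):
    """Build Q with Q[i][j] = rate of MRCA(i, j) and zero row sums."""
    states = sorted(leaves(tree))
    mrca = mrca_labels(tree)
    n = len(states)
    Q = [[0] * n for _ in range(n)]
    for i, s in enumerate(states):
        for j, t in enumerate(states):
            if i != j:
                Q[i][j] = rates[mrca[(s, t)]]
        Q[i][i] = -sum(Q[i])  # diagonal is still 0 here, so this is minus the off-diagonal sum
    return Q


def matmul(A, B):
    """Plain square matrix product on nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def has_tree_pattern(M, tree):
    """Check zero row sums and that off-diagonal entries depend only on the MRCA."""
    states = sorted(leaves(tree))
    mrca = mrca_labels(tree)
    seen = {}
    for i, s in enumerate(states):
        if sum(M[i]) != 0:
            return False
        for j, t in enumerate(states):
            if i != j and seen.setdefault(mrca[(s, t)], M[i][j]) != M[i][j]:
                return False
    return True


# Hypothetical example: states 0 and 1 share internal node "u"; state 2 joins at the root.
tree = ("root", [("u", [0, 1]), 2])
Q1 = rate_matrix(tree, {"u": 2, "root": 1})
Q2 = rate_matrix(tree, {"u": 5, "root": 3})
P = matmul(Q1, Q2)
print(has_tree_pattern(P, tree))
```

Integer rates are used so that the equality checks are exact; with floating-point rates a tolerance would be needed.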
Many discrete mathematics problems in phylogenetics are defined in terms of the relative labeling of pairs of leaf-labeled trees. These relative labelings are naturally formalized as tanglegrams, which have previously been an object of study in coevolutionary analysis. Although there has been considerable work on planar drawings of tanglegrams, they have not been fully explored as combinatorial objects until recently. In this paper, we describe how many discrete mathematical questions on trees factor through a problem on tanglegrams, and how understanding that factoring can simplify analysis. Depending on the problem, it may be useful to consider an unordered version of tanglegrams, and/or their unrooted counterparts. For all of these definitions, we show how the isomorphism types of tanglegrams can be understood in terms of double cosets of the symmetric group, and we investigate their automorphisms. Understanding tanglegrams better will isolate the distinct problems on leaf-labeled pairs of trees and reveal natural symmetries of spaces associated with such problems.
This short note provides a simple formal proof of a folklore result in statistical phylogenetics concerning the convergence of bootstrap support for a tree and its edges.
Effects like selection in evolution as well as fertility inheritance in the development of populations can lead to a higher degree of asymmetry in evolutionary trees than expected under a null hypothesis. To identify and quantify such influences, various balance indices were proposed in the phylogenetic literature and have been in use for decades. However, so far no balance index has been based on the number of \emph{symmetry nodes}, even though symmetry nodes play an important role in other areas of mathematical phylogenetics and despite the fact that symmetry nodes are a quite natural way to measure balance or symmetry of a given tree. The aim of this manuscript is thus twofold: First, we will introduce the \emph{symmetry nodes index} as an index for measuring balance of phylogenetic trees and analyze its extremal properties. We also show that this index can be calculated in linear time. This new index turns out to be a generalization of a simple and well-known balance index, namely the \emph{cherry index}, as well as a specialization of another, less established, balance index, namely \emph{Rogers $J$ index}. Thus, it is the second objective of the present manuscript to compare the new symmetry nodes index to these two indices and to underline its advantages. In order to do so, we will derive some extremal properties of the cherry index and Rogers $J$ index along the way and thus complement existing studies on these indices. Moreover, we used the programming language \textsf{R} to implement all three indices in the software package \textsf{symmeTree}, which has been made publicly available.
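As a rough illustration of how symmetry nodes might be counted in a single pass, the following sketch assumes that a symmetry node is an internal node of a rooted binary tree whose two pending subtrees have the same shape, and that a cherry is an internal node with two leaf children. These definitions are not stated in the abstract and are assumptions here, as are the nested-tuple tree encoding and the function name. The sketch is not the \textsf{symmeTree} implementation, and the index itself may be defined as some function of these counts rather than the raw count.

```python
def count_symmetry_nodes(tree):
    """Count internal nodes, symmetry nodes and cherries of a rooted binary tree.

    An internal node is a 2-tuple (left, right); any non-tuple value is a leaf.
    """
    shape_ids = {}  # unordered pair of child shape ids -> id of the resulting shape
    counts = {"symmetry": 0, "cherries": 0, "internal": 0}

    def visit(node):
        if not isinstance(node, tuple):
            return 0  # all leaves share shape id 0
        left, right = node
        a, b = visit(left), visit(right)
        counts["internal"] += 1
        if a == b:
            counts["symmetry"] += 1  # both pending subtrees have the same shape
        if a == 0 and b == 0:
            counts["cherries"] += 1  # both children are leaves
        key = (min(a, b), max(a, b))  # ignore left/right order
        return shape_ids.setdefault(key, len(shape_ids) + 1)

    visit(tree)
    return counts


balanced = ((1, 2), (3, 4))
caterpillar = (((1, 2), 3), 4)
print(count_symmetry_nodes(balanced))
print(count_symmetry_nodes(caterpillar))
```

Each node is visited once and each shape lookup is a dictionary operation, so the running time is linear in the number of nodes in expectation. The recursion is kept for brevity; very deep trees would need an iterative traversal.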
Covarion models of character evolution describe inhomogeneities in substitution processes through time. In phylogenetics, such models are used to describe changing functional constraints or selection regimes during the evolution of biological sequences. In this work the identifiability of such models for generic parameters on a known phylogenetic tree is established, provided the number of covarion classes does not exceed the size of the observable state space. "Generic parameters" as used here means all parameters except possibly those in a set of measure zero within the parameter space. Combined with earlier results, this implies that both the tree and generic numerical parameters are identifiable if the number of classes is strictly smaller than the number of observable states.
The human microbiome is the ensemble of genes in the microbes that live inside and on the surface of humans. Because microbial sequencing information is now much easier to come by than phenotypic information, there has been an explosion of sequencing and genetic analysis of microbiome samples. Much of the analytical work for these sequences involves phylogenetics, at least indirectly, but methodology has developed in a somewhat different direction than for other applications of phylogenetics. In this paper I review the field and its methods from the perspective of a phylogeneticist, as well as describing current challenges for phylogenetics coming from this type of work.