No Arabic abstract
Tree-child networks, one of the prominent network classes in phylogenetics, have been introduced for the purpose of modeling reticulate evolution. Recently, the first author together with Gittenberger and Mansouri (2019) showed that the number ${rm TC}_{ell,k}$ of tree-child networks with $ell$ leaves and $k$ reticulation vertices has the first-order asymptotics [ {rm TC}_{ell,k}sim c_kleft(frac{2}{e}right)^{ell}ell^{ell+2k-1},qquad (ellrightarrowinfty). ] Moreover, they also computed $c_k$ for $k=1,2,$ and $3$. In this short note, we give a second approach to the above result which is based on a recent (algorithmic) approach for the counting of tree-child networks due to Cardona and Zhang (2020). This second approach is also capable of giving a simple, closed-form expression for $c_k$ for all $kgeq 0$.
In previous work, we gave asymptotic counting results for the number of tree-child and normal networks with $k$ reticulation vertices and explicit exponential generating functions of the counting sequences for $k=1,2,3$. The purpose of this note is two-fold. First, we make some corrections to our previous approach which overcounted the above numbers and thus gives erroneous exponential generating functions (however, the overcounting does not effect our asymptotic counting results). Secondly, we use our (corrected) exponential generating functions to derive explicit formulas for the number of tree-child and normal networks with $k=1,2,3$ reticulation vertices. This re-derives recent results of Carona and Zhang, answers their question for normal networks with $k=2$, and adds new formulas in the case $k=3$.
Reticulate evolutionary processes result in phylogenetic histories that cannot be modeled using a tree topology. Here, we apply methods from topological data analysis to molecular sequence data with reticulations. Using a simple example, we demonstrate the correspondence between nontrivial higher homology and reticulate evolution. We discuss the sensitivity of the standard filtration and show cases where reticulate evolution can fail to be detected. We introduce an extension of the standard framework and define the median complex as a construction to recover signal of the frequency and scale of reticulate evolution by inferring and imputing putative ancestral states. Finally, we apply our methods to two datasets from phylogenetics. Our work expands on earlier ideas of using topology to extract important evolutionary features from genomic data.
We provide a simple characterization of simplicial complexes on few vertices that embed into the $d$-sphere. Namely, a simplicial complex on $d+3$ vertices embeds into the $d$-sphere if and only if its non-faces do not form an intersecting family. As immediate consequences, we recover the classical van Kampen--Flores theorem and provide a topological extension of the ErdH os--Ko--Rado theorem. By analogy with Farys theorem for planar graphs, we show in addition that such complexes satisfy the rigidity property that continuous and linear embeddability are equivalent.
Rooted phylogenetic networks provide a more complete representation of the ancestral relationship between species than phylogenetic trees when reticulate evolutionary processes are at play. One way to reconstruct a phylogenetic network is to consider its `ancestral profile (the number of paths from each ancestral vertex to each leaf). In general, this information does not uniquely determine the underlying phylogenetic network. A recent paper considered a new class of phylogenetic networks called `orchard networks where this uniqueness was claimed to hold. Here we show that an additional restriction on the network, that of being `stack-free, is required in order for the original uniqueness claim to hold. On the other hand, if the additional stack-free restriction is lifted, we establish an alternative result; namely, there is uniqueness within the class of orchard networks up to the resolution of vertices of high in-degree.
In the study of rooted phylogenetic networks, analyzing the set of rooted phylogenetic trees that are embedded in such a network is a recurring task. From an algorithmic viewpoint, this analysis almost always requires an exhaustive search of a particular multiset $S$ of rooted phylogenetic trees that are embedded in a rooted phylogenetic network $mathcal{N}$. Since the size of $S$ is exponential in the number of reticulations of $mathcal{N}$, it is consequently of interest to keep this number as small as possible but without loosing any element of $S$. In this paper, we take a first step towards this goal by introducing the notion of a non-essential arc of $mathcal{N}$, which is an arc whose deletion from $mathcal{N}$ results in a rooted phylogenetic network $mathcal{N}$ such that the sets of rooted phylogenetic trees that are embedded in $mathcal{N}$ and $mathcal{N}$ are the same. We investigate the popular class of tree-child networks and characterize which arcs are non-essential. This characterization is based on a family of directed graphs. Using this novel characterization, we show that identifying and deleting all non-essential arcs in a tree-child network takes time that is cubic in the number of leaves of the network. Moreover, we show that deciding if a given arc of an arbitrary phylogenetic network is non-essential is $Pi_2^P$-complete.