No Arabic abstract
In the study of rooted phylogenetic networks, analyzing the set of rooted phylogenetic trees that are embedded in such a network is a recurring task. From an algorithmic viewpoint, this analysis almost always requires an exhaustive search of a particular multiset $S$ of rooted phylogenetic trees that are embedded in a rooted phylogenetic network $mathcal{N}$. Since the size of $S$ is exponential in the number of reticulations of $mathcal{N}$, it is consequently of interest to keep this number as small as possible but without loosing any element of $S$. In this paper, we take a first step towards this goal by introducing the notion of a non-essential arc of $mathcal{N}$, which is an arc whose deletion from $mathcal{N}$ results in a rooted phylogenetic network $mathcal{N}$ such that the sets of rooted phylogenetic trees that are embedded in $mathcal{N}$ and $mathcal{N}$ are the same. We investigate the popular class of tree-child networks and characterize which arcs are non-essential. This characterization is based on a family of directed graphs. Using this novel characterization, we show that identifying and deleting all non-essential arcs in a tree-child network takes time that is cubic in the number of leaves of the network. Moreover, we show that deciding if a given arc of an arbitrary phylogenetic network is non-essential is $Pi_2^P$-complete.
Rooted phylogenetic networks provide a more complete representation of the ancestral relationship between species than phylogenetic trees when reticulate evolutionary processes are at play. One way to reconstruct a phylogenetic network is to consider its `ancestral profile (the number of paths from each ancestral vertex to each leaf). In general, this information does not uniquely determine the underlying phylogenetic network. A recent paper considered a new class of phylogenetic networks called `orchard networks where this uniqueness was claimed to hold. Here we show that an additional restriction on the network, that of being `stack-free, is required in order for the original uniqueness claim to hold. On the other hand, if the additional stack-free restriction is lifted, we establish an alternative result; namely, there is uniqueness within the class of orchard networks up to the resolution of vertices of high in-degree.
Rooted phylogenetic networks provide an explicit representation of the evolutionary history of a set $X$ of sampled species. In contrast to phylogenetic trees which show only speciation events, networks can also accommodate reticulate processes (for example, hybrid evolution, endosymbiosis, and lateral gene transfer). A major goal in systematic biology is to infer evolutionary relationships, and while phylogenetic trees can be uniquely determined from various simple combinatorial data on $X$, for networks the reconstruction question is much more subtle. Here we ask when can a network be uniquely reconstructed from its `ancestral profile (the number of paths from each ancestral vertex to each element in $X$). We show that reconstruction holds (even within the class of all networks) for a class of networks we call `orchard networks, and we provide a polynomial-time algorithm for reconstructing any orchard network from its ancestral profile. Our approach relies on establishing a structural theorem for orchard networks, which also provides for a fast (polynomial-time) algorithm to test if any given network is of orchard type. Since the class of orchard networks includes tree-sibling tree-consistent networks and tree-child networks, our result generalise reconstruction results from 2008 and 2009. Orchard networks allow for an unbounded number $k$ of reticulation vertices, in contrast to tree-sibling tree-consistent networks and tree-child networks for which $k$ is at most $2|X|-4$ and $|X|-1$, respectively.
Because biological processes can make different loci have different evolutionary histories, species tree estimation requires multiple loci from across the genome. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called summary methods. Because summary methods are generally fast, they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate on biologically realistic conditions. Mirarab et al. (Science 2014) presented the statistical binning technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple statistical test for combinability and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomics pipeline does not have the desirable property of being statistically consistent. We show that weighting the recalculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, weighted statistical binning enables highly accurate genome-scale species tree estimation, and is also statistical consistent under the multi-species coalescent model.
Phylogenetic networks are a generalization of phylogenetic trees allowing for the representation of non-treelike evolutionary events such as hybridization. Typically, such networks have been analyzed based on their `level, i.e. based on the complexity of their 2-edge-connected components. However, recently the question of how `treelike a phylogenetic network is has become the center of attention in various studies. This led to the introduction of emph{tree-based networks}, i.e. networks that can be constructed from a phylogenetic tree, called the emph{base tree}, by adding additional edges. While the concept of tree-basedness was originally introduced for rooted phylogenetic networks, it has recently also been considered for unrooted networks. In the present study, we compare and contrast findings obtained for unrooted emph{binary} tree-based networks to unrooted emph{non-binary} networks. In particular, while it is known that up to level 4 all unrooted binary networks are tree-based, we show that in the case of non-binary networks, this result only holds up to level 3.
Tree-child networks, one of the prominent network classes in phylogenetics, have been introduced for the purpose of modeling reticulate evolution. Recently, the first author together with Gittenberger and Mansouri (2019) showed that the number ${rm TC}_{ell,k}$ of tree-child networks with $ell$ leaves and $k$ reticulation vertices has the first-order asymptotics [ {rm TC}_{ell,k}sim c_kleft(frac{2}{e}right)^{ell}ell^{ell+2k-1},qquad (ellrightarrowinfty). ] Moreover, they also computed $c_k$ for $k=1,2,$ and $3$. In this short note, we give a second approach to the above result which is based on a recent (algorithmic) approach for the counting of tree-child networks due to Cardona and Zhang (2020). This second approach is also capable of giving a simple, closed-form expression for $c_k$ for all $kgeq 0$.