A recent paper (Manceau and Lambert, 2016) developed a novel approach for describing two well-defined notions of species based on a phylogenetic tree and a phenotypic partition. In this paper, we explore some further combinatorial properties of this approach and describe an extension that allows an arbitrary number of phenotypic partitions to be combined with a phylogenetic tree for these two species notions.
Given a gene tree and a species tree, ancestral configurations represent the combinatorially distinct sets of gene lineages that can reach a given node of the species tree. They have been introduced as a data structure for use in the recursive computation of the conditional probability under the multispecies coalescent model of a gene tree topology given a species tree, the cost of this computation being affected by the number of ancestral configurations of the gene tree in the species tree. For matching gene trees and species trees, we obtain enumerative results on ancestral configurations. We study ancestral configurations in balanced and unbalanced families of trees determined by a given seed tree, showing that for seed trees with more than one taxon, the number of ancestral configurations increases for both families exponentially in the number of taxa $n$. For fixed $n$, the maximal number of ancestral configurations tabulated at the species tree root node and the largest number of labeled histories possible for a labeled topology occur for trees with precisely the same unlabeled shape. For ancestral configurations at the root, the maximum increases with $k_0^n$, where $k_0 approx 1.5028$ is a quadratic recurrence constant. Under a uniform distribution over the set of labeled trees of given size, the mean number of root ancestral configurations grows with $sqrt{3/2}(4/3)^n$ and the variance with approximately $1.4048(1.8215)^n$. The results provide a contribution to the combinatorial study of gene trees and species trees.
We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with a new algorithm for numerically computing likelihoods of quantitative traits. The diffusion approach allows for analysis of datasets containing hundreds or thousands of individuals. The method, which we call snapper, has been implemented as part of the Beast2 package. We introduce the models, the efficient algorithms, and report performance of snapper on simulated data sets and on SNP data from rattlesnakes and freshwater turtles.
Rooted phylogenetic networks provide a way to describe species relationships when evolution departs from the simple model of a tree. However, networks inferred from genomic data can be highly tangled, making it difficult to discern the main reticulation signals present. In this paper, we describe a natural way to transform any rooted phylogenetic network into a simpler canonical network, which has desirable mathematical and computational properties, and is based only on the visible nodes in the original network. The method has been implemented and we demonstrate its application to some examples.
Measures of tree balance play an important role in the analysis of phylogenetic trees. One of the oldest and most popular indices in this regard is the Colless index for rooted bifurcating trees, introduced by Colless (1982). While many of its statistical properties under different probabilistic models for phylogenetic trees have already been established, little is known about its minimum value and the trees that achieve it. In this manuscript, we fill this gap in the literature. To begin with, we derive both recursive and closed expressions for the minimum Colless index of a tree with $n$ leaves. Surprisingly, these expressions show a connection between the minimum Colless index and the so-called Blancmange curve, a fractal curve. We then fully characterize the tree shapes that achieve this minimum value and we introduce both an algorithm to generate them and a recurrence to count them. After focusing on two extremal classes of trees with minimum Colless index (the maximally balanced trees and the greedy from the bottom trees), we conclude by showing that all trees with minimum Colless index also have minimum Sackin index, another popular balance index.
Phylogenetic Diversity (PD) is a prominent quantitative measure of the biodiversity of a collection of present-day species (taxa). This measure is based on the evolutionary distance among the species in the collection. Loosely speaking, if $mathcal{T}$ is a rooted phylogenetic tree whose leaf set $X$ represents a set of species and whose edges have real-valued lengths (weights), then the PD score of a subset $S$ of $X$ is the sum of the weights of the edges of the minimal subtree of $mathcal{T}$ connecting the species in $S$. In this paper, we define several natural variants of the PD score for a subset of taxa which are related by a known rooted phylogenetic network. Under these variants, we explore, for a positive integer $k$, the computational complexity of determining the maximum PD score over all subsets of taxa of size $k$ when the input is restricted to different classes of rooted phylogenetic networks