No Arabic abstract
We derive an invertible transform linking two widely used measures of species diversity: phylogenetic diversity and the expected proportions of segregating (non-constant) sites. We assume a bi-allelic, symmetric, finite site model of substitution. Like the Hadamard transform of Hendy and Penny, the transform can be expressed completely independent of the underlying phylogeny. Our results bridge work on diversity from two quite distinct scientific communities.
Phylogenetic Diversity (PD) is a prominent quantitative measure of the biodiversity of a collection of present-day species (taxa). This measure is based on the evolutionary distance among the species in the collection. Loosely speaking, if $mathcal{T}$ is a rooted phylogenetic tree whose leaf set $X$ represents a set of species and whose edges have real-valued lengths (weights), then the PD score of a subset $S$ of $X$ is the sum of the weights of the edges of the minimal subtree of $mathcal{T}$ connecting the species in $S$. In this paper, we define several natural variants of the PD score for a subset of taxa which are related by a known rooted phylogenetic network. Under these variants, we explore, for a positive integer $k$, the computational complexity of determining the maximum PD score over all subsets of taxa of size $k$ when the input is restricted to different classes of rooted phylogenetic networks
Phylogenetic diversity indices provide a formal way to apportion evolutionary heritage across species. Two natural diversity indices are Fair Proportion (FP) and Equal Splits (ES). FP is also called evolutionary distinctiveness and, for rooted trees, is identical to the Shapley Value (SV), which arises from cooperative game theory. In this paper, we investigate the extent to which FP and ES can differ, characterise tree shapes on which the indices are identical, and study the equivalence of FP and SV and its implications in more detail. We also define and investigate analogues of these indices on unrooted trees (where SV was originally defined), including an index that is closely related to the Pauplin representation of phylogenetic diversity.
Planning for the protection of species often involves difficult choices about which species to prioritize, given constrained resources. One way of prioritizing species is to consider their evolutionary distinctiveness, i.e. their relative evolutionary isolation on a phylogenetic tree. Several evolutionary isolation metrics or phylogenetic diversity indices have been introduced in the literature, among them the so-called Fair Proportion index (also known as the evolutionary distinctiveness score). This index apportions the total diversity of a tree among all leaves, thereby providing a simple prioritization criterion for conservation. Here, we focus on the prioritization order obtained from the Fair Proportion index and analyze the effects of species extinction on this ranking. More precisely, we analyze the extent to which the ranking order may change when some species go extinct and the Fair Proportion index is re-computed for the remaining taxa. We show that for each phylogenetic tree, there are edge lengths such that the extinction of one leaf per cherry completely reverses the ranking. Moreover, we show that even if only the lowest ranked species goes extinct, the ranking order may drastically change. We end by analyzing the effects of these two extinction scenarios (extinction of the lowest ranked species and extinction of one leaf per cherry) for a collection of empirical and simulated trees. In both cases, we can observe significant changes in the prioritization orders, highlighting the empirical relevance of our theoretical findings.
In microbial ecology studies, the most commonly used ways of investigating alpha (within-sample) diversity are either to apply count-only measures such as Simpsons index to Operational Taxonomic Unit (OTU) groupings, or to use classical phylogenetic diversity (PD), which is not abundance-weighted. Although alpha diversity measures that use abundance information in a phylogenetic framework do exist, but are not widely used within the microbial ecology community. The performance of abundance-weighted phylogenetic diversity measures compared to classical discrete measures has not been explored, and the behavior of these measures under rarefaction (sub-sampling) is not yet clear. In this paper we compare the ability of various alpha diversity measures to distinguish between different community states in the human microbiome for three different data sets. We also present and compare a novel one-parameter family of alpha diversity measures, BWPD_theta, that interpolates between classical phylogenetic diversity (PD) and an abundance-weighted extension of PD. Additionally, we examine the sensitivity of these phylogenetic diversity measures to sampling, via computational experiments and by deriving a closed form solution for the expectation of phylogenetic quadratic entropy under re-sampling. In all three of the datasets considered, an abundance-weighted measure is the best differentiator between community states. OTU-based measures, on the other hand, are less effective in distinguishing community types. In addition, abundance-weighted phylogenetic diversity measures are less sensitive to differing sampling intensity than their unweighted counterparts. Based on these results we encourage the use of abundance-weighted phylogenetic diversity measures, especially for cases such as microbial ecology where species delimitation is difficult.
Rooted phylogenetic networks provide a way to describe species relationships when evolution departs from the simple model of a tree. However, networks inferred from genomic data can be highly tangled, making it difficult to discern the main reticulation signals present. In this paper, we describe a natural way to transform any rooted phylogenetic network into a simpler canonical network, which has desirable mathematical and computational properties, and is based only on the visible nodes in the original network. The method has been implemented and we demonstrate its application to some examples.