No Arabic abstract
One of the main aims in phylogenetics is the estimation of ancestral sequences based on present-day data like, for instance, DNA alignments. One way to estimate the data of the last common ancestor of a given set of species is to first reconstruct a phylogenetic tree with some tree inference method and then to use some method of ancestral state inference based on that tree. One of the best-known methods both for tree inference as well as for ancestral sequence inference is Maximum Parsimony (MP). In this manuscript, we focus on this method and on ancestral state inference for fully bifurcating trees. In particular, we investigate a conjecture published by Charleston and Steel in 1995 concerning the number of species which need to have a particular state, say $a$, at a particular site in order for MP to unambiguously return $a$ as an estimate for the state of the last common ancestor. We prove the conjecture for all even numbers of character states, which is the most relevant case in biology. We also show that the conjecture does not hold in general for odd numbers of character states, but also present some positive results for this case.
We examine a mathematical question concerning the reconstruction accuracy of the Fitch algorithm for reconstructing the ancestral sequence of the most recent common ancestor given a phylogenetic tree and sequence data for all taxa under consideration. In particular, for the symmetric 4-state substitution model which is also known as Jukes-Cantor model, we answer affirmatively a conjecture of Li, Steel and Zhang which states that for any ultrametric phylogenetic tree and a symmetric model, the Fitch parsimony method using all terminal taxa is more accurate, or at least as accurate, for ancestral state reconstruction than using any particular terminal taxon or any particular pair of taxa. This conjecture had so far only been answered for two-state data by Fischer and Thatte. Here, we focus on answering the biologically more relevant case with four states, which corresponds to ancestral sequence reconstruction from DNA or RNA data.
In phylogenetic studies, biologists often wish to estimate the ancestral discrete character state at an interior vertex $v$ of an evolutionary tree $T$ from the states that are observed at the leaves of the tree. A simple and fast estimation method --- maximum parsimony --- takes the ancestral state at $v$ to be any state that minimises the number of state changes in $T$ required to explain its evolution on $T$. In this paper, we investigate the reconstruction accuracy of this estimation method further, under a simple symmetric model of state change, and obtain a number of new results, both for 2-state characters, and $r$--state characters ($r>2$). Our results rely on establishing new identities and inequalities, based on a coupling argument that involves a simpler `coin toss approach to ancestral state reconstruction.
One of the main aims of phylogenetics is to reconstruct the enquote{Tree of Life}. In this respect, different methods and criteria are used to analyze DNA sequences of different species and to compare them in order to derive the evolutionary relationships of these species. Maximum Parsimony is one such criterion for tree reconstruction and, it is the one which we will use in this paper. However, it is well-known that tree reconstruction methods can lead to wrong relationship estimates. One typical problem of Maximum Parsimony is long branch attraction, which can lead to statistical inconsistency. In this work, we will consider a blockwise approach to alignment analysis, namely so-called $k$-tuple analyses. For four taxa it has already been shown that $k$-tuple-based analyses are statistically inconsistent if and only if the standard character-based (site-based) analyses are statistically inconsistent. So, in the four-taxon case, going from individual sites to $k$-tuples does not lead to any improvement. However, real biological analyses often consider more than only four taxa. Therefore, we analyze the case of five taxa for $2$- and $3$-tuple-site data and consider alphabets with two and four elements. We show that the equivalence of single-site data and $k$-tuple-site data then no longer holds. Even so, we can show that Maximum Parsimony is statistically inconsistent for $k$-tuple site data and five taxa.
In this paper we investigate mathematical questions concerning the reliability (reconstruction accuracy) of Fitchs maximum parsimony algorithm for reconstructing the ancestral state given a phylogenetic tree and a character. In particular, we consider the question whether the maximum parsimony method applied to a subset of taxa can reconstruct the ancestral state of the root more accurately than when applied to all taxa, and we give an example showing that this indeed is possible. A surprising feature of our example is that ignoring a taxon closer to the root improves the reliability of the method. On the other hand, in the case of the two-state symmetric substitution model, we answer affirmatively a conjecture of Li, Steel and Zhang which states that under a molecular clock the probability that the state at a single taxon is a correct guess of the ancestral state is a lower bound on the reconstruction accuracy of Fitchs method applied to all taxa.
Tuffley and Steel (1997) proved that Maximum Likelihood and Maximum Parsimony methods in phylogenetics are equivalent for sequences of characters under a simple symmetric model of substitution with no common mechanism. This result has been widely cited ever since. We show that small changes to the model assumptions suffice to make the two methods inequivalent. In particular, we analyze the case of bounded substitution probabilities as well as the molecular clock assumption. We show that in these cases, even under no common mechanism, Maximum Parsimony and Maximum Likelihood might make conflicting choices. We also show that if there is an upper bound on the substitution probabilities which is `sufficiently small, every Maximum Likelihood tree is also a Maximum Parsimony tree (but not vice versa).