The most parsimonious tree for random data

Added by Prof. Mike Steel

Publication date 2014

Field: Biology

Language: English

Authors: Mareike Fischer, Michelle Galla, Lina Herbst

Populations and Evolution

Applying a method to reconstruct a phylogenetic tree from random data provides a way to detect whether that method has an inherent bias towards certain tree shapes. For maximum parsimony, applied to a sequence of random 2-state data, each possible binary phylogenetic tree has exactly the same distribution for its parsimony score. Despite this pleasing and slightly surprising symmetry, some binary phylogenetic trees are more likely than others to be a most parsimonious (MP) tree for a sequence of $k$ such characters, as we show. For $k=2$, and unrooted binary trees on six taxa, any tree with a caterpillar shape has a higher chance of being an MP tree than any tree with a symmetric shape. On the other hand, if we take any two binary trees, on any number of taxa, we prove that this bias between the two trees vanishes as the number of characters grows. However, again there is a twist: MP trees on six taxa are more likely to have certain shapes than a uniform distribution on binary phylogenetic trees predicts, and this difference does not appear to dissipate as $k$ grows.
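
To make the setting concrete, here is a minimal Python sketch (not the authors' code) that enumerates every unrooted binary tree on n taxa by stepwise addition, scores each uniformly random 2-state character with Fitch's algorithm, and then counts exactly, over all sequences of $k$ characters, how often each tree is an MP tree (trees tied for the minimum all count). The helper names unrooted_binary_trees, fitch_score, score_histogram, cherry_count and mp_counts are hypothetical; shapes are told apart by counting cherries, since on six taxa a caterpillar has two cherries and the symmetric tree has three.

```python
from collections import Counter
from itertools import product


def unrooted_binary_trees(n):
    """Yield each unrooted binary tree on leaves 0..n-1 (n >= 3) as an edge list.

    Internal nodes are labelled n, n+1, ...  Stepwise addition of leaves
    3, 4, ..., n onto every edge produces each of the (2n-5)!! trees once.
    """
    def grow(edges, leaf, internal):
        if leaf == n:
            yield edges
            return
        for i, (u, v) in enumerate(edges):
            # Subdivide edge (u, v) with a new internal node and hang the new leaf on it.
            rest = edges[:i] + edges[i + 1:]
            yield from grow(rest + [(u, internal), (internal, v), (internal, leaf)],
                            leaf + 1, internal + 1)

    yield from grow([(0, n), (1, n), (2, n)], 3, n + 1)


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def fitch_score(adj, character):
    """Fitch parsimony score of a 2-state character (one 0/1 entry per leaf)."""
    n = len(character)

    def up(node, parent):
        if node < n:  # leaf: its state set as a bitmask (0 -> 0b01, 1 -> 0b10)
            return 1 << character[node], 0
        a, b = [x for x in adj[node] if x != parent]
        (sa, ca), (sb, cb) = up(a, node), up(b, node)
        if sa & sb:
            return sa & sb, ca + cb
        return sa | sb, ca + cb + 1  # empty intersection costs one state change

    # Root on the edge at leaf 0; the parsimony score does not depend on the root.
    states, cost = up(adj[0][0], 0)
    return cost + (0 if states & (1 << character[0]) else 1)


def score_histogram(adj, n):
    """How many of the 2**n characters have each parsimony score on this tree."""
    return Counter(fitch_score(adj, ch) for ch in product((0, 1), repeat=n))


def cherry_count(adj, n):
    """Number of internal nodes adjacent to at least two leaves."""
    return sum(1 for v in adj if v >= n and sum(u < n for u in adj[v]) >= 2)


def mp_counts(n, k):
    """For every tree, count the k-character sequences for which it is an MP tree."""
    trees = [adjacency(e) for e in unrooted_binary_trees(n)]
    chars = list(product((0, 1), repeat=n))
    scores = [[fitch_score(adj, ch) for ch in chars] for adj in trees]
    hits = [0] * len(trees)
    for seq in product(range(len(chars)), repeat=k):
        totals = [sum(row[j] for j in seq) for row in scores]
        best = min(totals)
        for t, total in enumerate(totals):
            if total == best:  # ties: every minimiser counts as an MP tree
                hits[t] += 1
    return trees, hits, len(chars) ** k


if __name__ == "__main__":
    n, k = 6, 2
    trees, hits, total = mp_counts(n, k)
    histograms = {tuple(sorted(score_histogram(a, n).items())) for a in trees}
    print("distinct score histograms:", len(histograms))
    for name, cherries in (("caterpillar", 2), ("symmetric", 3)):
        probs = [h / total for a, h in zip(trees, hits) if cherry_count(a, n) == cherries]
        print(name, len(probs), min(probs), max(probs))
```

For n = 6 and k = 2 this should report one shared score histogram across all 105 trees and a higher MP probability for every caterpillar than for any symmetric tree, in line with the abstract. Exhaustive enumeration grows as 64^k sequences on six taxa, so larger k would call for random sampling instead.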

Variance on the Leaves of a Tree Markov Random Field: Detecting Character Dependencies in Phylogenies

Deeparnab Chakrabarty, Sampath Kannan, Kevin Tian (2011)

Stochastic models of evolution (Markov random fields on trivalent trees) generally assume that different characters (different runs of the stochastic process) are independent and identically distributed. In this paper we take the first steps towards addressing dependent characters. Specifically, we show that, under certain technical assumptions regarding the evolution of individual characters, we can detect any significant, history-independent correlation between any pair of multistate characters. For the special case of the Cavender-Farris-Neyman (CFN) model on two states with symmetric transition matrices, our analysis needs milder assumptions. To perform the analysis, we prove a new concentration result for multistate random variables of a Markov random field on arbitrary trivalent trees: we show that the random variable counting the number of leaves in any particular subset of states has variance that is subquadratic in the number of leaves.

Populations and Evolution Discrete Mathematics
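
As an illustration of the model itself (not of the paper's detection method), the following Python sketch simulates the two-state CFN process on a caterpillar tree: the root state is uniform, every edge flips the state independently with probability p, and the statistic of interest is the number of leaves in state 1. The helpers caterpillar_edges, simulate_cfn and leaf_count_stats, and the single flip probability shared by all edges, are illustrative assumptions; the paper allows arbitrary trivalent trees.

```python
import random
from statistics import mean, pvariance


def caterpillar_edges(n):
    """Edges of a caterpillar (trivalent) tree: leaves 0..n-1, internal nodes n..2n-3, n >= 4."""
    spine = list(range(n, 2 * n - 2))
    edges = list(zip(spine, spine[1:]))
    edges += [(0, spine[0]), (1, spine[0]), (n - 2, spine[-1]), (n - 1, spine[-1])]
    edges += [(i, spine[i - 1]) for i in range(2, n - 2)]
    return edges


def simulate_cfn(edges, n, p, rng):
    """One CFN character: uniform state at an internal root, each edge flips it with probability p."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    states = {n: rng.randint(0, 1)}  # root at internal node n; the symmetric model is reversible
    stack = [n]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in states:
                states[v] = states[u] ^ int(rng.random() < p)
                stack.append(v)
    return [states[leaf] for leaf in range(n)]


def leaf_count_stats(n, p, samples, seed=0):
    """Monte Carlo mean and variance of the number of leaves in state 1."""
    rng = random.Random(seed)
    edges = caterpillar_edges(n)
    counts = [sum(simulate_cfn(edges, n, p, rng)) for _ in range(samples)]
    return mean(counts), pvariance(counts)


if __name__ == "__main__":
    for n in (8, 16, 32):
        m, var = leaf_count_stats(n, p=0.1, samples=2000)
        print(n, round(m, 2), round(var, 2), round(var / n**2, 3))
```

With p = 0.5 every edge fully randomises the state, so the leaves become independent fair coins and the count has variance n/4; with p = 0 all leaves copy the root. The ratio printed in the last column is only an empirical look at how the variance scales with n, not a check of the paper's bound.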

Measuring Impact of Climate Change on Tree Species: analysis of JSDM on FIA data

Hyun Choi, Ali Sadeghian, Sergio Marconi (2019)

Trees, one of our most vital resources, are among the first organisms affected by changes in the climate. In this study, tree species interactions and responses to climate in different ecological environments are examined by applying a joint species distribution model (JSDM) to different ecological domains in the United States. Joint species distribution models are useful for learning inter-species relationships and species' responses to the environment. The climate's impact on tree species is measured through species abundance in an area. We compare the model's performance across all ecological domains and study its sensitivity to the climate variables. With the predicted abundances, future tree species populations can be forecast and the impact of climate change on tree populations can be measured.

Populations and Evolution

Computing the Distribution of a Tree Metric

David Bryant, Mike Steel (2008)

The Robinson-Foulds (RF) distance is by far the most widely used measure of dissimilarity between trees. Although the distribution of these distances has been investigated for twenty years, an explicitly polynomial-time algorithm for computing this distribution (which is also the distribution of trees around a given tree under the popular Robinson-Foulds metric) has yet to be described. In this paper we derive a polynomial-time algorithm for this distribution. We show how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in cherries of the given tree. We also describe how our results can be used to derive normalization constants that are required in a recently proposed maximum likelihood approach to supertree construction.

Populations and Evolution Quantitative Methods
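
A small Python sketch of the Robinson-Foulds distance itself may help: each edge of an unrooted tree splits the leaf set in two, and the distance counts the non-trivial splits found in one tree but not the other. This is the textbook definition, not the authors' polynomial-time algorithm for the distribution; the edge-list representation and the helpers splits and robinson_foulds are hypothetical, and some authors divide this count by two.

```python
def splits(edges, n):
    """Non-trivial splits of an unrooted tree on leaves 0..n-1, each stored as the side without leaf 0."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def leaves_beyond(u, v):
        # Leaves reachable from v without crossing back over the edge (u, v).
        seen, stack, found = {u, v}, [v], set()
        while stack:
            x = stack.pop()
            if x < n:
                found.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return found

    result = set()
    for u, v in edges:
        side = leaves_beyond(u, v)
        if 0 in side:
            side = set(range(n)) - side
        if 2 <= len(side) <= n - 2:  # skip trivial splits (one leaf against the rest)
            result.add(frozenset(side))
    return result


def robinson_foulds(edges1, edges2, n):
    """Number of non-trivial splits present in exactly one of the two trees."""
    return len(splits(edges1, n) ^ splits(edges2, n))


# Three unrooted trees on leaves 0..4 with internal nodes 5, 6, 7.
A = [(0, 5), (1, 5), (5, 6), (2, 6), (6, 7), (3, 7), (4, 7)]  # ((0,1),2,(3,4))
B = [(0, 5), (2, 5), (5, 6), (1, 6), (6, 7), (3, 7), (4, 7)]  # ((0,2),1,(3,4))
C = [(0, 5), (3, 5), (5, 6), (2, 6), (6, 7), (1, 7), (4, 7)]  # ((0,3),2,(1,4))
print(robinson_foulds(A, B, 5), robinson_foulds(A, C, 5))  # 2 4
```

Trees A and B differ by a single nearest-neighbour interchange and share the split {3, 4}, so they sit at distance 2, while A and C share no non-trivial split.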

The space of tree-based phylogenetic networks

Mareike Fischer, Andrew Francis (2019)

Phylogenetic networks are generalizations of phylogenetic trees that allow the representation of reticulation events such as horizontal gene transfer or hybridization, and can also represent uncertainty in inference. A subclass of these, tree-based phylogenetic networks, has been introduced to capture the extent to which reticulate evolution nevertheless broadly follows tree-like patterns. Several operations that change a general phylogenetic network have been developed in recent years; they allow algorithms to move around spaces of networks, a vital ingredient in finding an optimal network given some biological data. A key operation of this kind is the Nearest Neighbor Interchange, or NNI. While it is already known that the space of unrooted phylogenetic networks is connected under NNI, it has been unclear whether this also holds for the subspace of tree-based networks. In this paper we show that the space of unrooted tree-based phylogenetic networks is indeed connected under the NNI operation. We do so by explicitly showing how to get from one such network to another without losing tree-basedness along the way. Moreover, we introduce some new concepts, for instance "shoat" networks, and derive some interesting properties concerning tree-basedness. Finally, we use our results to derive an upper bound on the size of the space of tree-based networks.

Populations and Evolution Combinatorics

Context tree selection for functional data

A. Duarte, R. Fraiman, A. Galves (2016)

It has been repeatedly conjectured that the brain retrieves statistical regularities from stimuli. Here we present a new statistical approach that allows us to address this conjecture. This approach is based on a new class of stochastic processes driven by chains with memory of variable length. It leads to a new experimental protocol in which sequences of auditory stimuli generated by a stochastic chain are presented to volunteers while electroencephalographic (EEG) data are recorded from their scalp. A new statistical model selection procedure for functional data is introduced and proved to be consistent. Applied to samples of EEG data collected using our experimental protocol, it produces results supporting the conjecture that the brain effectively identifies the structure of the chain generating the sequence of stimuli.

Neurons and Cognition