
Computing the Distribution of a Tree Metric

Published by Prof. Mike Steel
Publication date: 2008
Research field: Biology
Language: English





The Robinson-Foulds (RF) distance is by far the most widely used measure of dissimilarity between trees. Although the distribution of these distances has been investigated for twenty years, an algorithm that is explicitly polynomial time has yet to be described for computing this distribution (which is also the distribution of trees around a given tree under the popular Robinson-Foulds metric). In this paper we derive a polynomial-time algorithm for this distribution. We show how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in 'cherries' of the given tree. We also describe how our results can be used to derive normalization constants that are required in a recently proposed maximum likelihood approach to supertree construction.
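To make the two quantities named in the abstract concrete, here is a minimal Python sketch of the RF distance between two unrooted trees and of the proportion of leaves that lie in cherries. It is an illustration only, not the polynomial-time algorithm for the distribution that the paper derives. The nested-tuple tree encoding and the names `leaf_set`, `nontrivial_splits`, `rf_distance`, and `cherry_leaf_proportion` are assumptions made for this example.

```python
# Illustrative sketch only: trees are nested tuples whose leaves are strings.
# The encoding and all function names are assumptions, not taken from the paper.

def leaf_set(tree):
    """Return the set of leaf labels below (and including) this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def nontrivial_splits(tree):
    """Return the nontrivial splits of the tree, viewed as unrooted.

    Each split A|B is stored as the side that does not contain a fixed
    anchor leaf, so the same split always gets the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of nontrivial splits found in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaf set")
    return len(nontrivial_splits(tree_a) ^ nontrivial_splits(tree_b))


def cherry_leaf_proportion(tree):
    """Fraction of leaves in cherries (assumes at least four leaves).

    A cherry corresponds to a split with exactly two leaves on one side.
    """
    all_leaves = leaf_set(tree)
    in_cherry = set()
    for side in nontrivial_splits(tree):
        if len(side) == 2:
            in_cherry |= side
        if len(all_leaves) - len(side) == 2:
            in_cherry |= all_leaves - side
    return len(in_cherry) / len(all_leaves)


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
t3 = ((("a", "d"), "c"), ("b", "e"))

print(rf_distance(t1, t2))         # 2
print(rf_distance(t1, t3))         # 4
print(cherry_leaf_proportion(t1))  # 0.8
```

For binary trees on five leaves the largest possible RF distance is 4, which is what `t1` and `t3` reach. The sketch does not compute the Poisson parameter itself, because the abstract says only that it is determined by the cherry-leaf proportion.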


Read also

Understanding the patterns and processes of diversification of life on the planet is a key challenge of science. The Tree of Life represents such diversification processes through the evolutionary relationships among the different taxa, and can be extended down to intra-specific relationships. Here we examine the topological properties of a large set of interspecific and intraspecific phylogenies and show that the branching patterns follow allometric rules conserved across the different levels in the Tree of Life, all significantly departing from those expected from the standard null models. The finding of non-random universal patterns of phylogenetic differentiation suggests that similar evolutionary forces drive diversification across the broad range of scales, from macro-evolutionary to micro-evolutionary processes, shaping the diversity of life on the planet.
The metrization of the space of neural responses is an ongoing research program seeking to find natural ways to describe, in geometrical terms, the sets of possible activities in the brain. One component of this program is the spike metrics, notions of distance between two spike trains recorded from a neuron. Alignment spike metrics work by identifying equivalent spikes in one train and the other. We present an alignment spike metric having $\mathcal{L}_p$ underlying geometrical structure; the $\mathcal{L}_2$ version is Euclidean and is suitable for further embedding in Euclidean spaces by Multidimensional Scaling methods or related procedures. We show how to implement a fast algorithm for the computation of this metric based on bipartite graph matching theory.
One of the key indicators used in tracking the evolution of an infectious disease is the reproduction number. This quantity is usually computed using the reported number of cases, ignoring that many more individuals may be infected (e.g., asymptomatics). We propose a statistical procedure to quantify the impact of undetected infectious cases on the determination of the effective reproduction number. Our approach is stochastic and data-driven, and does not rely on any compartmental model. It is applied to the COVID-19 case in eight different countries and all Italian regions, showing that the effect of undetected cases leads to estimates of the effective reproduction numbers larger than those obtained only with the reported cases by factors ranging from two to ten. Our findings urge caution about deciding when and how to relax containment measures based on the value of the reproduction number.
Models of codon evolution are commonly used to identify positive selection. Positive selection is typically a heterogeneous process, i.e., it acts on some branches of the evolutionary tree and not others. Previous work on DNA models showed that when evolution occurs under a heterogeneous process it is important to consider the property of model closure, because non-closed models can give biased estimates of evolutionary processes. The existing codon models that account for the genetic code are not closed; to establish this it is enough to show that they are not linear (meaning that the sum of two codon rate matrices in the model is not a matrix in the model). This raises the concern that a single codon model fit to a heterogeneous process might mis-estimate both the effect of selection and branch lengths. Codon models are typically constructed by choosing an underlying DNA model (e.g., HKY) that acts identically and independently at each codon position, and then applying the genetic code via the parameter $\omega$ to modify the rate of transitions between codons that code for different amino acids. Here we use simulation to investigate the accuracy of estimation of both the selection parameter $\omega$ and branch lengths in cases where the underlying DNA process is heterogeneous but $\omega$ is constant. We find that both $\omega$ and branch lengths can be mis-estimated in these scenarios. Errors in $\omega$ were usually less than 2% but could be as high as 17%. We also assessed whether choosing different underlying DNA models had any effect on accuracy; in particular, we assessed whether using closed DNA models gave any advantage. However, a DNA model being closed does not imply that the codon model constructed from it is closed, and in general we found that using closed DNA models did not decrease errors in the estimation of $\omega$.
The appearance of a novel coronavirus named Middle East (ME) Respiratory Syndrome Coronavirus (MERS-CoV) has raised global public health concerns regarding the current situation and its future evolution. Here we propose an integrative maximum likelihood analysis of both cluster data in the ME region and importations in Europe to assess the transmission scenario and the incidence of sporadic infections. Our approach is based on a spatial-transmission model integrating mobility data worldwide and allows for variations in the zoonotic/environmental transmission and underascertainment. Maximum likelihood estimates for the ME region indicate the occurrence of a subcritical epidemic (R=0.50, 95% confidence interval (CI) 0.30-0.77) associated with a 0.28 (95% CI 0.12-0.85) daily rate of sporadic introductions. Infections in the region appear to be mainly dominated by zoonotic/environmental transmissions, with possible underascertainment (95% CI of estimated to observed sporadic cases in the range 1.03-7.32). No time evolution of the situation emerges. Analyses of flight passenger data from the region indicate areas at high risk of importation. While dismissing an immediate threat for global health security, this analysis provides a baseline scenario for future reference and updates, suggests reinforced surveillance to limit underascertainment, and calls for increased alertness in high-risk areas worldwide.