Understanding the patterns and processes of diversification of life on the planet is a key challenge of science. The Tree of Life represents such diversification processes through the evolutionary relationships among the different taxa, and can be extended down to intra-specific relationships. Here we examine the topological properties of a large set of interspecific and intraspecific phylogenies and show that the branching patterns follow allometric rules conserved across the different levels in the Tree of Life, all significantly departing from those expected from the standard null models. The finding of non-random universal patterns of phylogenetic differentiation suggests that similar evolutionary forces drive diversification across the broad range of scales, from macro-evolutionary to micro-evolutionary processes, shaping the diversity of life on the planet.
The Robinson-Foulds (RF) distance is by far the most widely used measure of dissimilarity between trees. Although the distribution of these distances has been investigated for twenty years, an algorithm that is explicitly polynomial time has yet to be described for computing this distribution (which is also the distribution of trees around a given tree under the popular Robinson-Foulds metric). In this paper we derive a polynomial-time algorithm for this distribution. We show how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in 'cherries' of the given tree. We also describe how our results can be used to derive normalization constants that are required in a recently proposed maximum likelihood approach to supertree construction.
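As a small illustration of the quantities named above, the following sketch computes the RF distance between two trees and the proportion of leaves that lie in cherries. It is not the polynomial-time algorithm for the distribution and does not evaluate the Poisson approximation. The nested-tuple tree encoding and all function names are assumptions made for this example. The RF distance is taken here as the number of non-trivial splits found in exactly one of the two trees.

```python
def leaves(tree):
    """Return the leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits of the tree, viewed as unrooted.

    Each split is stored as the side that does NOT contain a fixed anchor
    leaf, so the same split always gets the same representation.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of non-trivial splits present in exactly one of the trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))


def cherry_leaf_proportion(tree):
    """Fraction of leaves that belong to a cherry (two leaves on one vertex)."""
    n = len(leaves(tree))
    cherries = 0
    for side in splits(tree):
        cherries += (len(side) == 2) + (n - len(side) == 2)
    return 2 * cherries / n


# Hypothetical five-leaf trees that differ by one rearrangement.
t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(rf_distance(t1, t2), cherry_leaf_proportion(t1))
```

Storing each split as the side away from a fixed anchor leaf makes the comparison independent of where the nested-tuple encoding happens to place the root.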
Models of codon evolution are commonly used to identify positive selection. Positive selection is typically a heterogeneous process, i.e., it acts on some branches of the evolutionary tree and not others. Previous work on DNA models showed that when evolution occurs under a heterogeneous process it is important to consider the property of model closure, because non-closed models can give biased estimates of evolutionary processes. The existing codon models that account for the genetic code are not closed; to establish this it is enough to show that they are not linear (meaning that the sum of two codon rate matrices in the model is not a matrix in the model). This raises the concern that a single codon model fit to a heterogeneous process might mis-estimate both the effect of selection and branch lengths. Codon models are typically constructed by choosing an underlying DNA model (e.g., HKY) that acts identically and independently at each codon position, and then applying the genetic code via the parameter $\omega$ to modify the rate of transitions between codons that code for different amino acids. Here we use simulation to investigate the accuracy of estimation of both the selection parameter $\omega$ and branch lengths in cases where the underlying DNA process is heterogeneous but $\omega$ is constant. We find that both $\omega$ and branch lengths can be mis-estimated in these scenarios. Errors in $\omega$ were usually less than 2% but could be as high as 17%. We also assessed whether choosing different underlying DNA models had any effect on accuracy; in particular, we assessed whether using closed DNA models gave any advantage. However, a DNA model being closed does not imply that the codon model constructed from it is closed, and in general we found that using closed DNA models did not decrease errors in the estimation of $\omega$.
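The linearity criterion described above can be illustrated at the DNA level. The sketch below builds two HKY rate matrices, adds them, and checks whether the sum still has HKY form. It does not reproduce the codon models or the simulations of the study. The parameter values, the unnormalised parameterisation (rate into nucleotide j equal to its frequency, multiplied by kappa for transitions), and the function names are assumptions chosen for this example.

```python
NUCLEOTIDES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky(kappa, freqs):
    """Unnormalised HKY rate matrix; rows sum to zero."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(NUCLEOTIDES):
        for j, b in enumerate(NUCLEOTIDES):
            if i != j:
                q[i][j] = freqs[j] * (kappa if (a, b) in TRANSITIONS else 1.0)
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 when the sum is taken
    return q


def add(q1, q2):
    """Entry-wise sum of two rate matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(q1, q2)]


def looks_like_hky(q, tol=1e-9):
    """True if q matches the HKY pattern up to an overall scale.

    HKY requires, for every target nucleotide j, that both transversion
    rates into j are equal and that the transition/transversion ratio
    (kappa) is the same for all j.
    """
    kappas = []
    for j, b in enumerate(NUCLEOTIDES):
        ts = [q[i][j] for i, a in enumerate(NUCLEOTIDES) if (a, b) in TRANSITIONS]
        tv = [q[i][j] for i, a in enumerate(NUCLEOTIDES)
              if i != j and (a, b) not in TRANSITIONS]
        if abs(tv[0] - tv[1]) > tol:
            return False
        kappas.append(ts[0] / tv[0])
    return max(kappas) - min(kappas) <= tol


# Two hypothetical HKY processes, e.g. on different branches of a tree.
q1 = hky(2.0, [0.1, 0.2, 0.3, 0.4])
q2 = hky(5.0, [0.4, 0.3, 0.2, 0.1])
print(looks_like_hky(q1), looks_like_hky(q2), looks_like_hky(add(q1, q2)))
```

With these values each matrix passes the check on its own, while the sum implies a different kappa for each target nucleotide, so no single HKY matrix reproduces it.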
The appearance of a novel coronavirus named Middle East (ME) Respiratory Syndrome Coronavirus (MERS-CoV) has raised global public health concerns regarding the current situation and its future evolution. Here we propose an integrative maximum likelihood analysis of both cluster data in the ME region and importations in Europe to assess the transmission scenario and the incidence of sporadic infections. Our approach is based on a spatial-transmission model integrating mobility data worldwide and allows for variations in the zoonotic/environmental transmission and underascertainment. Maximum likelihood estimates for the ME region indicate the occurrence of a subcritical epidemic (R=0.50, 95% confidence interval (CI) 0.30-0.77) associated with a 0.28 (95% CI 0.12-0.85) daily rate of sporadic introductions. Infections in the region appear to be mainly dominated by zoonotic/environmental transmissions, with possible underascertainment (95% CI of the ratio of estimated to observed sporadic cases in the range 1.03-7.32). No time evolution of the situation emerges. Analyses of flight passenger data from the region indicate areas at high risk of importation. While dismissing an immediate threat to global health security, this analysis provides a baseline scenario for future reference and updates, suggests reinforced surveillance to limit underascertainment, and calls for increased alertness in high-risk areas worldwide.
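To make "subcritical" concrete, the sketch below treats each sporadic introduction as the start of a simple branching process. This is an assumption made for illustration, not the spatial-transmission model of the study. In such a process with R < 1, the expected total cluster size, index case included, is 1/(1 - R). The Poisson offspring distribution and the function names are likewise assumptions.

```python
import math
import random


def expected_cluster_size(r):
    """Mean total cluster size (index case included) for 0 <= r < 1."""
    if not 0.0 <= r < 1.0:
        raise ValueError("the formula only holds for a subcritical process")
    return 1.0 / (1.0 - r)


def poisson_sample(rng, mean):
    """Draw one Poisson variate by multiplying uniforms (Knuth's method)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_cluster_size(rng, r):
    """Simulate one cluster, generation by generation, until it dies out."""
    active = 1
    total = 1
    while active:
        new_cases = sum(poisson_sample(rng, r) for _ in range(active))
        total += new_cases
        active = new_cases
    return total


rng = random.Random(0)
runs = 20000
mean_size = sum(simulate_cluster_size(rng, 0.50) for _ in range(runs)) / runs

# Point estimate and confidence bounds of R quoted in the abstract.
for r in (0.30, 0.50, 0.77):
    print(r, expected_cluster_size(r))
print("simulated mean at R=0.50:", mean_size)
```

The simulation is included only as a cross-check of the closed-form mean. A different offspring distribution with the same R would leave the mean unchanged but alter the spread of cluster sizes.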
One of the key indicators used in tracking the evolution of an infectious disease is the reproduction number. This quantity is usually computed using the reported number of cases, ignoring the fact that many more individuals may be infected (e.g. asymptomatics). We propose a statistical procedure to quantify the impact of undetected infectious cases on the determination of the effective reproduction number. Our approach is stochastic, data-driven, and does not rely on any compartmental model. It is applied to the COVID-19 case in eight different countries and all Italian regions, showing that the effect of undetected cases leads to estimates of the effective reproduction numbers larger than those obtained only with the reported cases by factors ranging from two to ten. Our findings urge caution about deciding when and how to relax containment measures based on the value of the reproduction number.
The mechanical properties of DNA play a critical role in many biological functions. For example, DNA packing in viruses involves confining the viral genome in a volume (the viral capsid) with dimensions that are comparable to the DNA persistence length. Similarly, eukaryotic DNA is packed in DNA-protein complexes (nucleosomes) in which DNA is tightly bent around protein spools. DNA is also tightly bent by many proteins that regulate transcription, resulting in a variation in gene expression that is amenable to quantitative analysis. In these cases, DNA loops are formed with lengths that are comparable to or smaller than the DNA persistence length. The aim of this review is to describe the physical forces associated with tightly bent DNA in all of these settings and to explore the biological consequences of such bending, which are increasingly accessible to single-molecule techniques.
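A rough sense of the energy scale follows from the standard worm-like chain estimate, which is used here as an assumption and is not quoted from the review. Bending a segment of length L into an arc of radius R costs (L_p L) / (2 R^2) in units of k_B T, where L_p is the persistence length. For a closed circle this gives 2 pi^2 L_p / L. The default values below (L_p of 50 nm, 0.34 nm per base pair) are commonly cited figures adopted for illustration, and the function names are hypothetical.

```python
import math

PERSISTENCE_LENGTH_NM = 50.0  # assumed typical value for double-stranded DNA
RISE_PER_BP_NM = 0.34         # assumed B-DNA rise per base pair


def arc_bending_energy_kt(arc_length_nm, radius_nm,
                          persistence_length_nm=PERSISTENCE_LENGTH_NM):
    """Elastic energy (in k_B T) of a uniformly bent arc, worm-like chain."""
    return 0.5 * persistence_length_nm * arc_length_nm / radius_nm ** 2


def circular_loop_energy_kt(contour_length_nm,
                            persistence_length_nm=PERSISTENCE_LENGTH_NM):
    """Elastic energy (in k_B T) of a closed circular loop of given length."""
    return 2.0 * math.pi ** 2 * persistence_length_nm / contour_length_nm


def loop_length_nm(base_pairs, rise_nm=RISE_PER_BP_NM):
    """Contour length of a DNA segment with the given number of base pairs."""
    return base_pairs * rise_nm


# Hypothetical loop sizes, from shorter than to longer than L_p.
for bp in (50, 100, 150, 300):
    length = loop_length_nm(bp)
    print(bp, "bp:", round(circular_loop_energy_kt(length), 1), "k_B T")
```

Because the energy scales as L_p / L, loops shorter than the persistence length cost many times k_B T under this estimate, which is why the settings listed above count as tightly bent.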