Under the multispecies coalescent model of molecular evolution, gene trees have independent evolutionary histories within a shared species tree. In comparison, supermatrix concatenation methods assume that gene trees share a single common genealogical history, thereby equating gene coalescence with species divergence. The multispecies coalescent is supported by previous studies that found that its predicted distributions fit empirical data, and that concatenation is not a consistent estimator of the species tree. *BEAST, a fully Bayesian implementation of the multispecies coalescent, is popular but computationally intensive, so the increasing size of phylogenetic data sets is both a computational challenge and an opportunity for better systematics. Using simulation studies, we characterize the scaling behaviour of *BEAST, and enable quantitative prediction of how increasing the number of loci affects both computational performance and statistical accuracy. Follow-up simulations over a wide range of parameters show that the statistical performance of *BEAST relative to concatenation improves both as branch length is reduced and as the number of loci is increased. Finally, using simulations based on estimated parameters from two phylogenomic data sets, we compare the performance of a range of species tree and concatenation methods to show that using *BEAST with tens of loci can be preferable to using concatenation with thousands of loci. Our results provide insight into the practicalities of Bayesian species tree estimation, the number of loci required to obtain a given level of accuracy, and the situations in which supermatrix or summary methods will be outperformed by the fully Bayesian multispecies coalescent.
Researchers at the Ames Laboratory-USDOE and the Federal Bureau of Investigation (FBI) conducted a study to assess the performance of forensic examiners in firearm investigations. The study involved three different types of firearms and 173 volunteers who compared both bullets and cartridge cases. The total number of comparisons reported is 20,130, allocated to assess accuracy (8,640), repeatability (5,700), and reproducibility (5,790) of the evaluations made by participating examiners. The overall false positive error rate was estimated as 0.656% and 0.933% for bullets and cartridge cases, respectively, while the rate of false negatives was estimated as 2.87% and 1.87% for bullets and cartridge cases, respectively. Because chi-square tests of independence strongly suggest that error probabilities are not the same for each examiner, the reported rates are maximum likelihood estimates based on the beta-binomial probability model and do not depend on an assumption of equal examiner-specific error rates. Corresponding 95% confidence intervals are (0.305%, 1.42%) and (0.548%, 1.57%) for false positives for bullets and cartridge cases, respectively, and (1.89%, 4.26%) and (1.16%, 2.99%) for false negatives for bullets and cartridge cases, respectively. These results are based on data representing all controlled conditions considered, including different firearm manufacturers, sequence of manufacture, and firing separation between unknown and known comparison specimens. The results are consistent with those of prior studies, despite this study's more robust design and challenging specimens.
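As a rough illustration of the beta-binomial idea mentioned above, the following Python sketch evaluates a beta-binomial log-likelihood for per-examiner error counts and picks the best point on a coarse grid. It is not the study's estimation procedure: the function names, the (mean, size) parametrization, the grid search, and the toy counts are all assumptions made for this example, and the toy counts are not data from the study.

```python
import math

def log_beta(x, y):
    # Logarithm of the Beta function B(x, y).
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def betabinom_loglik(a, b, errors, trials):
    # Log-likelihood of per-examiner error counts when each examiner's
    # error probability is drawn from a Beta(a, b) distribution.
    total = 0.0
    for k, n in zip(errors, trials):
        log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                      - math.lgamma(n - k + 1))
        total += log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)
    return total

def fit_betabinom_grid(errors, trials, means, sizes):
    # Coarse grid search (an assumption of this sketch, not a real optimizer).
    # mean = a / (a + b) is the overall error rate; size = a + b controls
    # how much examiners differ (small size means large differences).
    best = None
    for mean in means:
        for size in sizes:
            a, b = mean * size, (1.0 - mean) * size
            ll = betabinom_loglik(a, b, errors, trials)
            if best is None or ll > best[0]:
                best = (ll, mean, size)
    return best

# Hypothetical toy counts: six examiners, 50 comparisons each.
toy_errors = [0, 0, 1, 0, 5, 0]
toy_trials = [50] * 6
means = [i / 1000 for i in range(1, 101)]
sizes = [1, 2, 5, 10, 20, 50, 100, 200, 500]
best_ll, best_mean, best_size = fit_betabinom_grid(
    toy_errors, toy_trials, means, sizes)
print(f"estimated overall error rate: {best_mean:.3f} (size {best_size})")
```

Parametrizing by the mean keeps the quantity of interest, the overall error rate, directly readable from the fit, while the size parameter absorbs examiner-to-examiner variation.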
The ongoing COVID-19 pandemic highlights the essential role of mathematical models in understanding the spread of the virus and in providing quantifiable, science-based predictions of the impact of various mitigation measures. Numerous types of models have been employed with various levels of success. This leads to the question of what kind of mathematical model is most appropriate for a given situation. We consider two widely used types of models: equation-based models (such as standard compartmental epidemiological models) and agent-based models. We assess their performance by modeling the spread of COVID-19 on the Hawaiian island of Oahu under different scenarios. We show that when it comes to information crucial to decision making, both models produce very similar results. At the same time, the two types of models exhibit very different characteristics when considering their computational and conceptual complexity. Consequently, we conclude that the choice of model should be guided mainly by available computational and human resources.
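To make the notion of an equation-based compartmental model concrete, here is a minimal Python sketch of a textbook SIR model integrated with forward Euler steps. It is a generic illustration, not the model used in the study: the function name, the parameter values, the population size, and the step size are all assumptions chosen for this example.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    # Textbook SIR equations:
    #   dS/dt = -beta * S * I / N
    #   dI/dt =  beta * S * I / N - gamma * I
    #   dR/dt =  gamma * I
    # integrated with forward Euler; one (S, I, R) tuple is recorded per day.
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory

# Hypothetical parameters: transmission rate 0.3/day, recovery rate 0.1/day,
# one initial infection in a population of 1,000.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=999, i0=1, r0=0, days=100)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"infections peak on day {peak_day}")
```

An agent-based counterpart would instead track each individual and their contacts explicitly, which is where the difference in computational and conceptual complexity arises.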
Background: Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked matching machinery for simulation of character evolution along phylogenies. Results: We present a flexible Monte Carlo simulation tool, called piBUSS, that employs the BEAGLE high performance library for phylogenetic computations within BEAST to rapidly generate large sequence alignments under complex evolutionary models. piBUSS sports a user-friendly graphical user interface (GUI) that allows combining a rich array of models across an arbitrary number of partitions. A command-line interface mirrors the options available through the GUI and facilitates scripting in large-scale simulation studies. Analogous to BEAST model and analysis setup, more advanced simulation options are supported through an extensible markup language (XML) specification, which in addition to generating sequence output, also allows users to combine simulation and analysis in a single BEAST run. Conclusions: piBUSS offers a unique combination of flexibility and ease-of-use for sequence simulation under realistic evolutionary scenarios. Through different interfaces, piBUSS supports simulation studies ranging from modest endeavors for illustrative purposes to complex and large-scale assessments of evolutionary inference procedures. The software aims at implementing new models and data types that are continuously being developed as part of BEAST/BEAGLE.
In this paper, we carry out a computational study using the spectral decomposition of the fluctuations of a two-pathogen epidemic model around its deterministic attractor, i.e., steady state or limit cycle, to examine the role of partial vaccination and between-host pathogen interaction on early pathogen replacement during seasonal epidemics of influenza and respiratory syncytial virus.
We examine a mathematical question concerning the accuracy of the Fitch algorithm for reconstructing the ancestral sequence of the most recent common ancestor, given a phylogenetic tree and sequence data for all taxa under consideration. In particular, for the symmetric 4-state substitution model, also known as the Jukes-Cantor model, we answer affirmatively a conjecture of Li, Steel and Zhang which states that for any ultrametric phylogenetic tree and a symmetric model, the Fitch parsimony method using all terminal taxa is more accurate, or at least as accurate, for ancestral state reconstruction than using any particular terminal taxon or any particular pair of taxa. This conjecture had so far only been answered for two-state data by Fischer and Thatte. Here, we focus on answering the biologically more relevant case with four states, which corresponds to ancestral sequence reconstruction from DNA or RNA data.
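For readers unfamiliar with the Fitch algorithm, the following Python sketch shows its bottom-up pass for a single site on a rooted binary tree. The nested-tuple tree encoding, the function name, and the example taxa are assumptions made for this illustration; the sketch only computes the root state set and does not reproduce the accuracy analysis described above.

```python
def fitch_root_set(tree, leaf_states):
    # tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    # Returns the set of most parsimonious states for the root of `tree`.
    if isinstance(tree, str):
        return frozenset([leaf_states[tree]])
    left, right = tree
    left_set = fitch_root_set(left, leaf_states)
    right_set = fitch_root_set(right, leaf_states)
    common = left_set & right_set
    # Fitch rule: take the intersection if it is non-empty, else the union.
    return common if common else left_set | right_set

# Hypothetical four-taxon tree ((t1,t2),(t3,t4)) with one DNA site per taxon.
example_tree = (("t1", "t2"), ("t3", "t4"))
example_states = {"t1": "A", "t2": "A", "t3": "C", "t4": "A"}
print(sorted(fitch_root_set(example_tree, example_states)))  # ['A']
```

When the root set contains more than one state, the reconstruction is ambiguous; a common convention is to pick one of the states in the set uniformly at random.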