
piBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios

Published by Filip Bielejec
Publication date: 2013
Research field: Biology
Language: English





Background: Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked matching machinery for simulation of character evolution along phylogenies.

Results: We present a flexible Monte Carlo simulation tool, called piBUSS, that employs the BEAGLE high performance library for phylogenetic computations within BEAST to rapidly generate large sequence alignments under complex evolutionary models. piBUSS sports a user-friendly graphical user interface (GUI) that allows combining a rich array of models across an arbitrary number of partitions. A command-line interface mirrors the options available through the GUI and facilitates scripting in large-scale simulation studies. Analogous to BEAST model and analysis setup, more advanced simulation options are supported through an extensible markup language (XML) specification, which in addition to generating sequence output, also allows users to combine simulation and analysis in a single BEAST run.

Conclusions: piBUSS offers a unique combination of flexibility and ease-of-use for sequence simulation under realistic evolutionary scenarios. Through different interfaces, piBUSS supports simulation studies ranging from modest endeavors for illustrative purposes to complex and large-scale assessments of evolutionary inference procedures. The software aims at implementing new models and data types that are continuously being developed as part of BEAST/BEAGLE.
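To make the idea of Monte Carlo simulation of character evolution along a phylogeny concrete, the following is a minimal Python sketch. It is not piBUSS code and does not use the BEAST or BEAGLE APIs. It assumes the simple JC69 nucleotide substitution model, a single partition, and a hypothetical toy tree encoded as nested tuples; the names `jc69_transition_probs`, `evolve`, `simulate`, and `TOY_TREE` are illustrative only.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def jc69_transition_probs(branch_length):
    """JC69 probabilities that a site keeps its state (p_same) or changes
    to one specific other state (p_diff) along a branch whose length is
    measured in expected substitutions per site."""
    decay = math.exp(-4.0 * branch_length / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


def evolve(sequence, branch_length, rng):
    """Draw a child sequence given its parent sequence and a branch length."""
    p_same, _ = jc69_transition_probs(branch_length)
    child = []
    for base in sequence:
        if rng.random() < p_same:
            child.append(base)
        else:
            # Under JC69 the three alternative states are equally likely.
            child.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
    return "".join(child)


def simulate(tree, n_sites, seed=0):
    """Simulate an alignment down a tree.

    Assumed (hypothetical) tree encoding: (name, branch_length, children),
    where children is a list of nodes in the same format.
    Returns a dict mapping tip names to simulated sequences.
    """
    rng = random.Random(seed)
    # Root states are drawn from the uniform JC69 stationary distribution.
    root_sequence = "".join(rng.choice(NUCLEOTIDES) for _ in range(n_sites))
    alignment = {}

    def recurse(node, parent_sequence):
        name, branch_length, children = node
        sequence = evolve(parent_sequence, branch_length, rng)
        if not children:
            alignment[name] = sequence
        for child in children:
            recurse(child, sequence)

    recurse(tree, root_sequence)
    return alignment


# Toy three-taxon tree ((A, B), C); branch lengths are arbitrary.
TOY_TREE = ("root", 0.0, [
    ("AB", 0.05, [("A", 0.10, []), ("B", 0.10, [])]),
    ("C", 0.15, []),
])

if __name__ == "__main__":
    for taxon, sequence in simulate(TOY_TREE, n_sites=40, seed=1).items():
        print(taxon, sequence)
```

The sketch draws each site independently, so richer settings such as multiple partitions, codon models, or rate heterogeneity would need additional machinery that is not shown here.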




Read also

We propose to model active and cumulative case data from COVID-19 with a continuous effective model based on a modified diffusion equation under Lifshitz scaling with a dynamic diffusion coefficient. The proposed model is rich enough to capture different aspects of complex virus diffusion of the kind humanity has recently been facing. Because the model is continuous, it can be solved analytically and/or numerically. We investigate two possible models in which the diffusion coefficient associated with possible types of contamination is captured by specific profiles. The active case curves derived here successfully describe the pandemic behavior of Germany and Spain. Moreover, we also predict some scenarios for the evolution of COVID-19 in Brazil. Furthermore, we depict the cumulative case curves of COVID-19, reproducing the spread of the pandemic between the cities of São Paulo and São José dos Campos, Brazil. The scenarios also unveil how lockdown measures can flatten the contamination curves. We can find the profile of the diffusion coefficient that best fits the real data of the pandemic.
107 - Sanzo Miyazawa 2019
The inverse Potts problem, which infers a Boltzmann distribution for homologous protein sequences from their single-site and pairwise amino acid frequencies, has recently attracted a great deal of attention in studies of protein structure and evolution. We study regularization and learning methods and how to tune regularization parameters to correctly infer interactions in Boltzmann machine learning. Using $L_2$ regularization for fields, group $L_1$ for couplings is shown to be very effective for sparse couplings in comparison with $L_2$ and $L_1$. Two regularization parameters are tuned to yield equal values for both the sample and ensemble averages of evolutionary energy. Both averages change smoothly and converge, but their learning profiles differ considerably between learning methods. The Adam method is modified to make the step size proportional to the gradient for sparse couplings and to use a soft-thresholding function for group $L_1$. It is shown, by first inferring interactions from protein sequences and then from Monte Carlo samples, that the fields and couplings can be well recovered, but that recovering the pairwise correlations in the resolution of a total energy is harder for natural proteins than for protein-like sequences. The selective temperature for folding/structural constraints in protein evolution is also estimated.
108 - Joseph Heled 2011
We show how to analytically derive the average sequence dissimilarity (ASD) within and between species under a simplified multi-species coalescent setup.
We examine Kreps' (2019) conjecture that optimal expected utility in the classic Black--Scholes--Merton (BSM) economy is the limit of optimal expected utility for a sequence of discrete-time economies that approach the BSM economy in a natural sense: The $n$th discrete-time economy is generated by a scaled $n$-step random walk, based on an unscaled random variable $\zeta$ with mean zero, variance one, and bounded support. We confirm Kreps' conjecture if the consumer's utility function $U$ has asymptotic elasticity strictly less than one, and we provide a counterexample to the conjecture for a utility function $U$ with asymptotic elasticity equal to 1, for $\zeta$ such that $E[\zeta^3] > 0.$
Under the multispecies coalescent model of molecular evolution, gene trees have independent evolutionary histories within a shared species tree. In comparison, supermatrix concatenation methods assume that gene trees share a single common genealogical history, thereby equating gene coalescence with species divergence. The multispecies coalescent is supported by previous studies which found that its predicted distributions fit empirical data, and that concatenation is not a consistent estimator of the species tree. *BEAST, a fully Bayesian implementation of the multispecies coalescent, is popular but computationally intensive, so the increasing size of phylogenetic data sets is both a computational challenge and an opportunity for better systematics. Using simulation studies, we characterize the scaling behaviour of *BEAST, and enable quantitative prediction of the impact increasing the number of loci has on both computational performance and statistical accuracy. Follow-up simulations over a wide range of parameters show that the statistical performance of *BEAST relative to concatenation improves both as branch length is reduced and as the number of loci is increased. Finally, using simulations based on estimated parameters from two phylogenomic data sets, we compare the performance of a range of species tree and concatenation methods to show that using *BEAST with tens of loci can be preferable to using concatenation with thousands of loci. Our results provide insight into the practicalities of Bayesian species tree estimation, the number of loci required to obtain a given level of accuracy and the situations in which supermatrix or summary methods will be outperformed by the fully Bayesian multispecies coalescent.