
Reconciling Multiple Genes Trees via Segmental Duplications and Losses

Published by Celine Scornavacca
Publication date: 2018
Language: English





Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events, where segmental duplication events and losses are associated with cost $\delta$ and $\lambda$, respectively. We show that the problem is polynomial-time solvable when $\delta \leq \lambda$ (via LCA-mapping), while if $\delta > \lambda$ the problem is NP-hard, even when $\lambda = 0$ and a single gene tree is given, solving a long-standing open problem on the complexity of the reconciliation problem. On the positive side, we give a fixed-parameter algorithm for the problem, where the parameters are $\delta/\lambda$ and the number $d$ of segmental duplications, of time complexity $O(\lceil \frac{\delta}{\lambda} \rceil^{d} \cdot n \cdot \frac{\delta}{\lambda})$. Finally, we demonstrate the usefulness of this algorithm on two previously studied real datasets: we first show that our method can be used to confirm or refute hypothetical segmental duplications on a set of 16 eukaryotes, then show how we can detect whole genome duplications in yeast genomes.
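The abstract refers to LCA-mapping without spelling it out. As a rough illustration only, the sketch below shows the classic LCA mapping for a single gene tree: each internal gene node is mapped to the lowest species-tree node covering the species below it, and a node is flagged as a duplication when it maps to the same species node as one of its children. This is not the paper's segmental algorithm and ignores the costs $\delta$ and $\lambda$. The nested-tuple tree encoding and the names `clusters`, `lca_map`, and `species_of` are assumptions made for this example.

```python
def clusters(tree):
    """Return the leaf set of every node of a nested-tuple tree, in post-order.

    Leaves are strings; internal nodes are tuples of subtrees.
    The last entry is always the leaf set of `tree` itself.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    out = []
    leaves = frozenset()
    for child in tree:
        sub = clusters(child)
        out.extend(sub)
        leaves |= sub[-1]  # the child's own leaf set is last in its list
    out.append(leaves)
    return out


def lca_map(gene_tree, species_of, species_tree):
    """Map each internal gene node to a species-tree node (given as a leaf set).

    Returns a post-order list of (gene_node, species_cluster, is_duplication).
    `species_of` is a hypothetical dict from gene leaf name to species name.
    """
    species_clusters = clusters(species_tree)
    results = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([species_of[node]])
        child_maps = [visit(child) for child in node]
        covered = frozenset().union(*child_maps)
        # The species clusters containing `covered` are nested, so the
        # smallest one is the lowest common ancestor.
        mapped = min((c for c in species_clusters if covered <= c), key=len)
        # Duplication: the node maps to the same species node as a child.
        results.append((node, mapped, mapped in child_maps))
        return mapped

    visit(gene_tree)
    return results


if __name__ == "__main__":
    species_tree = (("A", "B"), "C")
    gene_tree = (("a1", "b1"), ("a2", "c1"))
    species_of = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
    for node, mapped, is_dup in lca_map(gene_tree, species_of, species_tree):
        kind = "duplication" if is_dup else "speciation"
        print(node, "->", sorted(mapped), kind)
```

Representing species nodes by their leaf sets keeps the sketch short at the cost of efficiency; a practical implementation would use a dedicated LCA data structure on the species tree.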




Read also

Segmental duplications (SDs), or low-copy repeats (LCR), are segments of DNA greater than 1 Kbp with high sequence identity that are copied to other regions of the genome. SDs are among the most important sources of evolution, a common cause of genomic structural variation, and several are associated with diseases of genomic origin. Despite their functional importance, SDs present one of the major hurdles for de novo genome assembly due to the ambiguity they cause in building and traversing both state-of-the-art overlap-layout-consensus and de Bruijn graphs. This causes SD regions to be misassembled, collapsed into a unique representation, or completely missing from assembled reference genomes for various organisms. In turn, this missing or incorrect information limits our ability to fully understand the evolution and the architecture of the genomes. Despite the essential need to accurately characterize SDs in assemblies, there is only one tool that has been developed for this purpose, called Whole Genome Assembly Comparison (WGAC). WGAC is comprised of several steps that employ different tools and custom scripts, which makes it difficult and time-consuming to use. Thus there is still a need for algorithms to characterize within-assembly SDs quickly, accurately, and in a user-friendly manner. Here we introduce a SEgmental Duplication Evaluation Framework (SEDEF) to rapidly detect SDs through sophisticated filtering strategies based on Jaccard similarity and local chaining. We show that SEDEF accurately detects SDs while maintaining substantial speedup over WGAC that translates into practical run times of minutes instead of weeks. Notably, our algorithm captures up to 25% pairwise error between segments, where previous studies focused on only 10%, allowing us to more deeply track the evolutionary history of the genome. SEDEF is available at https://github.com/vpc-ccg/sedef
Empirical observations show that ecological communities can have a huge number of coexisting species, even with few or limited resources. These ecosystems are characterized by multiple types of interactions, in particular displaying cooperative behaviors. However, standard modeling of population dynamics based on Lotka-Volterra type equations predicts that ecosystem stability should decrease as the number of species in the community increases and that cooperative systems are less stable than communities with only competitive and/or exploitative interactions. Here we propose a stochastic model of population dynamics, which includes exploitative interactions as well as cooperative interactions induced by cross-feeding. The model is exactly solved and we obtain results for relevant macro-ecological patterns, such as species abundance distributions and correlation functions. In the large system size limit, any number of species can coexist for a very general class of interaction networks and stability increases as the number of species grows. For pure mutualistic/commensalistic interactions we determine the topological properties of the network that guarantee species coexistence. We also show that the stationary state is globally stable and that inferring species interactions through species abundance correlation analysis may be misleading. Our theoretical approach thus shows that appropriate models of cooperation naturally lead to a solution of the long-standing question about the complexity-stability paradox and of how highly biodiverse communities can coexist.
Given a gene tree and a species tree, ancestral configurations represent the combinatorially distinct sets of gene lineages that can reach a given node of the species tree. They have been introduced as a data structure for use in the recursive computation of the conditional probability under the multispecies coalescent model of a gene tree topology given a species tree, the cost of this computation being affected by the number of ancestral configurations of the gene tree in the species tree. For matching gene trees and species trees, we obtain enumerative results on ancestral configurations. We study ancestral configurations in balanced and unbalanced families of trees determined by a given seed tree, showing that for seed trees with more than one taxon, the number of ancestral configurations increases for both families exponentially in the number of taxa $n$. For fixed $n$, the maximal number of ancestral configurations tabulated at the species tree root node and the largest number of labeled histories possible for a labeled topology occur for trees with precisely the same unlabeled shape. For ancestral configurations at the root, the maximum increases with $k_0^n$, where $k_0 \approx 1.5028$ is a quadratic recurrence constant. Under a uniform distribution over the set of labeled trees of given size, the mean number of root ancestral configurations grows with $\sqrt{3/2}(4/3)^n$ and the variance with approximately $1.4048(1.8215)^n$. The results provide a contribution to the combinatorial study of gene trees and species trees.
More than 300,000 new cases of oral cancer are diagnosed worldwide annually. The complexity of oral cancer makes designing drug targets very difficult. We analyse protein-protein interaction networks for normal and oral cancer tissue and detect crucial changes in the structural properties of the networks in terms of the interactions of the hub proteins and the degree-degree correlations. Further analysis shows that the spectra of both networks, while exhibiting universal statistical behavior, differ in their zero degeneracy, providing insight into the complexity of the underlying system.