
Phylogenetic Profiles as a Unified Framework for Measuring Protein Structure, Function and Evolution

Published by: Randen Patterson
Publication date: 2008
Research field: Biology
Research language: English





The sequence of amino acids in a protein is believed to determine its native-state structure, which in turn is related to the protein's function. In addition, homologous sequences contain information about evolutionary relationships. One powerful method for inferring these sequence attributes is to compare a query sequence with reference sequences that share significant homology and whose structure, function, and/or evolutionary relationships are already known. Despite decades of concerted work, there is no simple framework for deducing structure, function, and evolutionary (SF&E) relationships directly from sequence information alone, especially when the pairwise identity falls below a threshold of roughly 25% [1,2]. However, recent research has shown that sequence identity as low as 8% is sufficient to yield common structure/function relationships, and sequence identities as high as 88% may yet result in distinct structure and function [3,4]. Starting from the basic premise that protein sequence encodes information about SF&E, one might ask how to tease out these measures in an unbiased manner. Here we present a unified framework for inferring SF&E from sequence information using a knowledge-based approach that generates phylogenetic profiles in an unbiased manner. We illustrate the power of phylogenetic profiles generated using the Gestalt Domain Detection Algorithm Basic Local Alignment Tool (GDDA-BLAST) to derive structural domains, functional annotation, and evolutionary relationships for a host of ion channels and human proteins of unknown function. These data are in excellent accord with published data and new experiments. Our results suggest that there is a wealth of previously unexplored information in protein sequence.
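The abstract refers to a pairwise identity threshold of roughly 25%. As a minimal sketch of what that quantity measures, the Python below computes percent identity for two sequences that are assumed to be already aligned (equal length, `-` for gaps). The function names, the choice to skip gapped columns in the denominator, and the default threshold argument are illustrative assumptions; they are not taken from the paper or from GDDA-BLAST, and other tools define the denominator differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: the inputs are rows of a pairwise alignment of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where both sequences have a residue
    matches = 0   # columns where those residues are identical
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0  # nothing to compare
    return 100.0 * matches / compared


def below_threshold(aligned_a: str, aligned_b: str, threshold: float = 25.0) -> bool:
    """True when identity falls strictly below the given threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) < threshold


if __name__ == "__main__":
    # Toy sequences for illustration only, not real proteins.
    print(percent_identity("ACDE", "ACDF"))
    print(below_threshold("AAAAA", "CCCCA"))
```

A pair that scores below the threshold here is the case the abstract describes as hard for direct comparison, which is the motivation for profile-based approaches.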




Read also

Daniel L. Rabosky, 2014
A number of methods have been developed to infer differential rates of species diversification through time and among clades using time-calibrated phylogenetic trees. However, we lack a general framework that can delineate and quantify heterogeneous mixtures of dynamic processes within single phylogenies. I developed a method that can identify arbitrary numbers of time-varying diversification processes on phylogenies without specifying their locations in advance. The method uses reversible-jump Markov Chain Monte Carlo to move between model subspaces that vary in the number of distinct diversification regimes. The model assumes that changes in evolutionary regimes occur across the branches of phylogenetic trees under a compound Poisson process and explicitly accounts for rate variation through time and among lineages. Using simulated datasets, I demonstrate that the method can be used to quantify complex mixtures of time-dependent, diversity-dependent, and constant-rate diversification processes. I compared the performance of the method to the MEDUSA model of rate variation among lineages. As an empirical example, I analyzed the history of speciation and extinction during the radiation of modern whales. The method described here will greatly facilitate the exploration of macroevolutionary dynamics across large phylogenetic trees, which may have been shaped by heterogeneous mixtures of distinct evolutionary processes.
Because biological processes can make different loci have different evolutionary histories, species tree estimation requires multiple loci from across the genome. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called summary methods. Because summary methods are generally fast, they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate under biologically realistic conditions. Mirarab et al. (Science 2014) presented the statistical binning technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple statistical test for combinability and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomics pipeline does not have the desirable property of being statistically consistent. We show that weighting the recalculated gene trees by the bin sizes makes statistical binning statistically consistent under the multi-species coalescent, and maintains the good empirical performance. Thus, weighted statistical binning enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model.
Fish farms represent a growing source of disturbance to shallow benthic ecosystems like seagrass meadows. Despite some existing insights on the mechanisms underlying decline, efficient tools to quantitatively predict the response of benthic communities to fish farm effluents have not yet been developed. We explored relationships of fish farm organic and nutrient input rates to the sediments with population dynamics of the key seagrass species (Posidonia oceanica) in deep meadows growing around four Mediterranean Sea bream and Sea bass fish farms. We performed 2 annual shoot censuses on permanent plots at increasing distance from cages. Before each census we measured sedimentation rates adjacent to the plots using benthic sediment traps. High shoot mortality rates were recorded near the cages, up to 20 times greater than at control sites. Recruitment rates remained similar to undisturbed meadows and could not compensate for mortality, leading to rapid seagrass decline within the first 100 meters from cages. Seagrass mortality increased with total (R² = 0.47, p < 0.0002), organic matter (R² = 0.36, p = 0.001), nitrogen (R² = 0.34, p = 0.002) and phosphorus (R² = 0.58, p < 3 × 10⁻⁵) sedimentation rates. P. oceanica decline accelerated above a phosphorus loading threshold of 50 mg m⁻² day⁻¹. Benthic sedimentation rates seem a powerful predictor of seagrass mortality from fish farming, integrating local hydrodynamics, waste effluent variability and several environmental mechanisms, fuelled by organic inputs and leading to seagrass loss. Coupling direct measurements of benthic sedimentation rates with dynamics of key species is proposed as an efficient way to predict and minimize fish farm impacts on benthic communities.
Fluorescence Lifetime Imaging Microscopy (FLIM) using multiphoton excitation techniques is now finding an important place in quantitative imaging of protein-protein interactions and intracellular physiology. We review here the recent developments in multiphoton FLIM methods and also present a description of a novel multiphoton FLIM system using a streak camera that was developed in our laboratory. We provide an example of a typical application of the system in which we measure the fluorescence resonance energy transfer between a donor/acceptor pair of fluorescent proteins within a cellular specimen.
1. Joint Species Distribution models (JSDMs) explain spatial variation in community composition by contributions of the environment, biotic associations, and possibly spatially structured residual covariance. They show great promise as a general analytical framework for community ecology and macroecology, but current JSDMs, even when approximated by latent variables, scale poorly on large datasets, limiting their usefulness for currently emerging big (e.g., metabarcoding and metagenomics) community datasets. 2. Here, we present a novel, more scalable JSDM (sjSDM) that circumvents the need to use latent variables by using a Monte-Carlo integration of the joint JSDM likelihood and allows flexible elastic net regularization on all model components. We implemented sjSDM in PyTorch, a modern machine learning framework that can make use of CPU and GPU calculations. Using simulated communities with known species-species associations and different numbers of species and sites, we compare sjSDM with state-of-the-art JSDM implementations to determine computational runtimes and accuracy of the inferred species-species and species-environmental associations. 3. We find that sjSDM is orders of magnitude faster than existing JSDM algorithms (even when run on the CPU) and can be scaled to very large datasets. Despite the dramatically improved speed, sjSDM produces more accurate estimates of species association structures than alternative JSDM implementations. We demonstrate the applicability of sjSDM to big community data using an eDNA case study with thousands of fungal operational taxonomic units (OTUs). 4. Our sjSDM approach makes the analysis of large community datasets with hundreds or thousands of species possible with JSDMs, substantially extending the applicability of JSDMs in ecology. We provide our method in an R package to facilitate its use in practical data analysis.