Do you want to publish a course? Click here

A Unified Framework for Trees, Multi-Dimensional Scaling and Planar Graphs

135   0   0.0 ( 0 )
 Added by Peter Waddell
 Publication date 2010
  fields Biology
and research's language is English




Ask ChatGPT about the research

Least squares trees, multi-dimensional scaling and Neighbor Nets are all different and popular ways of visualizing multi-dimensional data. The method of flexi-Weighted Least Squares (fWLS) is a powerful method of fitting phylogenetic trees, when the exact form of errors is unknown. Here, both polynomial and exponential weights are used to model errors. The exact same models are implemented for multi-dimensional scaling to yield flexi-Weighted MDS, including as special cases methods such as the Sammon Stress function. Here we apply all these methods to population genetic data looking at the relationships of Abrahams Children encompassing Arabs and now widely dispersed populations of Jews, in relation to an African outgroup and a variety of European populations. Trees, MDS and Neighbor Nets of this data are compared within a common likelihood framework and the strengths and weaknesses of each method are explored. Because the errors in this type of data can be complex, for example, due to unexpected genetic transfer, we use a residual resampling method to assess the robustness of trees and the Neighbor Net. Despite the Neighbor Net fitting best by all criteria except BIC, its structure is ill defined following residual resampling. In contrast, fWLS trees are favored by BIC and retain considerable strong internal structure following residual resampling. This structure clearly separates various European and Middle Eastern populations, yet it is clear all of the models have errors much larger than expected by sampling variance alone.



rate research

Read More

Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In simple cases, single polymorphic loci explain a significant fraction of the phenotype variability. However, many traits of interest appear to be subject to multifactorial control by groups of genetic loci instead. Accurate detection of such multivariate associations is nontrivial and often hindered by limited power. At the same time, confounding influences such as population structure cause spurious association signals that result in false positive findings if they are not accounted for in the model. Here, we propose LMM-Lasso, a mixed model that allows for both, multi-locus mapping and correction for confounding effects. Our approach is simple and free of tuning parameters, effectively controls for population structure and scales to genome-wide datasets. We show practical use in genome-wide association studies and linkage mapping through retrospective analyses. In data from Arabidopsis thaliana and mouse, our method is able to find a genetic cause for significantly greater fractions of phenotype variation in 91% of the phenotypes considered. At the same time, our model dissects this variability into components that result from individual SNP effects and population structure. In addition to this increase of genetic heritability, enrichment of known candidate genes suggests that the associations retrieved by LMM-Lasso are more likely to be genuine.
2-colored best match graphs (2-BMGs) form a subclass of sink-free bi-transitive graphs that appears in phylogenetic combinatorics. There, 2-BMGs describe evolutionarily most closely related genes between a pair of species. They are explained by a unique least resolved tree (LRT). Introducing the concept of support vertices we derive an $O(|V|+|E|log^2|V|)$-time algorithm to recognize 2-BMGs and to construct its LRT. The approach can be extended to also recognize binary-explainable 2-BMGs with the same complexity. An empirical comparison emphasizes the efficiency of the new algorithm.
322 - Emmanuel Tannenbaum 2007
This paper develops simplified mathematical models describing the mutation-selection balance for the asexual and sexual replication pathways in {it Saccharomyces cerevisiae}. We assume diploid genomes consisting of two chromosomes, and we assume that each chromosome is functional if and only if its base sequence is identical to some master sequence. The growth and replication of the yeast cells is modeled as a first-order process, with first-order growth rate constants that are determined by whether a given genome consists of zero, one, or two functional chromosomes. In the asexual pathway, we assume that a given diploid cell divides into two diploids. In the sexual pathway, we assume that a given diploid cell divides into two diploids, each of which then divide into two haploids. The resulting four haploids enter a haploid pool, where they grow and replicate until they meet another haploid with which to fuse. When the cost for sex is low, we find that the selective mating strategy leads to the highest mean fitness of the population, when compared to all of the other strategies. We also show that, at low to intermediate replication fidelities, sexual replication with random mating has a higher mean fitness than asexual replication, as long as the cost for sex is low. This is consistent with previous work suggesting that sexual replication is advantageous at high population densities, low replication rates, and intermediate replication fidelities. The results of this paper also suggest that {it S. cerevisiae} switches from asexual to sexual replication when stressed, because stressful growth conditions provide an opportunity for the yeast to clear out deleterious mutations from their genomes.
The sequence of amino acids in a protein is believed to determine its native state structure, which in turn is related to the functionality of the protein. In addition, information pertaining to evolutionary relationships is contained in homologous sequences. One powerful method for inferring these sequence attributes is through comparison of a query sequence with reference sequences that contain significant homology and whose structure, function, and/or evolutionary relationships are already known. In spite of decades of concerted work, there is no simple framework for deducing structure, function, and evolutionary (SF&E) relationships directly from sequence information alone, especially when the pair-wise identity is less than a threshold figure ~25% [1,2]. However, recent research has shown that sequence identity as low as 8% is sufficient to yield common structure/function relationships and sequence identities as large as 88% may yet result in distinct structure and function [3,4]. Starting with a basic premise that protein sequence encodes information about SF&E, one might ask how one could tease out these measures in an unbiased manner. Here we present a unified framework for inferring SF&E from sequence information using a knowledge-based approach which generates phylogenetic profiles in an unbiased manner. We illustrate the power of phylogenetic profiles generated using the Gestalt Domain Detection Algorithm Basic Local Alignment Tool (GDDA-BLAST) to derive structural domains, functional annotation, and evolutionary relationships for a host of ion-channels and human proteins of unknown function. These data are in excellent accord with published data and new experiments. Our results suggest that there is a wealth of previously unexplored information in protein sequence.
This paper develops a formulation of the quasispecies equations appropriate for polysomic, semiconservatively replicating genomes. This paper is an extension of previous work on the subject, which considered the case of haploid genomes. Here, we develop a more general formulation of the quasispecies equations that is applicable to diploid and even polyploid genomes. Interestingly, with an appropriate classification of population fractions, we obtain a system of equations that is formally identical to the haploid case. As with the work for haploid genomes, we consider both random and immortal DNA strand chromosome segregation mechanisms. However, in contrast to the haploid case, we have found that an analytical solution for the mean fitness is considerably more difficult to obtain for the polyploid case. Accordingly, whereas for the haploid case we obtained expressions for the mean fitness for the case of an analogue of the single-fitness-peak landscape for arbitrary lesion repair probabilities (thereby allowing for non-complementary genomes), here we solve for the mean fitness for the restricted case of perfect lesion repair.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا