Modeling genes length distribution in genomes


Added by Miroslaw Dudek

Publication date 2006

fields Biology

Research language: English

Authors S. Cebrat - M.R. Dudek - P. Mackiewicz

Genomics


We show that the specific distribution of gene lengths observed in natural genomes might be the result of a growth process in which a single length scale $L(t)$ develops and grows with time as $t^{1/3}$. This length scale could be associated with the length of the longest gene in an evolving genome. The growth kinetics of the genes resembles that observed in physical systems with a conserved order parameter. We show that in a genome this conservation is guaranteed by compositional compensation, along the DNA strands, of the purine-like trends introduced by genes. The mathematical model presented is a modified Bak-Sneppen model of critical self-organization applied to a one-dimensional system of $N$ spins. The spins take discrete values, which represent gene lengths.
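For orientation, here is a minimal sketch of the textbook Bak-Sneppen update rule that the abstract names as its starting point. It is not the authors' modified model: the abstract does not specify the modification, so this sketch uses continuous random values on a ring rather than discrete gene lengths, and the names `bak_sneppen`, `n_sites`, and `steps` are illustrative assumptions.

```python
import random


def bak_sneppen(n_sites=64, steps=10_000, seed=0):
    """Run the standard (unmodified) Bak-Sneppen dynamics on a ring.

    Assumption: each site holds a continuous value in [0, 1); the paper's
    variant instead uses discrete spin values representing gene lengths.
    """
    rng = random.Random(seed)
    values = [rng.random() for _ in range(n_sites)]
    for _ in range(steps):
        # Locate the site with the smallest value.
        i = min(range(n_sites), key=values.__getitem__)
        # Redraw that site and its two nearest neighbours (periodic boundary).
        for j in (i - 1, i, i + 1):
            values[j % n_sites] = rng.random()
    return values


if __name__ == "__main__":
    final = bak_sneppen()
    print(min(final), sum(final) / len(final))
```

Replacing the uniform redraw with a rule over discrete lengths would be the natural place to encode a modification such as the one the abstract describes, but the details would have to come from the paper itself.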


On the ability to reconstruct ancestral genomes from Mycobacterium genus

109 - Christophe Guyeux , Bashar Al-Nuaimi , Bassam AlKindy and Jean-François Couchot 2017

Technical progress during the last decades has led to a situation in which the accumulation of genome sequence data is increasingly fast and cheap. The huge amount of molecular data available nowadays can help address new and essential questions in evolution. However, reconstructing the evolution of DNA sequences requires models, algorithms, and statistical and computational methods of ever-increasing complexity. Since the most dramatic genomic changes are caused by genome rearrangements (gene duplications, gain/loss events), it becomes crucial to understand their mechanisms and to reconstruct the ancestors of the given genomes. This problem is known to be NP-complete even in the simplest case of three genomes. Heuristic algorithms are usually executed to provide approximations of the exact solution. We state that, even if the ancestral reconstruction problem is NP-hard in theory, its exact resolution is feasible in various situations, encompassing organelles and some bacteria. Such accurate reconstruction, which also identifies some highly homoplasic mutations whose ancestral status is undecidable, will be initiated in this work-in-progress, to reconstruct the ancestral genomes of two pathogenic Mycobacterium bacteria. By mixing automatic reconstruction of obvious situations with human intervention on flagged problematic cases, we will indicate that it should be possible to achieve a concrete, complete, and really accurate reconstruction of lineages of the Mycobacterium tuberculosis complex. Thus, it is possible to investigate how these genomes have evolved from their last common ancestors.

Genomics

Modelling survival and allele complementation in the evolution of genomes with polymorphic loci

446 - S. Cebrat , D. Stauffer , J.S. Sa Martins 2009

We have simulated the evolution of sexually reproducing populations composed of individuals represented by diploid genomes. A series of eight bits formed an allele occupying one of 128 loci of one haploid genome (chromosome). The environment required a specific activity of each locus, this being the sum of the activities of both alleles located at the corresponding loci on two chromosomes. This activity is represented by the number of bits set to zero. In a constant environment the fittest individuals were homozygous, with allele activities corresponding to half of the environment requirement for a locus (in a diploid genome, two alleles at corresponding loci produced the proper activity). Changing the environment under a relatively low recombination rate promotes the generation of more polymorphic alleles. In the heterozygous loci, alleles of different activities complement each other, fulfilling the environment requirements. Nevertheless, the genetic pool of populations evolves in the direction of a very restricted number of complementing haplotypes, and a fast-changing environment kills the population. If simulations start with all loci heterozygous, they stay heterozygous for a long time.
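The activity rule described above (an allele's activity is its number of zero bits, and a locus's activity is the sum over both chromosomes) can be sketched as follows. The function names and the example values are assumptions made for illustration; the abstract does not say how activity is converted into survival, so that step is left out.

```python
ALLELE_BITS = 8  # each allele is a series of eight bits, as in the abstract


def allele_activity(allele: int) -> int:
    """Activity of one allele: the number of its eight bits set to zero."""
    return ALLELE_BITS - bin(allele & 0xFF).count("1")


def locus_activities(chrom_a, chrom_b):
    """Per-locus activity: sum of the activities of the two alleles
    sitting at the same locus on the two chromosomes."""
    return [allele_activity(a) + allele_activity(b)
            for a, b in zip(chrom_a, chrom_b)]


if __name__ == "__main__":
    # Two-locus toy genome (the simulated genomes have 128 loci).
    chrom_a = [0b00001111, 0b11111111]
    chrom_b = [0b00001111, 0b00000000]
    print(locus_activities(chrom_a, chrom_b))
```

In the toy genome, the first locus is homozygous with each allele supplying half of the total, while the second is heterozygous with one allele supplying all of it, which mirrors the complementation the abstract describes.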

Genomics Populations and Evolution

A genetic variant near olfactory receptor genes influences cilantro preference

368 - Nicholas Eriksson , Shirley Wu , Chuong B. Do 2012

The leaves of the Coriandrum sativum plant, known as cilantro or coriander, are widely used in many cuisines around the world. However, far from being a benign culinary herb, cilantro can be polarizing: many people love it while others claim that it tastes or smells foul, often like soap or dirt. This soapy or pungent aroma is largely attributed to several aldehydes present in cilantro. Cilantro preference is suspected to have a genetic component, yet to date nothing is known about specific mechanisms. Here we present the results of a genome-wide association study among 14,604 participants of European ancestry who reported whether cilantro tasted soapy, with replication in a distinct set of 11,851 participants who declared whether they liked cilantro. We find a single nucleotide polymorphism (SNP) significantly associated with soapy-taste detection that is confirmed in the cilantro preference group. This SNP, rs72921001 (p=6.4e-9, odds ratio 0.81 per A allele), lies within a cluster of olfactory receptor genes on chromosome 11. Among these olfactory receptor genes is OR6A2, which has a high binding specificity for several of the aldehydes that give cilantro its characteristic odor. We also estimate the heritability of cilantro soapy-taste detection in our cohort, showing that the heritability tagged by common SNPs is low, about 0.087. These results confirm that there is a genetic component to cilantro taste perception and suggest that cilantro dislike may stem from genetic variants in olfactory receptors. We propose that OR6A2 may be the olfactory receptor that contributes to the detection of a soapy smell from cilantro in European populations.

Genomics

A network approach to analyzing highly recombinant malaria parasite genes

468 - Daniel B. Larremore , Aaron Clauset , Caroline O. Buckee 2013

The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences.
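The network construction described above (sequences as nodes, linked when they share an exact match of sufficient length) can be sketched as below. This is only an illustration under stated assumptions: the fixed threshold `min_len` stands in for the paper's notion of "significant length", which the abstract does not define, and the function names and toy sequences are hypothetical. Community detection with a stochastic block model is not shown.

```python
from itertools import combinations


def substrings_of_length(seq, k):
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_match_edges(seqs, min_len):
    """Link two sequences if they share an exact match of at least min_len.

    Two sequences share an exact match of length >= min_len exactly when
    they have a length-min_len substring in common, so comparing the sets
    of such substrings is sufficient.
    """
    index = {name: substrings_of_length(s, min_len) for name, s in seqs.items()}
    return {(a, b)
            for a, b in combinations(sorted(seqs), 2)
            if index[a] & index[b]}


if __name__ == "__main__":
    toy = {"s1": "ACDEFGH", "s2": "XXDEFGYY", "s3": "KLMNPQR"}
    print(shared_match_edges(toy, min_len=4))
```

The resulting edge set can be handed to any graph library for the community analysis step.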

Genomics Data Analysis Statistics and Probability Molecular Networks

AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes

121 - Jeremie S. Kim , Can Firtina , Meryem Banu Cavlak 2019

As genome sequencing tools and techniques improve, researchers are able to incrementally assemble more accurate reference genomes, which enable sensitivity in read mapping and downstream analysis such as variant calling. A more sensitive downstream analysis is critical for a better understanding of the genome donor (e.g., health characteristics). Therefore, read sets from sequenced samples should ideally be mapped to the latest available reference genome that represents the most relevant population. Unfortunately, the increasingly large amount of available genomic data makes it prohibitively expensive to fully re-map each read set to its respective reference genome every time the reference is updated. There are several tools that attempt to accelerate the process of updating a read data set from one reference to another (i.e., remapping). However, if a read maps to a region in the old reference that does not appear with a reasonable degree of similarity in the new reference, the read cannot be remapped. We find that, as a result of this drawback, a significant portion of annotations are lost when using state-of-the-art remapping tools. To address this major limitation in existing tools, we propose AirLift, a fast and comprehensive technique for remapping alignments from one genome to another. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces 1) the number of reads that need to be fully mapped to the new reference by up to 99.99% and 2) the overall execution time to remap read sets between two reference genom

Genomics Computational Engineering