No Arabic abstract
Technical signs of progress during the last decades has led to a situation in which the accumulation of genome sequence data is increasingly fast and cheap. The huge amount of molecular data available nowadays can help addressing new and essential questions in Evolution. However, reconstructing evolution of DNA sequences requires models, algorithms, statistical and computational methods of ever increasing complexity. Since most dramatic genomic changes are caused by genome rearrangements (gene duplications, gain/loss events), it becomes crucial to understand their mechanisms and reconstruct ancestors of the given genomes. This problem is known to be NP-complete even in the simplest case of three genomes. Heuristic algorithms are usually executed to provide approximations of the exact solution. We state that, even if the ancestral reconstruction problem is NP-hard in theory, its exact resolution is feasible in various situations, encompassing organelles and some bacteria. Such accurate reconstruction, which identifies too some highly homoplasic mutations whose ancestral status is undecidable, will be initiated in this work-in-progress, to reconstruct ancestral genomes of two Mycobacterium pathogenetic bacterias. By mixing automatic reconstruction of obvious situations with human interventions on signaled problematic cases, we will indicate that it should be possible to achieve a concrete, complete, and really accurate reconstruction of lineages of the Mycobacterium tuberculosis complex. Thus, it is possible to investigate how these genomes have evolved from their last common ancestors.
We sequenced genomes from a $sim$7,000 year old early farmer from Stuttgart in Germany, an $sim$8,000 year old hunter-gatherer from Luxembourg, and seven $sim$8,000 year old hunter-gatherers from southern Sweden. We analyzed these data together with other ancient genomes and 2,345 contemporary humans to show that the great majority of present-day Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE), who were most closely related to Upper Paleolithic Siberians and contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations deep relationships and show that EEF had $sim$44% ancestry from a Basal Eurasian lineage that split prior to the diversification of all other non-African lineages.
We show, that the specific distribution of genes length, which is observed in natural genomes, might be a result of a growth process, in which a single length scale $L(t)$ develops that grows with time as $t^{1/3}$. This length scale could be associated with the length of the longest gene in an evolving genome. The growth kinetics of the genes resembles the one observed in physical systems with conserved ordered parameter. We show, that in genome this conservation is guaranteed by compositional compensation along DNA strands of the purine-like trends introduced by genes. The presented mathematical model is the modified Bak-Sneppen model of critical self-organization applied to the one-dimensional system of $N$ spins. The spins take discrete values, which represent genes length.
Microbes are essentially yet convolutedly linked with human lives on the earth. They critically interfere in different physiological processes and thus influence overall health status. Studying microbial species is used to be constrained to those that can be cultured in the lab. But it excluded a huge portion of the microbiome that could not survive on lab conditions. In the past few years, the culture-independent metagenomic sequencing enabled us to explore the complex microbial community coexisting within and on us. Metagenomics has equipped us with new avenues of investigating the microbiome, from studying a single species to a complex community in a dynamic ecosystem. Thus, identifying the involved microbes and their genomes becomes one of the core tasks in metagenomic sequencing. Metagenome-assembled genomes are groups of contigs with similar sequence characteristics from de novo assembly and could represent the microbial genomes from metagenomic sequencing. In this paper, we reviewed a spectrum of tools for producing and annotating metagenome-assembled genomes from metagenomic sequencing data and discussed their technical and biological perspectives.
We have simulated the evolution of sexually reproducing populations composed of individuals represented by diploid genomes. A series of eight bits formed an allele occupying one of 128 loci of one haploid genome (chromosome). The environment required a specific activity of each locus, this being the sum of the activities of both alleles located at the corresponding loci on two chromosomes. This activity is represented by the number of bits set to zero. In a constant environment the best fitted individuals were homozygous with alleles activities corresponding to half of the environment requirement for a locus (in diploid genome two alleles at corresponding loci produced a proper activity). Changing the environment under a relatively low recombination rate promotes generation of more polymorphic alleles. In the heterozygous loci, alleles of different activities complement each other fulfilling the environment requirements. Nevertheless, the genetic pool of populations evolves in the direction of a very restricted number of complementing haplotypes and a fast changing environment kills the population. If simulations start with all loci heterozygous, they stay heterozygous for a long time.
As genome sequencing tools and techniques improve, researchers are able to incrementally assemble more accurate reference genomes, which enable sensitivity in read mapping and downstream analysis such as variant calling. A more sensitive downstream analysis is critical for a better understanding of the genome donor (e.g., health characteristics). Therefore, read sets from sequenced samples should ideally be mapped to the latest available reference genome that represents the most relevant population. Unfortunately, the increasingly large amount of available genomic data makes it prohibitively expensive to fully re-map each read set to its respective reference genome every time the reference is updated. There are several tools that attempt to accelerate the process of updating a read data set from one reference to another (i.e., remapping). However, if a read maps to a region in the old reference that does not appear with a reasonable degree of similarity in the new reference, the read cannot be remapped. We find that, as a result of this drawback, a significant portion of annotations are lost when using state-of-the-art remapping tools. To address this major limitation in existing tools, we propose AirLift, a fast and comprehensive technique for remapping alignments from one genome to another. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces 1) the number of reads that need to be fully mapped to the new reference by up to 99.99% and 2) the overall execution time to remap read sets between two reference genom