ﻻ يوجد ملخص باللغة العربية
Data on the number of Open Reading Frames (ORFs) coded by genomes from the 3 domains of Life show some notable general features including essential differences between the Prokaryotes and Eukaryotes, with the number of ORFs growing linearly with total genome size for the former, but only logarithmically for the latter. Assuming that the (protein) coding and non-coding fractions of the genome must have different dynamics and that the non-coding fraction must be controlled by a variety of (unspecified) probability distribution functions, we are able to predict that the number of ORFs for Eukaryotes follows a Benford distribution and has a specific logarithmic form. Using the data for 1000+ genomes available to us in early 2010, we find excellent fits to the data over several orders of magnitude, in the linear regime for the Prokaryote data, and the full non-linear form for the Eukaryote data. In their region of overlap the salient features are statistically congruent, which allows us to: interpret the difference between Prokaryotes and Eukaryotes as the manifestation of the increased demand in the biological functions required for the larger Eukaryotes, estimate some minimal genome sizes, and predict a maximal Prokaryote genome size on the order of 8-12 megabasepairs. These results naturally allow a mathematical interpretation in terms of maximal entropy and, therefore, most efficient information transmission.
We report a droplet microfluidic method to target and sort individual cells directly from complex microbiome samples, and to prepare these cells for bulk whole genome sequencing without cultivation. We characterize this approach by recovering bacteri
Engineering the entire genome of an organism enables large-scale changes in organization, function, and external interactions, with significant implications for industry, medicine, and the environment. Improvements to DNA synthesis and organism engin
Being able to store and transmit human genome sequences is an important part in genomic research and industrial applications. The complete human genome has 3.1 billion base pairs (haploid), and storing the entire genome naively takes about 3 GB, whic
Recent genetic studies and whole-genome sequencing projects have greatly improved our understanding of human variation and clinically actionable genetic information. Smaller ethnic populations, however, remain underrepresented in both individual and
The problem of the directionality of genome evolution is studied from the information-theoretic view. We propose that the function-coding information quantity of a genome always grows in the course of evolution through sequence duplication, expansion