The phenotype of any organism on earth is, in large part, the consequence of interplay between numerous gene products encoded in the genome, and this interplay in turn affects the evolutionary fate of the genome itself through the resulting phenotype. In this regard, contemporary genomes can be used as molecular records that reveal associations among genes operating in their natural contexts. By analyzing thousands of orthologs across ~600 bacterial species, we constructed a map of gene-gene co-occurrence across much of the sequenced biome. We call genes that preferentially co-occur in the same organisms correlogs, and genes that preferentially avoid each other anti-correlogs. To quantify correlogy and anti-correlogy, we reduced the contribution of indirect correlations between genes by adapting ideas developed for reverse engineering of transcriptional regulatory networks. The resulting correlogous associations are highly enriched for physically interacting proteins and for co-expressed transcripts, clearly differentiating a subgroup of functionally obligatory protein interactions from conditional or transient interactions. Other biochemical and phylogenetic properties were also found to be reflected in correlogous and anti-correlogous relationships. Additionally, our study elucidates the global organization of the gene association map, in which various modules of correlogous genes are strikingly interconnected by anti-correlogous crosstalk between the modules. We then demonstrate the effectiveness of such associations across different domains of life and environmental microbial communities. These phylogenetic profiling approaches infer functional coupling of genes regardless of mechanistic details, and may be useful to guide exogenous gene import in synthetic biology.
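The idea of discounting indirect correlations between co-occurring genes can be illustrated with a minimal sketch. The abstract does not state which method was adapted, so the code below swaps in partial correlation (computed from the inverse of the correlation matrix) as one common way to do this. The toy data, the three-gene chain A, B, C, the 10% flip rate, and the function name `partial_correlations` are all hypothetical.

```python
import numpy as np

def partial_correlations(profiles):
    """Return (direct, partial) correlation matrices for a genes x species matrix.

    Partial correlations condition each gene pair on all remaining genes and are
    read off the inverse (precision) of the correlation matrix.
    """
    direct = np.corrcoef(profiles)
    precision = np.linalg.inv(direct)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return direct, partial

# Toy presence/absence profiles (assumption, not data from the study):
# gene B mostly follows gene A, and gene C mostly follows gene B,
# so A and C are linked only indirectly, through B.
rng = np.random.default_rng(0)
n_species = 600
a = rng.random(n_species) < 0.5
b = a ^ (rng.random(n_species) < 0.1)  # copy of A with 10% of entries flipped
c = b ^ (rng.random(n_species) < 0.1)  # copy of B with 10% of entries flipped
profiles = np.array([a, b, c], dtype=float)

direct, partial = partial_correlations(profiles)
print("direct  corr(A, C):", round(direct[0, 2], 3))
print("partial corr(A, C | B):", round(partial[0, 2], 3))
```

In this toy chain the direct A-C correlation is substantial, while the partial correlation given B should be close to zero, which is the sense in which an indirect association is discounted.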
Gene expression levels carry information about signals that have functional significance for the organism. Using the gap gene network in the fruit fly embryo as an example, we show how this information can be decoded, building a dictionary that translates expression levels into a map of implied positions. The optimal decoder makes use of graded variations in absolute expression level, resulting in positional estimates that are precise to ~1% of the embryo's length. We test this optimal decoder by analyzing gap gene expression in embryos lacking some of the primary maternal inputs to the network. The resulting maps are distorted, and these distortions predict, with no free parameters, the positions of expression stripes for the pair-rule genes in the mutant embryos.
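A decoder of this kind can be sketched as Bayesian inference over position. The example below is a rough illustration under stated assumptions, not the decoder used in the study: it assumes independent Gaussian noise with a constant standard deviation, a uniform prior over position, and four hypothetical bell-shaped mean expression profiles. The names `decode_position`, `mean_profiles`, and `noise_sd` are invented for illustration.

```python
import numpy as np

# Hypothetical mean expression profiles of four genes along the normalized axis.
positions = np.linspace(0.0, 1.0, 501)
centers = np.array([0.2, 0.4, 0.6, 0.8])
width = 0.1
mean_profiles = np.exp(
    -((positions[None, :] - centers[:, None]) ** 2) / (2 * width ** 2)
)
noise_sd = 0.05  # assumed constant expression noise

def decode_position(expression, mean_profiles, noise_sd, positions):
    """Return the most probable position and the posterior over all positions.

    Assumes independent Gaussian noise per gene and a uniform prior, so the
    log posterior is -chi^2 / 2 up to an additive constant.
    """
    chi2 = np.sum(((expression[:, None] - mean_profiles) / noise_sd) ** 2, axis=0)
    log_post = -0.5 * chi2
    log_post -= log_post.max()  # stabilize the exponential
    posterior = np.exp(log_post)
    posterior /= posterior.sum()
    return positions[np.argmax(posterior)], posterior

# Decode a noise-free and a noisy measurement taken at a known position.
true_index = 150
clean = mean_profiles[:, true_index].copy()
estimate_clean, posterior_clean = decode_position(
    clean, mean_profiles, noise_sd, positions
)

rng = np.random.default_rng(1)
noisy = clean + rng.normal(0.0, noise_sd, size=clean.shape)
estimate_noisy, posterior_noisy = decode_position(
    noisy, mean_profiles, noise_sd, positions
)
print(positions[true_index], estimate_clean, estimate_noisy)
```

The width of the posterior gives the positional uncertainty implied by a single set of expression levels. Feeding the same dictionary with expression levels from a perturbed system would yield shifted estimates, which is the kind of distortion the abstract describes for mutant embryos.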
A system-level framework of complex microbe-microbe and host-microbe chemical cross-talk would help elucidate the role of our gut microbiota in health and disease. Here we report a literature-curated interspecies network of the human gut microbiota, called NJS16. This is an extensive data resource composed of ~570 microbial species and 3 human cell types metabolically interacting through >4,400 small-molecule transport and macromolecule degradation events. Based on the contents of our network, we develop a mathematical approach to elucidate representative microbial and metabolic features of the gut microbial community in a given population, such as a disease cohort. Applying this strategy to microbiome data from type 2 diabetes patients reveals a context-specific infrastructure of the gut microbial ecosystem, core microbial entities with large metabolic influence, and frequently-produced metabolic compounds that might indicate relevant community metabolic processes. Our network presents a foundation towards integrative investigations of community-scale microbial activities within the human gut.
A wide range of organisms possess molecular machines, called circadian clocks, that generate endogenous oscillations with ~24 h periodicity and thereby synchronize biological processes to diurnal environmental fluctuations. Recently, it has become clear that plants harbor more complex gene regulatory circuits within the core circadian clocks than other organisms, inspiring a fundamental question: are all these regulatory interactions between clock genes equally crucial for the establishment and maintenance of circadian rhythms? Our mechanistic simulation for Arabidopsis thaliana demonstrates that at least half of the total regulatory interactions must be present to express the circadian molecular profiles observed in wild-type plants. We call this set of essential interactions a kernel of the circadian system. The kernel structure reveals, without prior bias, four interlocked negative feedback loops contributing to circadian rhythms, three of which drive the autonomous oscillation itself. Strikingly, the kernel structure, as well as the whole clock circuitry, is overwhelmingly composed of inhibitory, rather than activating, interactions between genes. We found that this tendency underlies plant circadian molecular profiles, which often exhibit sharply shaped, cuspidate waveforms. Through the generation of these cuspidate profiles, inhibitory interactions may facilitate the global coordination of temporally distant clock events that are markedly peaked at very specific times of day. Our systematic approach, which yields experimentally testable predictions, provides insights into a design principle of biological clockwork, with implications for synthetic biology.
We investigate the dynamics of the heterodimer autorepression loop (HAL), a small genetic module in which a protein A acts as an auto-repressor and binds to a second protein B to form an AB dimer. For suitable values of the rate constants, the HAL produces pulses of A alternating with pulses of B. By means of analytical and numerical calculations, we show that the duration of the A-pulses is extremely robust against variation of the rate constants, while the duration of the B-pulses can be flexibly adjusted. The HAL is thus a minimal genetic module generating robust pulses with tunable duration, an interesting property for cellular signalling.
We analyze a gene co-expression network under the random matrix theory (RMT) framework. The nearest neighbor spacing distribution of the adjacency matrix of this network follows the Gaussian orthogonal statistics of RMT. The spectral rigidity test follows the random matrix prediction for a certain range and deviates afterwards. Eigenvector analysis of the network using the inverse participation ratio (IPR) suggests that the statistics of the bulk of the network's eigenvalues are consistent with those of a real symmetric random matrix, whereas a few eigenvectors are localized. Based on these IPR calculations, we can divide the eigenvalues into three sets: (A) the non-degenerate part that follows RMT; (B) the non-degenerate part, at both ends and at intermediate eigenvalues, which deviates from RMT and is expected to contain information about important nodes in the network; and (C) the degenerate part with zero eigenvalue, which fluctuates around the RMT-predicted value. We identify nodes corresponding to the dominant modes of the corresponding eigenvectors and analyze their structural properties.
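The inverse participation ratio used in this kind of eigenvector analysis can be computed directly from the eigenvectors of the adjacency matrix. The sketch below assumes the standard definition, the sum of the fourth powers of the components of a unit-norm eigenvector. A random undirected graph stands in for the co-expression network, which is not available here, and the helper names are hypothetical.

```python
import numpy as np

def inverse_participation_ratios(adjacency):
    """Return the eigenvalues of a symmetric matrix and the IPR of each eigenvector.

    For a unit-norm eigenvector v, IPR = sum_i v_i**4. It equals 1/N when all
    N components have equal magnitude (fully delocalized) and 1 when the
    vector sits on a single node (fully localized).
    """
    eigenvalues, eigenvectors = np.linalg.eigh(adjacency)
    ipr = np.sum(eigenvectors ** 4, axis=0)  # eigenvectors are the columns
    return eigenvalues, ipr

def random_symmetric_network(n, p, seed=0):
    """Adjacency matrix of a random undirected graph (stand-in for real data)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T

n_nodes = 200
adjacency = random_symmetric_network(n_nodes, 0.1)
eigenvalues, ipr = inverse_participation_ratios(adjacency)

# For eigenvectors of Gaussian orthogonal random matrices the expected IPR is
# roughly 3/N; markedly larger values indicate localization on a few nodes.
goe_reference = 3.0 / n_nodes
most_localized = int(np.argmax(ipr))
print(goe_reference, eigenvalues[most_localized], ipr[most_localized])

# Sanity check on a diagonal matrix, whose eigenvectors are single-node vectors.
_, ipr_localized = inverse_participation_ratios(np.diag([1.0, 2.0, 3.0]))
```

For an eigenvector with a large IPR, the nodes with the largest absolute components are natural candidates for the "important nodes" mentioned above.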