The effects of the carrying capacity $K$ of the environment for degradation (the $K$ effect for short) on constitutive gene expression and on a simple genetic regulation system are investigated using a stochastic Langevin equation combined with the corresponding Fokker-Planck equation, for the two stochastic systems subjected to internal and external noises. The $K$ effect characterizes the limited ability of the environment to degrade RNA or proteins, for example because of insufficient catabolic enzymes. The $K$ effect can significantly change the distribution of mRNA copy number in constitutive gene expression, and, interestingly, it leads to a Fano factor slightly larger than 1 when only internal noise is present. Therefore, the slight deviation of the Fano factor from 1 suggested by recent experimental measurements (Science {\bf 346} (2014) 1533) probably originates from the $K$ effect. The $K$ effects on the steady-state and transient properties of the genetic regulation system are investigated in detail. The $K$ effect can significantly increase the mean first passage time, especially when the noises are weak, and substantially reduce the signal-to-noise ratio in stochastic resonance.
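The zero-flux stationary solution of a one-dimensional Fokker-Planck equation gives the copy-number distribution, and hence the Fano factor, directly. The sketch below is a minimal illustration under stated assumptions, not the authors' model: it uses a chemical-Langevin form with internal noise only, where the diffusion coefficient is the sum of the production and degradation propensities, and it represents the $K$ effect by a hypothetical saturating degradation rate $\gamma x/(1 + x/K)$. The functional form, the parameter values and the grid are all assumptions.

```python
import numpy as np

def stationary_fano(alpha, degradation, x_max=400.0, n=40001):
    """Fano factor of the stationary Fokker-Planck density for the
    internal-noise-only Langevin equation (Ito interpretation)
        dx/dt = alpha - d(x) + sqrt(alpha + d(x)) * xi(t),
    using the zero-flux solution P(x) ~ exp(int_0^x 2 f/D dy) / D(x)."""
    x = np.linspace(0.0, x_max, n)
    dx = x[1] - x[0]
    d = degradation(x)
    drift = alpha - d                 # f(x): production minus degradation
    diffusion = alpha + d             # D(x): sum of the two propensities
    g = 2.0 * drift / diffusion
    # cumulative trapezoidal integral of 2 f / D from 0 to x
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * dx)))
    log_p = integral - np.log(diffusion)
    p = np.exp(log_p - log_p.max())   # shift before exponentiating to avoid overflow
    p /= p.sum() * dx                 # normalise on the grid
    mean = np.sum(x * p) * dx
    var = np.sum((x - mean) ** 2 * p) * dx
    return var / mean

alpha, gamma, K = 50.0, 1.0, 200.0
linear = lambda x: gamma * x                        # unlimited degradation
saturating = lambda x: gamma * x / (1.0 + x / K)    # hypothetical K-limited degradation
fano_linear = stationary_fano(alpha, linear)
fano_saturating = stationary_fano(alpha, saturating)
print(fano_linear, fano_saturating)
```

With linear degradation the stationary density is a shifted gamma distribution whose Fano factor is 1 (up to the negligible truncation at $x = 0$). For the hypothetical saturating form, a linear-noise estimate gives a Fano factor of about $1 + x^*/K$, where $x^*$ is the deterministic steady state, so the excess over 1 shrinks as $K$ grows.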
Current models for the folding of the human genome see a hierarchy stretching down from chromosome territories, through A/B compartments and TADs (topologically-associating domains), to contact domains stabilized by cohesin and CTCF. However, the molecular mechanisms underlying this folding, and the way folding affects transcriptional activity, remain obscure. Here we review the physical principles that drive proteins bound to long polymers into clusters surrounded by loops, and present a parsimonious yet comprehensive model of how this organization determines function. We argue that clusters of active RNA polymerases and their transcription factors are major architectural features; contact domains, TADs, and compartments then simply reflect one or more loops and clusters. We suggest that tethering a gene close to a cluster containing appropriate factors (a transcription factory) increases its firing frequency, and we offer solutions to many current puzzles concerning the actions of enhancers, super-enhancers, boundaries, and eQTLs (expression quantitative trait loci). As a result, the activity of any gene is directly influenced by the activity of other transcription units around it in 3D space; this is supported by Brownian-dynamics simulations of transcription factors binding to cognate sites on long polymers.
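For readers who want to see the mechanics of a Brownian-dynamics simulation of transcription factors binding to cognate sites on a polymer, here is a toy version. Everything in it is an assumption rather than the authors' setup: a bead-spring chain with harmonic bonds, every `site_every`-th bead marked as a cognate site, TF beads attracted to sites by a Gaussian well of depth `eps`, overdamped Euler-Maruyama integration with unit friction, and no excluded volume or confinement, so it should not be expected to reproduce clustering by itself. The function names and parameter values are hypothetical.

```python
import numpy as np

def brownian_dynamics(n_beads=60, n_tf=10, site_every=6, steps=5000, dt=1e-3,
                      k_bond=100.0, eps=4.0, sigma=1.0, kT=1.0, seed=0):
    """Overdamped Brownian dynamics (Euler-Maruyama, unit friction) of a
    bead-spring polymer plus diffusing TF beads attracted to cognate beads."""
    rng = np.random.default_rng(seed)
    chain = np.zeros((n_beads, 3))
    chain[:, 0] = np.arange(n_beads, dtype=float)      # straight chain, bond length 1
    sites = np.arange(0, n_beads, site_every)           # cognate-site bead indices
    start = sites[rng.integers(len(sites), size=n_tf)]
    tf = chain[start] + rng.normal(0.0, 1.0, size=(n_tf, 3))
    kick = np.sqrt(2.0 * kT * dt)                       # thermal noise amplitude
    for _ in range(steps):
        force = np.zeros_like(chain)
        # harmonic bonds U = k/2 (L - 1)^2
        bond = chain[1:] - chain[:-1]
        length = np.linalg.norm(bond, axis=1, keepdims=True)
        f_bond = -k_bond * (length - 1.0) * bond / length   # force on bead i+1
        force[1:] += f_bond
        force[:-1] -= f_bond
        # Gaussian well U = -eps * exp(-r^2 / (2 sigma^2)) between each TF and each site
        r = tf[:, None, :] - chain[sites][None, :, :]
        w = eps * np.exp(-np.sum(r ** 2, axis=2) / (2.0 * sigma ** 2)) / sigma ** 2
        f_pair = -w[:, :, None] * r                      # force on each TF from each site
        force[sites] -= f_pair.sum(axis=0)               # equal and opposite force on sites
        chain = chain + force * dt + kick * rng.normal(size=chain.shape)
        tf = tf + f_pair.sum(axis=1) * dt + kick * rng.normal(size=tf.shape)
    return chain, tf, sites

def bound_fraction(chain, tf, sites, cutoff=1.5):
    """Fraction of TFs lying within `cutoff` of their nearest cognate site."""
    d = np.linalg.norm(tf[:, None, :] - chain[sites][None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) < cutoff))

chain, tf, sites = brownian_dynamics()
print(bound_fraction(chain, tf, sites))
```

`bound_fraction` is only a crude occupancy readout; contact maps or cluster sizes would be computed from the same coordinates.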
Inferring functional relationships within complex networks from static snapshots of a subset of variables is a ubiquitous problem in science. For example, a key challenge of systems biology is to translate cellular heterogeneity data obtained from single-cell sequencing or flow-cytometry experiments into regulatory dynamics. We show how static population snapshots of co-variability can be exploited to rigorously infer properties of gene expression dynamics when gene expression reporters probe their upstream dynamics on separate time-scales. This can be experimentally exploited in dual-reporter experiments with fluorescent proteins of unequal maturation times, thus turning an experimental bug into an analysis feature. We derive correlation conditions that detect the presence of closed-loop feedback regulation in gene regulatory networks. Furthermore, we show how genes with cell-cycle dependent transcription rates can be identified from the variability of co-regulated fluorescent proteins. Similar correlation constraints might prove useful in other areas of science in which static correlation snapshots are used to infer causal connections between dynamically interacting components.
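The idea that reporters with unequal maturation times read out the same upstream fluctuations on different timescales can be made concrete with a linear-noise model. This sketch is an illustration only and does not reproduce the paper's feedback-detection conditions. It assumes an upstream signal $x$ (the deviation from its mean) following an Ornstein-Uhlenbeck process with unit variance, two reporters produced at a rate proportional to $x$, a dark-to-mature maturation step with rates `k_fast` and `k_slow`, and a common dilution rate `mu`; intrinsic reporter noise is ignored. All names and values are hypothetical. The stationary covariance follows from the Lyapunov equation.

```python
import numpy as np

def stationary_covariance(A, B):
    """Solve A S + S A^T + B B^T = 0 for a stable linear SDE dz = A z dt + B dW.
    With row-major vectorisation, vec(A S) = kron(A, I) vec(S) and
    vec(S A^T) = kron(I, A) vec(S)."""
    n = A.shape[0]
    eye = np.eye(n)
    M = np.kron(A, eye) + np.kron(eye, A)
    S = np.linalg.solve(M, -(B @ B.T).reshape(-1)).reshape(n, n)
    return 0.5 * (S + S.T)           # symmetrise away round-off

def dual_reporter_model(k_fast, k_slow, a=1.0, mu=1.0):
    """State z = (x, dark1, mature1, dark2, mature2). x is an OU process with
    unit variance; both reporters are produced at rate x, mature at rate k_i
    and are diluted at rate mu (hypothetical model)."""
    A = np.array([
        [-a,  0.0,            0.0, 0.0,            0.0],
        [1.0, -(k_fast + mu), 0.0, 0.0,            0.0],
        [0.0, k_fast,         -mu, 0.0,            0.0],
        [1.0, 0.0,            0.0, -(k_slow + mu), 0.0],
        [0.0, 0.0,            0.0, k_slow,         -mu],
    ])
    B = np.zeros((5, 1))
    B[0, 0] = np.sqrt(2.0 * a)       # noise only on x, giving Var(x) = 1
    return A, B

def correlation(S, i, j):
    return S[i, j] / np.sqrt(S[i, i] * S[j, j])

A, B = dual_reporter_model(k_fast=10.0, k_slow=0.5)
S = stationary_covariance(A, B)
print(correlation(S, 0, 2), correlation(S, 0, 4), correlation(S, 2, 4))
```

For these values the slowly maturing reporter correlates less with the current upstream state ($\sqrt{3/14} \approx 0.46$) than the fast one ($\sqrt{11/26} \approx 0.65$), which is the kind of timescale separation a dual-reporter snapshot can exploit.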
Complex biological functions are carried out by the interaction of genes and proteins. Uncovering the gene regulation network behind a function is one of the central themes in biology. Typically, this requires extensive experiments in genetics, biochemistry, and molecular biology. In this paper, we show that much of the inference task can be accomplished by a deep neural network (DNN), a form of machine learning or artificial intelligence. Specifically, the DNN learns from the dynamics of gene expression. The trained DNN behaves like an accurate simulator of the system, on which one can perform in-silico experiments to reveal the underlying gene network. We demonstrate the method with two examples: biochemical adaptation and gap-gene patterning in fruit fly embryogenesis. In the first example, the DNN successfully finds the two basic network motifs for adaptation: negative feedback and incoherent feed-forward regulation. In the second, much more complex example, the DNN accurately predicts the behaviors of essentially all the mutants. Furthermore, the regulation network it uncovers is strikingly similar to the one inferred from experiments. In doing so, we develop methods for deciphering the gene regulation network hidden in the DNN black box. Our interpretable DNN approach should have broad applications in genotype-phenotype mapping.
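A minimal sketch of the "learn the dynamics, then experiment on the network" workflow. It assumes a hypothetical two-node incoherent feed-forward toy model in place of real data, a small tanh multilayer perceptron trained with hand-written full-batch Adam in NumPy, and a finite-difference perturbation as the in-silico experiment. The architecture, training schedule and helper names are illustrative choices, not the authors' DNN or their interpretation methods.

```python
import numpy as np

def toy_rates(X):
    """Hypothetical incoherent feed-forward loop: input I drives A and C,
    and A represses C. Columns of X are (I, A, C); returns (dA/dt, dC/dt)."""
    I, A, C = X[:, 0], X[:, 1], X[:, 2]
    return np.stack([I - A, I / (1.0 + A) - C], axis=1)

def init_mlp(n_in, n_hidden, n_out, rng):
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(p, X):
    H = np.tanh(X @ p["W1"] + p["b1"])
    return H, H @ p["W2"] + p["b2"]

def train(p, X, Y, steps=2000, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """Full-batch Adam on the mean squared error; returns the loss history."""
    m = {k: np.zeros_like(w) for k, w in p.items()}
    v = {k: np.zeros_like(w) for k, w in p.items()}
    losses = []
    for t in range(1, steps + 1):
        H, out = forward(p, X)
        err = out - Y
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size                  # d(MSE)/d(output)
        g_hid = (g_out @ p["W2"].T) * (1.0 - H ** 2)  # back through tanh
        grads = {"W2": H.T @ g_out, "b2": g_out.sum(axis=0),
                 "W1": X.T @ g_hid, "b1": g_hid.sum(axis=0)}
        for k in p:
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            v_hat = v[k] / (1 - beta2 ** t)
            p[k] = p[k] - lr * m_hat / (np.sqrt(v_hat) + eps)
    return losses

def sensitivity(p, X, col, out, h=1e-3):
    """In-silico perturbation: central finite difference of one learned rate
    with respect to one input variable, evaluated at every sample."""
    Xp, Xm = X.copy(), X.copy()
    Xp[:, col] += h
    Xm[:, col] -= h
    return (forward(p, Xp)[1][:, out] - forward(p, Xm)[1][:, out]) / (2.0 * h)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(1000, 3))
Y = toy_rates(X)
params = init_mlp(3, 32, 2, rng)
losses = train(params, X, Y)
dC_dA = sensitivity(params, X, col=1, out=1)   # effect of A on the learned dC/dt
print(losses[0], losses[-1], dC_dA.mean())
```

The learned sensitivity of $dC/dt$ to $A$ should come out negative on average, recovering the repressive link from A to C in the toy model. A real application would train on measured trajectories and, to mimic mutants, clamp or remove inputs of the trained network rather than only differentiate it.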
We analyze zebrafish gene expression data within the combined framework of complex networks and random matrix theory. The nearest-neighbor spacing distribution of the corresponding matrix spectra follows the random matrix predictions of Gaussian orthogonal ensemble statistics. Based on an eigenvector analysis, we divide the spectra into two parts: a first part, in which the eigenvector localization properties match the random matrix theory predictions, and a second part, in which they deviate from the theory and are hence useful for understanding system-dependent properties. The part of the spectrum with localized eigenvectors can be divided into three groups based on the eigenvalues. We explore the positions of the localized nodes in these different groups. Using an overlap measure, we find that the top contributing nodes in the different groups carry distinct structural features. Furthermore, the top contributing nodes of the different localized eigenvectors in the lower-eigenvalue regime form distinct, densely connected structures that are well separated from each other. A preliminary biological interpretation of the genes associated with the top contributing nodes in the localized eigenvectors suggests that genes corresponding to the same eigenvector share common features.
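The two diagnostics named here, nearest-neighbor spacing statistics and eigenvector localization, can be sketched on a synthetic matrix. The sketch assumes a GOE random matrix as a stand-in for the network matrix built from the zebrafish data, unfolds the spectrum with a degree-9 polynomial fit to the staircase function (the degree and the 10% edge trimming are arbitrary choices), compares the spacings with the GOE Wigner surmise, and measures localization with the inverse participation ratio (IPR), one common localization measure. Whether the authors use exactly these choices is an assumption.

```python
import numpy as np

def goe_matrix(n, seed=0):
    """Real symmetric random matrix from the Gaussian orthogonal ensemble."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n, n))
    return (a + a.T) / 2.0

def unfolded_spacings(eigvals, degree=9, trim=0.1):
    """Nearest-neighbour spacings after unfolding with a polynomial fit to the
    spectral staircase N(E); a fraction `trim` is dropped at each spectrum edge."""
    e = np.sort(eigvals)
    n = len(e)
    staircase = np.arange(1, n + 1)
    smooth = np.polynomial.Polynomial.fit(e, staircase, degree)
    unfolded = smooth(e)
    lo, hi = int(trim * n), int((1.0 - trim) * n)
    return np.diff(unfolded[lo:hi])

def wigner_surmise_goe(s):
    """GOE spacing distribution in the Wigner-surmise approximation."""
    return (np.pi / 2.0) * s * np.exp(-np.pi * s ** 2 / 4.0)

def inverse_participation_ratio(eigvecs):
    """IPR of each normalised eigenvector (columns): sum of fourth powers."""
    return np.sum(eigvecs ** 4, axis=0)

n = 500
vals, vecs = np.linalg.eigh(goe_matrix(n))
s = unfolded_spacings(vals)
ipr = inverse_participation_ratio(vecs)
hist, edges = np.histogram(s, bins=20, range=(0.0, 3.0), density=True)
centers = 0.5 * (edges[1:] + edges[:-1])
print(np.round(hist - wigner_surmise_goe(centers), 2))   # deviation from the surmise
print(ipr.mean(), 3.0 / (n + 2))
```

For a delocalized GOE eigenvector the expected IPR is $3/(N+2)$; eigenvectors of a real network matrix whose IPR lies far above this value are the localized ones whose top contributing nodes can then be examined.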
Methods for time series prediction and for classification of gene regulatory networks (GRNs) from gene expression data have so far been treated separately. The recent emergence of attention-based recurrent neural network (RNN) models has boosted the interpretability of RNN parameters, making them appealing for understanding gene interactions. In this work, we generated synthetic time series gene expression data from a range of archetypal GRNs and relied on a dual attention RNN to predict the temporal dynamics of the genes. We show that the prediction is extremely accurate for GRNs with different architectures. Next, we focused on the attention mechanism of the RNN and, using tools from graph theory, found that its graph properties allow us to hierarchically distinguish different GRN architectures. We show that the GRNs respond differently to the addition of noise in the prediction by the RNN, and we relate the noise response to the analysis of the attention mechanism. In conclusion, this work provides a way to understand and exploit the attention mechanism of RNNs, and it paves the way to RNN-based methods for time series prediction and inference of GRNs from gene expression data.
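The attention weights that such a graph analysis starts from can be illustrated with the input-attention stage of a dual-stage attention RNN (DA-RNN): at each time step, a softmax over input genes, conditioned on the encoder LSTM state, reweights the input before the LSTM update. The sketch is a forward pass only, with random untrained weights, placeholder data and hypothetical helper names; it omits the temporal-attention decoder and all training, and it is not the authors' implementation. Averaging the weights over time for one predictor per target gene yields a gene-by-gene matrix that can be treated as a weighted directed graph.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_genes, window, hidden, rng):
    """Random (untrained) weights for the input-attention encoder."""
    g = lambda *shape: rng.normal(0.0, 0.1, size=shape)
    return {"W_e": g(window, 2 * hidden), "U_e": g(window, window), "v_e": g(window),
            "W": g(4 * hidden, hidden + n_genes), "b": np.zeros(4 * hidden)}

def input_attention(p, X, hidden):
    """Input-attention encoder in the style of DA-RNN.
    X has shape (window, n_genes); returns attention weights of the same shape."""
    T, n = X.shape
    h, s = np.zeros(hidden), np.zeros(hidden)        # LSTM hidden and cell state
    Ux = X.T @ p["U_e"].T                             # row k is U_e x^k, shape (n, T)
    alphas = np.empty((T, n))
    for t in range(T):
        scores = np.tanh(p["W_e"] @ np.concatenate([h, s]) + Ux) @ p["v_e"]
        a = np.exp(scores - scores.max())
        alphas[t] = a / a.sum()                       # softmax over input genes
        x_tilde = alphas[t] * X[t]                    # attention-weighted input
        i, f, o, g = np.split(p["W"] @ np.concatenate([h, x_tilde]) + p["b"], 4)
        s = sigmoid(f) * s + sigmoid(i) * np.tanh(g)  # standard LSTM cell update
        h = sigmoid(o) * np.tanh(s)
    return alphas

def attention_graph(X, hidden=16, seed=0):
    """One encoder per target gene; row j holds the time-averaged attention that
    the (hypothetical) predictor of gene j pays to every input gene. The target
    would only enter through training, which is omitted here."""
    rng = np.random.default_rng(seed)
    T, n = X.shape
    return np.array([input_attention(init_params(n, T, hidden, rng), X, hidden).mean(axis=0)
                     for _target in range(n)])

X = np.random.default_rng(1).normal(size=(30, 5))    # placeholder expression window
G = attention_graph(X)
print(G.round(3), G.sum(axis=0))                      # weighted in-strength of each gene
```

Each row sums to 1 by construction; column sums (weighted in-strength), clustering or other graph measures computed on a trained model's matrix are examples of properties one could compare across GRN architectures.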