In biophysics, the search for analytical solutions of stochastic models of cellular processes is often a challenging task. In recent work on models of gene expression, it was shown that a mapping based on partitioning of Poisson arrivals (PPA-mapping) can lead to exact solutions for previously unsolved problems. While the approach can be used in general when the model involves Poisson processes corresponding to creation or degradation, applications of the method and new results derived using it have been limited to date. In this paper, we present the exact solution of a variation of the two-stage model of gene expression (with time-dependent transition rates) describing the arbitrary partitioning of proteins. The proposed methodology makes full use of the PPA-mapping by transforming the original problem into a new process describing the evolution of three biological switches. Based on a succession of transformations, the method leads to a hierarchy of reduced models. We give an integral expression for the time-dependent generating function as well as explicit results for the mean, variance, and correlation function. Finally, we discuss how results for time-dependent parameters can be extended to the three-stage model and used to make inferences about models with parameter fluctuations induced by hidden stochastic variables.
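For orientation, the two-stage model referred to above (mRNA synthesis and decay, protein translation and decay) can be simulated directly. The sketch below is a plain Gillespie simulation with constant rates, not the PPA-mapping and not the time-dependent case treated in the paper; the function name, parameter names, and numerical values are illustrative assumptions. It compares time-averaged copy numbers with the standard constant-rate steady-state means, k_m/g_m for mRNA and k_m*k_p/(g_m*g_p) for protein.

```python
import random


def gillespie_two_stage(k_m, g_m, k_p, g_p, t_end, seed=0):
    """Simulate the constant-rate two-stage model with the Gillespie algorithm.

    Returns the time-averaged (mRNA, protein) copy numbers over [0, t_end].
    All names and rates here are illustrative assumptions.
    """
    rng = random.Random(seed)
    t, m, p = 0.0, 0, 0
    m_area = p_area = 0.0  # time integrals of the copy numbers
    while True:
        # Propensities: mRNA birth, mRNA decay, translation, protein decay.
        rates = (k_m, g_m * m, k_p * m, g_p * p)
        total = sum(rates)
        dt = rng.expovariate(total)  # waiting time to the next reaction
        if t + dt >= t_end:
            # Close the last interval at t_end without firing a reaction.
            m_area += m * (t_end - t)
            p_area += p * (t_end - t)
            break
        m_area += m * dt
        p_area += p * dt
        t += dt
        # Pick which reaction fires, with probability proportional to its rate.
        r = rng.random() * total
        if r < rates[0]:
            m += 1
        elif r < rates[0] + rates[1]:
            m -= 1
        elif r < rates[0] + rates[1] + rates[2]:
            p += 1
        else:
            p -= 1
    return m_area / t_end, p_area / t_end


mean_m, mean_p = gillespie_two_stage(k_m=2.0, g_m=1.0, k_p=5.0, g_p=0.5, t_end=5000.0)
# Simulated time averages next to the constant-rate steady-state means.
print(mean_m, 2.0 / 1.0)
print(mean_p, 2.0 * 5.0 / (1.0 * 0.5))
```

A simulation of this kind is only a numerical check on analytical results; it does not replace the generating-function solution described in the abstract.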
We analyze the gene expression data of zebrafish under the combined framework of complex networks and random matrix theory. The nearest neighbor spacing distribution of the corresponding matrix spectra follows random matrix predictions of Gaussian orthogonal statistics. Based on the eigenvector analysis, we can divide the spectra into two parts: a first part for which the eigenvector localization properties match the random matrix theory predictions, and a second part for which they deviate from the theory and are hence useful for understanding system-dependent properties. Spectra with localized eigenvectors can be classified into three groups based on the eigenvalues. We explore the position of localized nodes from these different categories. Using an overlap measure, we find that the top contributing nodes in the different groups carry distinct structural features. Furthermore, the top contributing nodes of the different localized eigenvectors corresponding to the lower eigenvalue regime form distinct densely connected structures that are well separated from each other. A preliminary biological interpretation of the genes associated with the top contributing nodes in the localized eigenvectors suggests that genes corresponding to the same eigenvector share common features.
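The quantities named above can be made concrete with a short sketch. The abstract does not say which localization measure is used; the inverse participation ratio (IPR) below is a common choice and is an assumption here, as are all function names. The Wigner surmise is the standard closed-form approximation to the nearest neighbor spacing density of Gaussian orthogonal statistics, for spacings normalized to unit mean.

```python
import math


def inverse_participation_ratio(vec):
    """IPR = sum(v_i^4) / (sum(v_i^2))^2.

    Equals 1/N for a vector spread evenly over N nodes and 1 for a vector
    concentrated on a single node.
    """
    norm2 = sum(x * x for x in vec)
    return sum(x ** 4 for x in vec) / (norm2 * norm2)


def goe_wigner_surmise(s):
    """Wigner surmise for the GOE spacing density at unit mean spacing."""
    return (math.pi / 2.0) * s * math.exp(-math.pi * s * s / 4.0)


def top_contributing_nodes(vec, k):
    """Indices of the k components with the largest squared weight."""
    return sorted(range(len(vec)), key=lambda i: vec[i] ** 2, reverse=True)[:k]


delocalized = [1.0] * 100          # evenly spread over 100 nodes
localized = [0.0] * 99 + [1.0]     # concentrated on one node
print(inverse_participation_ratio(delocalized))
print(inverse_participation_ratio(localized))
print(top_contributing_nodes([0.1, -0.9, 0.3, 0.2], 2))
```

In practice the eigenvectors would come from diagonalizing the network's adjacency matrix with a numerical library; that step is omitted to keep the sketch self-contained.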
Current models for the folding of the human genome see a hierarchy stretching down from chromosome territories, through A/B compartments and TADs (topologically-associating domains), to contact domains stabilized by cohesin and CTCF. However, molecular mechanisms underlying this folding, and the way folding affects transcriptional activity, remain obscure. Here we review physical principles driving proteins bound to long polymers into clusters surrounded by loops, and present a parsimonious yet comprehensive model for the way the organization determines function. We argue that clusters of active RNA polymerases and their transcription factors are major architectural features; then, contact domains, TADs, and compartments just reflect one or more loops and clusters. We suggest tethering a gene close to a cluster containing appropriate factors -- a transcription factory -- increases the firing frequency, and offer solutions to many current puzzles concerning the actions of enhancers, super-enhancers, boundaries, and eQTLs (expression quantitative trait loci). As a result, the activity of any gene is directly influenced by the activity of other transcription units around it in 3D space, and this is supported by Brownian-dynamics simulations of transcription factors binding to cognate sites on long polymers.
The effects of the environmental carrying capacity $K$ for degradation (the $K$ effect for short) on constitutive gene expression and on a simple genetic regulation system are investigated by employing a stochastic Langevin equation, combined with the corresponding Fokker-Planck equation, for the two stochastic systems subjected to internal and external noise. The $K$ effect characterizes the limited ability of the environment to degrade RNA or proteins, for example because of insufficient catabolic enzymes. The $K$ effect can significantly change the distribution of mRNA copy number in constitutive gene expression and, interestingly, it leads to a Fano factor slightly larger than 1 when only internal noise is present. Therefore, the slight deviation of the Fano factor from 1 suggested by recent experimental measurements (Science 346 (2014) 1533) probably originates from the $K$ effect. The $K$ effect on the steady-state and transient properties of the genetic regulation system is also investigated in detail. It can significantly increase the mean first passage time, especially when the noise is weak, and substantially reduce the signal-to-noise ratio in stochastic resonance.
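To illustrate how limited degradation capacity can push the Fano factor above 1, the sketch below computes the exact stationary distribution of a one-species birth-death process. The abstract does not give the functional form of the $K$ effect; the saturating degradation rate g*m/(1 + m/K) used here is a hypothetical stand-in, and the function name and parameter values are likewise assumptions. Under this assumed form, and for k < g*K, the stationary law is negative binomial with Fano factor 1/(1 - k/(g*K)), which approaches the Poisson value 1 as K grows.

```python
def stationary_fano(k, g, K, m_max=400):
    """Fano factor (variance / mean) of the stationary copy-number law.

    Birth rate: k. Death rate: g * m / (1 + m / K), a HYPOTHETICAL
    saturating form chosen for illustration. Requires k < g * K, and the
    state space is truncated at m_max.
    """
    # Detailed balance: P(m) / P(m - 1) = birth rate / death rate at m.
    weights = [1.0]
    for m in range(1, m_max + 1):
        death = g * m / (1.0 + m / K)
        weights.append(weights[-1] * k / death)
    z = sum(weights)
    mean = sum(m * w for m, w in enumerate(weights)) / z
    second = sum(m * m * w for m, w in enumerate(weights)) / z
    return (second - mean * mean) / mean


print(stationary_fano(k=10.0, g=1.0, K=1e9))   # effectively unlimited degradation
print(stationary_fano(k=10.0, g=1.0, K=50.0))  # limited degradation capacity
```

This covers internal noise only; the external noise, mean first passage time, and stochastic resonance results in the abstract are outside the scope of the sketch.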
Methods for time series prediction and classification of gene regulatory networks (GRNs) from gene expression data have been treated separately so far. The recent emergence of attention-based recurrent neural network (RNN) models has boosted the interpretability of RNN parameters, making them appealing for the understanding of gene interactions. In this work, we generated synthetic time series gene expression data from a range of archetypal GRNs and we relied on a dual attention RNN to predict the gene temporal dynamics. We show that the prediction is extremely accurate for GRNs with different architectures. Next, we focused on the attention mechanism of the RNN and, using tools from graph theory, we found that its graph properties allow different architectures of the GRN to be distinguished hierarchically. We show that the GRNs respond differently to the addition of noise in the prediction by the RNN, and we relate the noise response to the analysis of the attention mechanism. In conclusion, this work provides a way to understand and exploit the attention mechanism of RNNs, and it paves the way to RNN-based methods for time series prediction and inference of GRNs from gene expression data.
In recent years, tens of thousands of gene expression profiles for cells of several organisms have been monitored. Gene expression is a complex transcriptional process where mRNA molecules are translated into proteins, which control most of the cell functions. In this process, the correlation among genes is crucial to determine the specific functions of genes. Here, we propose a novel multi-dimensional stochastic approach to deal with gene correlation phenomena. Interestingly, our stochastic framework suggests that the study of gene correlation requires only one theoretical assumption, the Markov property, together with the experimental transition probability, which characterizes the gene correlation system. Finally, a gene expression experiment is proposed for future applications of the model.
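One way to read the "experimental transition probability" is as an empirical estimate built from observed state changes. The sketch below assumes a single gene whose expression has already been discretized into a small number of states and counts one-step transitions. The discretization, the function name, and the toy trajectory are all hypothetical, and the multi-dimensional framework proposed in the abstract is not reproduced here.

```python
from collections import Counter


def empirical_transition_matrix(states, n_states):
    """Row-stochastic matrix T[i][j] = estimated P(next = j | current = i)."""
    # Count every observed one-step transition (current, next).
    counts = Counter(zip(states[:-1], states[1:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # State i was never observed as a current state: no estimate.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


# Hypothetical discretized expression levels: 0 = low, 1 = high.
trajectory = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
for row in empirical_transition_matrix(trajectory, 2):
    print(row)
```

Under the Markov assumption, such a matrix is the only input needed to propagate state probabilities forward in time.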