We analyze zebrafish gene expression data under the combined framework of complex networks and random matrix theory. The nearest neighbor spacing distribution of the corresponding matrix spectra follows the random matrix predictions of Gaussian orthogonal statistics. Based on the eigenvector analysis, we divide the spectra into two parts: a first part for which the eigenvector localization properties match the random matrix theory predictions, and a second part for which they deviate from the theory and hence are useful for understanding system-dependent properties. Spectra with localized eigenvectors can be classified into three groups based on the eigenvalues. We explore the position of localized nodes from these different categories. Using an overlap measure, we find that the top contributing nodes in the different groups carry distinct structural features. Furthermore, the top contributing nodes of the different localized eigenvectors corresponding to the lower eigenvalue regime form different densely connected structures that are well separated from each other. A preliminary biological interpretation of the genes associated with the top contributing nodes in the localized eigenvectors suggests that genes corresponding to the same eigenvector share common features.
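For reference, the Gaussian orthogonal prediction for the nearest neighbor spacing distribution is commonly approximated by the Wigner surmise. The expression below is the standard textbook form, given as background rather than quoted from the abstract; $s$ is assumed to be the spacing between consecutive unfolded eigenvalues, normalized to unit mean.

```latex
P(s) = \frac{\pi}{2}\, s \, \exp\!\left(-\frac{\pi s^{2}}{4}\right)
```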
The topological analysis of biological networks has been a prolific topic in network science during the last decade. A persistent problem with this approach is the inherent uncertainty and noisy nature of the data. One of the cases in which this situation is most marked is that of transcriptional regulatory networks (TRNs) in bacteria. The datasets are incomplete because the regulatory pathways associated with a relevant fraction of bacterial genes remain unknown. Furthermore, the direction, strength and sign of the links are sometimes unknown or simply overlooked. Finally, the experimental approaches used to infer the regulations are highly heterogeneous, in a way that induces systematic experimental-topological correlations. And yet, the quality of the available data increases constantly. In this work we capitalize on these advances to point out the influence of data (in)completeness and quality on some classical results of the topological analysis of TRNs, especially regarding modularity at different levels. In doing so, we identify the most relevant factors affecting the validity of previous findings, highlighting important caveats for future topological analyses of prokaryotic TRNs.
We analyze a gene co-expression network under the random matrix theory framework. The nearest neighbor spacing distribution of the adjacency matrix of this network follows the Gaussian orthogonal statistics of random matrix theory (RMT). The spectral rigidity test follows the random matrix prediction for a certain range and deviates afterwards. Eigenvector analysis of the network using the inverse participation ratio (IPR) suggests that the statistics of the bulk of the eigenvalues of the network are consistent with those of a real symmetric random matrix, whereas the eigenvectors of a few eigenvalues are localized. Based on these IPR calculations, we can divide the eigenvalues into three sets: (A) the non-degenerate part that follows RMT; (B) the non-degenerate part, at both ends and at intermediate eigenvalues, which deviates from RMT and is expected to contain information about {\it important nodes} in the network; (C) the degenerate part with zero eigenvalue, which fluctuates around the RMT predicted value. We identify the nodes corresponding to the dominant modes of the corresponding eigenvectors and analyze their structural properties.
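The IPR calculation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the usual definition, in which the IPR of a unit-norm eigenvector is the sum of the fourth powers of its components, and it uses a random symmetric graph as a stand-in for the co-expression network. The function names `inverse_participation_ratio` and `random_symmetric_graph` are hypothetical.

```python
import numpy as np

def inverse_participation_ratio(adjacency):
    """Return the eigenvalues and the IPR of each eigenvector of a real symmetric matrix."""
    # eigh returns eigenvalues in ascending order and unit-norm eigenvectors as columns
    eigenvalues, eigenvectors = np.linalg.eigh(adjacency)
    # Assumed definition: IPR_k = sum over nodes l of (E_l^k)^4
    ipr = np.sum(eigenvectors ** 4, axis=0)
    return eigenvalues, ipr

def random_symmetric_graph(n, p, seed=0):
    """Hypothetical stand-in network: undirected random graph with edge probability p."""
    rng = np.random.default_rng(seed)
    upper = np.triu((rng.random((n, n)) < p).astype(float), k=1)
    return upper + upper.T

adjacency = random_symmetric_graph(200, 0.1)
eigenvalues, ipr = inverse_participation_ratio(adjacency)

# Eigenvectors with the largest IPR are the most localized ones
most_localized = np.argsort(ipr)[::-1][:5]
for k in most_localized:
    print(f"eigenvalue={eigenvalues[k]:.3f}  IPR={ipr[k]:.4f}")
```

With this definition the IPR lies between 1/N (a fully delocalized vector) and 1 (a vector concentrated on a single node). For Gaussian orthogonal statistics the expected value is approximately 3/N, so eigenvectors whose IPR lies well above that level are the candidates for set (B).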
Homeostasis of protein concentrations in cells is crucial for their proper functioning, and this requires the concentrations (at their steady-state levels) to be stable to fluctuations. Since gene expression is regulated by proteins such as transcription factors (TFs), the full set of proteins within the cell constitutes a large system of interacting components. Here, we explore factors affecting the stability of this system by coupling the dynamics of mRNA and protein concentrations in a growing cell. We find that protein concentrations can become unstable if the regulation strengths or the system size become too large, and that other global structural features of the networks can dramatically enhance the stability of the system. In particular, given the same number of proteins, TFs, and interactions, and the same regulation strengths, a network that resembles a bipartite graph with a lower fraction of interactions that target TFs has a higher chance of being stable. By scrambling the $\textit{E. coli}$ transcription network, we find that the randomized network with the same number of regulatory interactions is much more likely to be unstable than the real network. These findings suggest that constraints imposed by system stability could have played a role in shaping the existing regulatory network during the evolutionary process. We also find that, contrary to what one might expect from random matrix theory and what has been argued in the literature, the degradation rate of mRNA does not affect whether the system is stable.
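As background for the random matrix expectation mentioned above, the linear stability of a steady state is usually judged from the Jacobian: the state is stable when every eigenvalue has a negative real part. The sketch below illustrates that test on a generic random Jacobian with unit self-degradation on the diagonal and sparse random couplings. It is an assumption made for illustration, not the coupled mRNA-protein model of the abstract, and `is_linearly_stable` and `random_jacobian` are hypothetical names.

```python
import numpy as np

def is_linearly_stable(jacobian):
    """True when every eigenvalue of the Jacobian has a negative real part."""
    return bool(np.max(np.linalg.eigvals(jacobian).real) < 0.0)

def random_jacobian(n, connectance, strength, seed=0):
    """Hypothetical May-style Jacobian: -1 on the diagonal, sparse Gaussian couplings elsewhere."""
    rng = np.random.default_rng(seed)
    mask = rng.random((n, n)) < connectance
    couplings = rng.normal(0.0, strength, size=(n, n)) * mask
    np.fill_diagonal(couplings, -1.0)
    return couplings

# Scan the coupling strength at fixed size and connectance
for strength in (0.05, 0.2, 0.8):
    stable = is_linearly_stable(random_jacobian(100, 0.1, strength))
    print(f"strength={strength}: stable={stable}")
```

In this generic setting, May's classical argument suggests that stability is lost roughly when the coupling strength times the square root of (system size times connectance) exceeds 1, which is the kind of size and strength dependence that the abstract compares against.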
Methods for time series prediction and classification of gene regulatory networks (GRNs) from gene expression data have been treated separately so far. The recent emergence of attention-based recurrent neural network (RNN) models has boosted the interpretability of RNN parameters, making them appealing for the understanding of gene interactions. In this work, we generated synthetic time series gene expression data from a range of archetypal GRNs and relied on a dual attention RNN to predict the gene temporal dynamics. We show that the prediction is extremely accurate for GRNs with different architectures. Next, we focused on the attention mechanism of the RNN and, using tools from graph theory, we found that its graph properties allow us to hierarchically distinguish different architectures of the GRN. We show that the GRNs respond differently to the addition of noise in the prediction by the RNN, and we relate the noise response to the analysis of the attention mechanism. In conclusion, this work provides a way to understand and exploit the attention mechanism of RNNs, and it paves the way to RNN-based methods for time series prediction and inference of GRNs from gene expression data.
The effects of the carrying capacity of the environment $K$ for degradation (the $K$ effect for short) on constitutive gene expression and on a simple genetic regulation system are investigated by employing a stochastic Langevin equation combined with the corresponding Fokker-Planck equation for the two stochastic systems subjected to internal and external noises. This $K$ effect characterizes the limited degradation ability of the environment for RNA or proteins, such as insufficient catabolic enzymes. The $K$ effect could significantly change the distribution of the mRNA copy number in constitutive gene expression and, interestingly, it leads to a Fano factor slightly larger than 1 if only the internal noise exists. Therefore, the slight deviation of the Fano factor from 1 suggested by recent experimental measurements (Science {\bf 346} (2014) 1533) probably originates from the $K$ effect. The $K$ effects on the steady-state and transient properties of the genetic regulation system have been investigated in detail. The $K$ effect could significantly enhance the mean first passage time, especially when the noises are weak, and substantially reduce the signal-to-noise ratio in stochastic resonance.
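The Fano factor discussed above is the variance-to-mean ratio of the copy-number distribution, and it equals 1 for a Poisson distribution. A minimal sketch of how it might be estimated from a sample of mRNA counts is given below; the function name `fano_factor` and the sample values are made up for illustration and do not come from the cited measurements.

```python
from statistics import fmean, pvariance

def fano_factor(copy_numbers):
    """Variance-to-mean ratio of a sample of copy numbers (population variance)."""
    mean = fmean(copy_numbers)
    if mean == 0:
        raise ValueError("Fano factor is undefined for a zero mean")
    return pvariance(copy_numbers, mu=mean) / mean

# Made-up mRNA counts per cell, for illustration only
sample = [3, 5, 4, 6, 2, 4, 5, 3]
print(f"Fano factor: {fano_factor(sample):.3f}")
```

A value above 1 indicates a distribution broader than Poisson, which is the direction of the deviation that the abstract attributes to the $K$ effect.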