A stochastic model of autoregulated bursty gene expression by Kumar et al. [Phys. Rev. Lett. 113, 268105 (2014)] has been exactly solved in steady-state conditions under the implicit assumption that protein numbers are large enough that fluctuations in protein numbers due to reversible protein-promoter binding can be ignored. Here we derive an alternative model that takes these fluctuations into account and hence can be used to study low protein number effects. The exact steady-state protein number distribution is derived as a sum of Gaussian hypergeometric functions. We use the theory to study how promoter switching rates and the type of feedback influence the size of protein noise and noise-induced bistability. Furthermore, we show that our model's predictions for the protein number distribution differ significantly from those of Kumar et al. when the protein mean is small, gene switching is fast, and protein binding is faster than unbinding.
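The abstract does not list the reaction scheme, so the following is only a minimal Gillespie-simulation sketch of a model of this general type, under stated assumptions: a single promoter is either free or bound to one protein; binding consumes a free protein (propensity s_b * n) and unbinding releases it (rate s_u); bursts occur at rate rho_u from the free promoter or rho_b from the bound promoter, with geometrically distributed sizes of mean b; free proteins decay at unit rate. All function and parameter names are hypothetical, and this is not the authors' exact model or their hypergeometric solution. It only illustrates how a stochastic simulation keeps track of the free-protein fluctuations that promoter binding causes.

```python
import math
import random


def geometric_burst(rng, b):
    """Sample a burst size m >= 0 with P(m) = b**m / (1 + b)**(m + 1), so E[m] = b (needs b > 0)."""
    q = b / (1.0 + b)            # probability of adding at least one more protein
    u = 1.0 - rng.random()       # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(q))  # inverse-transform sampling


def simulate_autoregulation(rho_u, rho_b, s_b, s_u, b, t_end, seed=0):
    """Gillespie simulation of a hypothetical bursty autoregulation model.

    Promoter state g: 0 = free, 1 = bound to one protein; n counts free proteins.
    Returns the time-weighted distribution of n as a list indexed by n.
    Assumes rho_u > 0 and s_u > 0 so the total propensity never vanishes.
    """
    rng = random.Random(seed)
    n, g, t = 0, 0, 0.0
    occupancy = {}  # total time spent at each free-protein count
    while t < t_end:
        rates = [
            rho_u if g == 0 else rho_b,  # 0: protein burst
            s_b * n if g == 0 else 0.0,  # 1: a free protein binds the promoter
            s_u if g == 1 else 0.0,      # 2: the bound protein is released
            float(n),                    # 3: decay of a free protein (unit rate)
        ]
        dt = rng.expovariate(sum(rates))
        occupancy[n] = occupancy.get(n, 0.0) + min(dt, t_end - t)
        t += dt
        reaction = rng.choices(range(4), weights=rates)[0]
        if reaction == 0:
            n += geometric_burst(rng, b)
        elif reaction == 1:
            n, g = n - 1, 1
        elif reaction == 2:
            n, g = n + 1, 0
        else:
            n -= 1
    total_time = sum(occupancy.values())
    return [occupancy.get(k, 0.0) / total_time for k in range(max(occupancy) + 1)]
```

With s_b = 0 the promoter never binds and the sketch reduces to plain bursty expression with mean free-protein number rho_u * b; choosing rho_b < rho_u (e.g. rho_b = 0) with s_b > 0 gives negative feedback and lowers the mean, while rho_b > rho_u gives positive feedback.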
Most stochastic gene expression models in the literature do not explicitly describe the age of a cell within a generation, and hence they cannot capture events such as cell division and DNA replication. Instead, many models incorporate the cell cycle implicitly by assuming that dilution due to cell division can be described by an effective decay reaction with first-order kinetics. If it is further assumed that protein production occurs in bursts, then the stationary protein distribution is a negative binomial. Here we seek to understand how accurate these implicit models are compared with more detailed models of stochastic gene expression. We derive the exact stationary solution of the chemical master equation describing bursty protein dynamics, binomial partitioning at mitosis, age-dependent transcription dynamics including replication, and random interdivision times sampled from Erlang or more general distributions; the solution differs between the single-lineage and population-snapshot settings. We show that protein distributions are well approximated by the solution of implicit models (a negative binomial) when the mean number of mRNAs produced per cycle is low and the cell cycle length variability is large. When these conditions are not met, the distributions are either almost bimodal or display very flat regions near the mode, and cannot be described by implicit models. We also show that for genes with low transcription rates, the size of protein noise depends strongly on the replication time; it is almost independent of cell cycle variability for lineage measurements and increases with cell cycle variability for population snapshot measurements. In contrast, for large transcription rates, the size of protein noise is independent of the replication time and increases with cell cycle variability for both lineage and population measurements.
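For reference, the negative binomial of the implicit model can be written down explicitly. Assuming bursts arrive at rate k_burst, burst sizes are geometric with mean b, and dilution is a first-order decay with rate k_dil, the stationary distribution is P(n) = Gamma(n + a) / (Gamma(a) n!) * (b/(1+b))^n * (1+b)^(-a) with a = k_burst / k_dil, which has mean a*b and squared coefficient of variation (1 + b)/(a*b). The function and parameter names below are illustrative; the sketch evaluates the pmf in log space for numerical stability.

```python
import math


def implicit_model_pmf(n, a, b):
    """Negative binomial P(n) for bursty production with first-order decay (dilution).

    a: burst frequency in units of the effective decay rate
    b: mean of the geometric burst-size distribution
    """
    log_p = (math.lgamma(n + a) - math.lgamma(a) - math.lgamma(n + 1)
             + n * math.log(b / (1.0 + b)) - a * math.log(1.0 + b))
    return math.exp(log_p)


def implicit_model_noise(a, b):
    """Squared coefficient of variation, variance / mean**2 = (1 + b) / (a * b)."""
    return (1.0 + b) / (a * b)


# Example: a = 3 bursts per effective protein lifetime, mean burst size b = 4.
pmf = [implicit_model_pmf(n, 3.0, 4.0) for n in range(400)]
mean = sum(n * p for n, p in enumerate(pmf))
variance = sum((n - mean) ** 2 * p for n, p in enumerate(pmf))
```

This two-parameter family is unimodal, so the nearly bimodal or flat-topped distributions that the abstract reports for the detailed cell-cycle models fall outside what it can describe.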
In recent years, tens of thousands of gene expression profiles have been monitored for cells of several organisms. Gene expression is a complex process in which genes are transcribed into mRNA molecules that are then translated into proteins, which control most cell functions. In this process, the correlation among genes is crucial for determining their specific functions. Here, we propose a novel multi-dimensional stochastic approach to deal with gene correlation phenomena. Interestingly, our stochastic framework suggests that studying gene correlation requires only one theoretical assumption, the Markov property, together with the experimentally measured transition probability that characterizes the gene correlation system. Finally, we propose a gene expression experiment for future applications of the model.
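The abstract does not give the model's equations. As a hedged illustration of what the Markov assumption buys, the sketch below estimates a transition matrix from a hypothetical time series of discretized joint expression states (for two genes, state = 2 * gene_a_high + gene_b_high yields four states) and then propagates a state distribution using only that matrix. The state encoding, function names, and data are assumptions made for illustration, not the authors' framework.

```python
from collections import Counter


def estimate_transition_matrix(states, n_states):
    """Maximum-likelihood estimate of P[i][j] = Pr(next state = j | current state = i)."""
    counts = Counter(zip(states, states[1:]))  # observed one-step transitions
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # State never left in the data: keep it absorbing so rows still sum to 1.
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


def propagate(distribution, matrix, steps):
    """Markov property: p_{t+1}[j] = sum_i p_t[i] * P[i][j], applied `steps` times."""
    size = len(matrix)
    for _ in range(steps):
        distribution = [sum(distribution[i] * matrix[i][j] for i in range(size))
                        for j in range(size)]
    return distribution


# Hypothetical sequence of joint states (0 = both low, 1 = only B high, 2 = only A high, 3 = both high).
observed = [0, 1, 3, 3, 2, 0, 1, 3, 2, 0]
P = estimate_transition_matrix(observed, 4)
```

Under the Markov assumption, the estimated one-step matrix is all that is needed to predict multi-step statistics of the joint states.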
Gene expression data for a set of 12 cancer localizations from The Cancer Genome Atlas (TCGA) are processed to evaluate an entropy-like quantity that allows tumors to be characterized and compared with the corresponding normal tissues. The comparison indicates that the number of available states in gene expression space is much greater for tumors than for normal tissues, and it points to a scaling relation between the fraction of available states and the overlap between the tumor and normal sample clouds.
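The abstract does not define its entropy-like quantity. One common construction, shown here purely as an assumption, bins samples on a grid in a low-dimensional expression space (for instance, the first few principal components), computes the Shannon entropy S of the occupancy histogram, and reports exp(S) as an effective number of occupied states; a wide tumor cloud then gives a larger value than a compact normal-tissue cloud. The function name and bin width are hypothetical.

```python
import math
from collections import Counter


def effective_number_of_states(points, bin_width):
    """exp(Shannon entropy) of a grid histogram of the given points.

    points: iterable of coordinate tuples (e.g. sample positions in a reduced expression space)
    bin_width: side length of the hypothetical grid cells
    """
    cells = Counter(tuple(math.floor(x / bin_width) for x in p) for p in points)
    total = sum(cells.values())
    entropy = -sum((c / total) * math.log(c / total) for c in cells.values())
    return math.exp(entropy)
```

The result depends on the bin width, so any tumor-versus-normal comparison made this way should use the same grid for both clouds.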
Inferring functional relationships within complex networks from static snapshots of a subset of variables is a ubiquitous problem in science. For example, a key challenge of systems biology is to translate cellular heterogeneity data obtained from single-cell sequencing or flow-cytometry experiments into regulatory dynamics. We show how static population snapshots of co-variability can be exploited to rigorously infer properties of gene expression dynamics when gene expression reporters probe their upstream dynamics on separate time-scales. This can be experimentally exploited in dual-reporter experiments with fluorescent proteins of unequal maturation times, thus turning an experimental bug into an analysis feature. We derive correlation conditions that detect the presence of closed-loop feedback regulation in gene regulatory networks. Furthermore, we show how genes with cell-cycle dependent transcription rates can be identified from the variability of co-regulated fluorescent proteins. Similar correlation constraints might prove useful in other areas of science in which static correlation snapshots are used to infer causal connections between dynamically interacting components.
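The correlation conditions themselves are not stated in the abstract, so the sketch below shows only the standard starting point for dual-reporter snapshot data: the intrinsic/extrinsic noise decomposition of Elowitz et al. (2002), in which two reporters x and y measured in the same cells give eta_int^2 = <(x - y)^2> / (2<x><y>) and eta_ext^2 = (<xy> - <x><y>) / (<x><y>). This is not the authors' feedback test: the decomposition assumes statistically identical reporters, which is exactly what unequal maturation times break, and that asymmetry is what the abstract proposes to exploit. The function name is hypothetical.

```python
from statistics import fmean


def dual_reporter_noise(x, y):
    """Return (intrinsic, extrinsic) squared noise from paired per-cell reporter levels x, y."""
    mx, my = fmean(x), fmean(y)
    # Uncorrelated, reporter-specific fluctuations show up as differences between x and y.
    intrinsic = fmean((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * mx * my)
    # Shared fluctuations show up as covariance between x and y.
    extrinsic = (fmean(a * b for a, b in zip(x, y)) - mx * my) / (mx * my)
    return intrinsic, extrinsic
```

When the two reporter means are equal, the two terms add up to the average of the reporters' total squared noise (variance over squared mean).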
A principal component analysis of the TCGA data for 15 cancer localizations reveals the following qualitative facts about tumors: 1) The state of a tissue in gene expression space may be described by a few variables. In particular, a single variable describes the progression from a normal tissue to a tumor. 2) Each cancer localization is characterized by a gene expression profile, in which genes have specific weights in the definition of the cancer state. There are no fewer than 2500 differentially expressed genes, which lead to power-law-like tails in the expression distribution functions. 3) Tumors in different localizations share hundreds or even thousands of differentially expressed genes, and 6 genes are common to all 15 studied tumor localizations. 4) The tumor region is a kind of attractor: tumors in advanced stages converge to this region independently of patient age or genetic variability. 5) There is a landscape of cancer in gene expression space, with an approximate border separating normal tissues from tumors.
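A minimal, dependency-free sketch of the first step of such an analysis: computing the leading principal component of a samples-by-genes matrix by power iteration on the covariance matrix, and projecting each sample onto it. In the abstract's picture, this projection plays the role of the single normal-to-tumor progression variable. Assumptions: expression values are already normalized (e.g. log-transformed), the matrix is small enough for pure Python, and the function name is hypothetical; real TCGA matrices would call for a linear-algebra library.

```python
import math
import random


def first_principal_component(data, iterations=500, seed=0):
    """Leading eigenvector of the sample covariance matrix via power iteration.

    data: list of samples, each a list of (e.g. log-transformed) expression values per gene.
    Returns (component, scores); the overall sign of the component is arbitrary.
    Assumes the data are not constant (otherwise the covariance matrix is zero).
    """
    n_samples, n_genes = len(data), len(data[0])
    means = [sum(row[g] for row in data) / n_samples for g in range(n_genes)]
    centered = [[row[g] - means[g] for g in range(n_genes)] for row in data]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]  # random starting direction
    for _ in range(iterations):
        # w = X^T (X v) is proportional to (covariance matrix) @ v.
        scores = [sum(r[g] * v[g] for g in range(n_genes)) for r in centered]
        w = [sum(scores[s] * centered[s][g] for s in range(n_samples)) for g in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[g] * v[g] for g in range(n_genes)) for r in centered]
    return v, scores
```

Genes with large absolute entries in the returned component carry the largest weights in the leading direction, which is the kind of per-gene weighting the abstract's second point refers to.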