In a general stochastic multistate promoter model of dynamic mRNA/protein interactions, we identify the stationary joint distribution of the promoter state, mRNA, and protein levels through an explicit `stick-breaking' construction that is of interest in itself. This derivation is a constructive advance over previous work, in which the stationary distribution is solved only in restricted cases. Moreover, the stick-breaking construction allows one to sample directly from the stationary distribution, permitting inference procedures and model selection. In this context, we discuss numerical Bayesian experiments to illustrate the results.
In [10], a `Markovian stick-breaking' process, which generalizes the Dirichlet process $(\mu, \theta)$ with respect to a discrete base space $\mathfrak{X}$, was introduced. In particular, a sample from the `Markovian stick-breaking' process may be represented in stick-breaking form $\sum_{i\geq 1} P_i \delta_{T_i}$, where $\{T_i\}$ is a stationary, irreducible Markov chain on $\mathfrak{X}$ with stationary distribution $\mu$, instead of i.i.d. $\{T_i\}$ each distributed as $\mu$ as in the Dirichlet case, and $\{P_i\}$ is a GEM$(\theta)$ residual allocation sequence. Although the motivation in [10] was to relate these Markovian stick-breaking processes to empirical distributional limits of types of simulated annealing chains, these processes may also be thought of as a class of priors in statistical problems. The aim of this work in this context is to identify the posterior distribution and to explore the role of the Markovian structure of $\{T_i\}$ in some inference test cases.
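The stick-breaking form $\sum_{i\geq 1} P_i \delta_{T_i}$ can be illustrated with a minimal sketch. It assumes a finite state space labelled 0, ..., k-1, a truncation at a fixed number of sticks, and GEM$(\theta)$ weights built from independent Beta$(1, \theta)$ fractions. The function names, the two-state transition matrix, and the truncation level are hypothetical choices made for illustration and are not taken from [10].

```python
import random

def gem_weights(theta, n_sticks, rng):
    """First n_sticks weights P_i of a GEM(theta) residual allocation sequence."""
    weights, remaining = [], 1.0
    for _ in range(n_sticks):
        v = rng.betavariate(1.0, theta)   # stick fraction V_i ~ Beta(1, theta)
        weights.append(remaining * v)     # P_i = V_i * prod_{j<i} (1 - V_j)
        remaining *= 1.0 - v
    return weights

def markov_atoms(transition, mu, n_sticks, rng):
    """Atoms T_1, ..., T_n: a Markov chain started from its stationary law mu."""
    states = list(range(len(mu)))
    t = rng.choices(states, weights=mu)[0]
    atoms = [t]
    for _ in range(n_sticks - 1):
        t = rng.choices(states, weights=transition[t])[0]
        atoms.append(t)
    return atoms

def markovian_stick_breaking(transition, mu, theta, n_sticks=200, seed=0):
    """Truncated random measure: the mass that sum_i P_i delta_{T_i} gives each state."""
    rng = random.Random(seed)
    p = gem_weights(theta, n_sticks, rng)
    t = markov_atoms(transition, mu, n_sticks, rng)
    measure = [0.0] * len(mu)
    for p_i, t_i in zip(p, t):
        measure[t_i] += p_i
    return measure

# Illustrative two-state chain; mu = (2/3, 1/3) solves mu = mu * transition.
transition = [[0.9, 0.1], [0.2, 0.8]]
mu = [2.0 / 3.0, 1.0 / 3.0]
measure = markovian_stick_breaking(transition, mu, theta=2.0)
```

Because of the truncation, the returned masses sum to slightly less than one. Replacing `markov_atoms` by i.i.d. draws from `mu` would give the usual truncated Dirichlet-process sample, which makes the role of the Markovian structure easy to compare in experiments.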
For a long time, the Dirichlet process has been the gold standard discrete random measure in Bayesian nonparametrics. The Pitman--Yor process provides a simple and mathematically tractable generalization, allowing for a very flexible control of the clustering behaviour. Two commonly used representations of the Pitman--Yor process are the stick-breaking process and the Chinese restaurant process. The former is a constructive representation of the process which turns out to be very handy for practical implementation, while the latter describes the induced partition distribution. However, the usual proof of the connection between them is indirect and involves measure theory. We provide here an elementary proof of the Pitman--Yor Chinese restaurant process from its stick-breaking representation.
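The two representations can be placed side by side in a short sketch. It uses the standard parametrization with discount $d \in [0, 1)$ and strength $\theta$, and for simplicity assumes $\theta > 0$: stick fractions $V_i \sim$ Beta$(1 - d, \theta + i d)$, and a Chinese restaurant in which customer $n+1$ joins an existing table $j$ with probability $(n_j - d)/(n + \theta)$ or opens a new one with probability $(\theta + k d)/(n + \theta)$, where $k$ is the current number of tables. The function names and the truncation are hypothetical and do not come from the paper.

```python
import random

def pitman_yor_sticks(d, theta, n_sticks, rng):
    """First n_sticks stick-breaking weights of a Pitman-Yor(d, theta) process."""
    weights, remaining = [], 1.0
    for i in range(1, n_sticks + 1):
        v = rng.betavariate(1.0 - d, theta + i * d)  # V_i ~ Beta(1 - d, theta + i*d)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

def chinese_restaurant(d, theta, n_customers, rng):
    """Table sizes after seating n_customers by the Pitman-Yor seating rule."""
    counts = []
    for _ in range(n_customers):
        # Unnormalized weights: n_j - d for each occupied table, then
        # theta + k*d for a new table; they sum to n + theta.
        w = [c - d for c in counts] + [theta + d * len(counts)]
        j = rng.choices(range(len(w)), weights=w)[0]
        if j == len(counts):
            counts.append(1)    # open a new table
        else:
            counts[j] += 1      # join table j
    return counts

rng = random.Random(0)
weights = pitman_yor_sticks(d=0.5, theta=1.0, n_sticks=100, rng=rng)
tables = chinese_restaurant(d=0.5, theta=1.0, n_customers=200, rng=rng)
```

Setting `d = 0` recovers the Dirichlet process case in both functions. Drawing labels i.i.d. from the stick weights and grouping equal labels gives a random partition that can be compared empirically with the table sizes from `chinese_restaurant`.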
In this paper we discuss the estimation of a nonparametric component $f_1$ of a nonparametric additive model $Y = f_1(X_1) + \dots + f_q(X_q) + \epsilon$. We allow the number $q$ of additive components to grow to infinity and we make sparsity assumptions about the number of nonzero additive components. We compare this estimation problem with that of estimating $f_1$ in the oracle model $Z = f_1(X_1) + \epsilon$, for which the additive components $f_2,\dots,f_q$ are known. We construct a two-step presmoothing-and-resmoothing estimator of $f_1$ and state finite-sample bounds for the difference between our estimator and some smoothing estimators $\hat f_1^{\text{(oracle)}}$ in the oracle model. In an asymptotic setting these bounds can be used to show asymptotic equivalence of our estimator and the oracle estimators; the paper thus shows that, asymptotically, under strong enough sparsity conditions, knowledge of $f_2,\dots,f_q$ has no effect on estimation accuracy. Our first step is to estimate $f_1$ with an undersmoothed estimator based on near-orthogonal projections with a group Lasso bias correction. We then construct pseudo responses $\hat Y$ by evaluating a debiased modification of our undersmoothed estimator of $f_1$ at the design points. In the second step the smoothing method of the oracle estimator $\hat f_1^{\text{(oracle)}}$ is applied to a nonparametric regression problem with responses $\hat Y$ and covariates $X_1$. Our mathematical exposition centers primarily on establishing properties of the presmoothing estimator. We present simulation results demonstrating close-to-oracle performance of our estimator in practical applications.
A stochastic model of autoregulated bursty gene expression by Kumar et al. [Phys. Rev. Lett. 113, 268105 (2014)] has been exactly solved in steady-state conditions under the implicit assumption that protein numbers are sufficiently large such that fluctuations in protein numbers due to reversible protein-promoter binding can be ignored. Here we derive an alternative model that takes into account these fluctuations and hence can be used to study low protein number effects. The exact steady-state protein number distribution is derived as a sum of Gaussian hypergeometric functions. We use the theory to study how promoter switching rates and the type of feedback influence the size of protein noise and noise-induced bistability. Furthermore, we show that our model predictions for the protein number distribution are significantly different from those of Kumar et al. when the protein mean is small, gene switching is fast, and protein binding is faster than unbinding.
We deal with a general class of extreme-value regression models introduced by Barreto-Souza and Vasconcellos (2011). Our goal is to derive an adjusted likelihood ratio statistic that is approximately distributed as $\chi^2$ with a high degree of accuracy. Although the adjusted statistic requires more computational effort than its unadjusted counterpart, it is shown that the adjustment term has a simple compact form that can be easily implemented in standard statistical software. Further, we compare the finite sample performance of the three classical tests (likelihood ratio, Wald, and score), the gradient test that has been recently proposed by Terrell (2002), and the adjusted likelihood ratio test obtained in this paper. Our simulations favor the latter. Applications of our results are presented. Key words: Extreme-value regression; Gradient test; Gumbel distribution; Likelihood ratio test; Nonlinear models; Score test; Small-sample adjustments; Wald test.