No Arabic abstract
We consider Markov chain Monte Carlo methods for calculating conditional p values of statistical models for count data arising in Box-Behnken designs. The statistical model we consider is a discrete version of the first-order model in the response surface methodology. For our models, the Markov basis, a key notion to construct a connected Markov chain on a given sample space, is characterized as generators of the toric ideals for the centrally symmetric configurations of root system D_n. We show the structure of the Groebner bases for these cases. A numerical example for an imaginary data set is given.
It is known that a Markov basis of the binary graph model of a graph $G$ corresponds to a set of binomial generators of cut ideals $I_{widehat{G}}$ of the suspension $widehat{G}$ of $G$. In this paper, we give another application of cut ideals to statistics. We show that a set of binomial generators of cut ideals is a Markov basis of some regular two-level fractional factorial design. As application, we give a Markov basis of degree 2 for designs defined by at most two relations.
Markov chain Monte Carlo (MCMC) produces a correlated sample for estimating expectations with respect to a target distribution. A fundamental question is when should sampling stop so that we have good estimates of the desired quantities? The key to answering this question lies in assessing the Monte Carlo error through a multivariate Markov chain central limit theorem (CLT). The multivariate nature of this Monte Carlo error largely has been ignored in the MCMC literature. We present a multivariate framework for terminating simulation in MCMC. We define a multivariate effective sample size, estimating which requires strongly consistent estimators of the covariance matrix in the Markov chain CLT; a property we show for the multivariate batch means estimator. We then provide a lower bound on the number of minimum effective samples required for a desired level of precision. This lower bound depends on the problem only in the dimension of the expectation being estimated, and not on the underlying stochastic process. This result is obtained by drawing a connection between terminating simulation via effective sample size and terminating simulation using a relative standard deviation fixed-volume sequential stopping rule; which we demonstrate is an asymptotically valid procedure. The finite sample properties of the proposed method are demonstrated in a variety of examples.
This paper proposes a family of weighted batch means variance estimators, which are computationally efficient and can be conveniently applied in practice. The focus is on Markov chain Monte Carlo simulations and estimation of the asymptotic covariance matrix in the Markov chain central limit theorem, where conditions ensuring strong consistency are provided. Finite sample performance is evaluated through auto-regressive, Bayesian spatial-temporal, and Bayesian logistic regression examples, where the new estimators show significant computational gains with a minor sacrifice in variance compared with existing methods.
Markov chain models are used in various fields, such behavioral sciences or econometrics. Although the goodness of fit of the model is usually assessed by large sample approximation, it is desirable to use conditional tests if the sample size is not large. We study Markov bases for performing conditional tests of the toric homogeneous Markov chain model, which is the envelope exponential family for the usual homogeneous Markov chain model. We give a complete description of a Markov basis for the following cases: i) two-state, arbitrary length, ii) arbitrary finite state space and length of three. The general case remains to be a conjecture. We also present a numerical example of conditional tests based on our Markov basis.
Calculating a Monte Carlo standard error (MCSE) is an important step in the statistical analysis of the simulation output obtained from a Markov chain Monte Carlo experiment. An MCSE is usually based on an estimate of the variance of the asymptotic normal distribution. We consider spectral and batch means methods for estimating this variance. In particular, we establish conditions which guarantee that these estimators are strongly consistent as the simulation effort increases. In addition, for the batch means and overlapping batch means methods we establish conditions ensuring consistency in the mean-square sense which in turn allows us to calculate the optimal batch size up to a constant of proportionality. Finally, we examine the empirical finite-sample properties of spectral variance and batch means estimators and provide recommendations for practitioners.