We review a finite-sampling exponential bound due to Serfling and discuss related exponential bounds for the hypergeometric distribution. We then discuss how such bounds motivate some new results for two-sample empirical processes. Our development complements recent results by Wei and Dudley (2011) concerning exponential bounds for two-sided Kolmogorov-Smirnov statistics by giving corresponding results for one-sided statistics with emphasis on adjusted inequalities of the type proved originally by Dvoretzky, Kiefer, and Wolfowitz (1956) and by Massart (1990) for one-samp
In this paper, we introduce the fundamental notion of a Markov basis, which is one of the first connections between commutative algebra and statistics. The notion of a Markov basis was first introduced by Diaconis and Sturmfels (1998) for conditional testing problems on contingency tables by Markov chain Monte Carlo methods. In this method, we make use of a connected Markov chain over the given conditional sample space to estimate the P-values numerically for various conditional tests. A Markov basis plays an important role in this argument, because it guarantees the connectivity of the chain, which is needed for unbiasedness of the estimate, for an arbitrary conditional sample space. As another important point, a Markov basis is characterized as a set of generators of a well-specified toric ideal of a polynomial ring. This connection between commutative algebra and statistics is the main result of Diaconis and Sturmfels (1998). Since this first paper, Markov bases have been studied intensively by many researchers in both commutative algebra and statistics, which has yielded an attractive field called computational algebraic statistics. In this paper, we give a review of the Markov chain Monte Carlo methods for contingency tables and Markov bases, with some fundamental examples. We also give some computational examples using the algebraic software Macaulay2 and the statistical software R. Readers can also find theoretical details of the problems considered in this paper and various results on the structure and examples of Markov bases in Aoki, Hara and Takemura (2012).
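The Markov chain Monte Carlo procedure described in this abstract can be illustrated for the simplest case, the independence model for a two-way table, where the basic moves (adding +1, -1, -1, +1 on a 2 x 2 minor) are known to form a Markov basis. The following Python sketch is not taken from the paper, which uses Macaulay2 and R; the function names, the choice of the Pearson chi-square statistic, and the default run lengths are all illustrative assumptions. It assumes a table with at least two rows, two columns, and strictly positive margins.

```python
import math
import random


def chi_square(table):
    """Pearson chi-square statistic for independence (assumes positive margins)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def metropolis_step(table, rng):
    """Apply one Metropolis step in place, proposing a random basic move.

    The move adds `sign` at (i1, j1), (i2, j2) and subtracts it at
    (i1, j2), (i2, j1), so all row and column sums are unchanged.
    """
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    sign = rng.choice((1, -1))
    changes = {(i1, j1): sign, (i2, j2): sign, (i1, j2): -sign, (i2, j1): -sign}
    # Reject proposals that leave the conditional sample space.
    if any(table[i][j] + d < 0 for (i, j), d in changes.items()):
        return
    # Target: hypergeometric distribution, proportional to 1 / prod(x_ij!).
    log_ratio = sum(
        math.lgamma(table[i][j] + 1) - math.lgamma(table[i][j] + d + 1)
        for (i, j), d in changes.items()
    )
    if rng.random() < math.exp(min(0.0, log_ratio)):
        for (i, j), d in changes.items():
            table[i][j] += d


def mcmc_p_value(observed, n_steps=20000, burn_in=2000, seed=0):
    """Estimate the conditional P-value of the chi-square test by MCMC."""
    rng = random.Random(seed)
    table = [list(row) for row in observed]
    threshold = chi_square(observed)
    hits = 0
    for step in range(burn_in + n_steps):
        metropolis_step(table, rng)
        # Small tolerance guards against floating-point ties.
        if step >= burn_in and chi_square(table) >= threshold - 1e-12:
            hits += 1
    return hits / n_steps
```

A call such as `mcmc_p_value([[3, 1, 0], [1, 2, 4]])` returns the fraction of visited tables whose statistic is at least as large as the observed one. For models other than two-way independence, the basic moves generally do not connect the sample space, which is where a Markov basis computed by algebraic software becomes necessary.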
Let $\mathbf{X}_n=(x_{ij})$ be a $k \times n$ data matrix with complex-valued, independent and standardized entries satisfying a Lindeberg-type moment condition. We consider simultaneously $R$ sample covariance matrices $\mathbf{B}_{nr}=\frac1n \mathbf{Q}_r \mathbf{X}_n \mathbf{X}_n^*\mathbf{Q}_r^\top,~1\le r\le R$, where the $\mathbf{Q}_{r}$ are nonrandom real matrices with common dimensions $p\times k~(k\geq p)$. Assuming that both the dimension $p$ and the sample size $n$ grow to infinity, the limiting distributions of the eigenvalues of the matrices $\{\mathbf{B}_{nr}\}$ are identified, and as the main result of the paper, we establish a joint central limit theorem (CLT) for linear spectral statistics of the $R$ matrices $\{\mathbf{B}_{nr}\}$. Next, this new CLT is applied to the problem of testing for high-dimensional white noise in time series modelling. In experiments, the derived test has a controlled size and is significantly faster than the classical permutation test, though it does have lower power. This application highlights the necessity of such a joint CLT in the presence of several dependent sample covariance matrices. In contrast, all the existing works on CLTs for linear spectral statistics of large sample covariance matrices deal with a single sample covariance matrix ($R=1$).
A sum of observations obtained by simple random sampling from a population of independent random variables is studied. A procedure for finding the general term of the Edgeworth asymptotic expansion is presented. The Lindeberg condition for asymptotic normality, a Berry-Esseen bound, Edgeworth asymptotic expansions under weakened conditions and Cramér-type large deviation results are derived.
This paper is devoted to rejective sampling. We provide an expansion of joint inclusion probabilities of any order in terms of the inclusion probabilities of order one, extending previous results by Hajek (1964) and Hajek (1981) and making the remainder term more precise. Following Hajek (1981), the proof is based on Edgeworth expansions. The main result is applied to derive bounds on higher order correlations, which are needed for the consistency and asymptotic normality of several complex estimators.
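For readers unfamiliar with the design, rejective sampling (conditional Poisson sampling) can be sketched as follows: each unit is selected independently with its own probability, and the draw is repeated until the sample has the prescribed fixed size. The Python sketch below is an illustration only and does not reproduce the expansions of the paper; the function names, the `max_tries` safeguard, and the Monte Carlo frequency estimate are assumptions introduced here.

```python
import random


def rejective_sample(probs, size, rng, max_tries=100000):
    """Draw one rejective sample of fixed `size`.

    Units are selected independently with probabilities `probs`
    (a Poisson sample); draws of the wrong size are rejected.
    """
    for _ in range(max_tries):
        sample = [i for i, p in enumerate(probs) if rng.random() < p]
        if len(sample) == size:
            return sample
    raise RuntimeError("no sample of the requested size within max_tries")


def inclusion_frequencies(probs, size, draws=10000, seed=0):
    """Monte Carlo estimate of the first-order inclusion probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    for _ in range(draws):
        for i in rejective_sample(probs, size, rng):
            counts[i] += 1
    return [c / draws for c in counts]
```

The estimated inclusion frequencies always sum to the sample size, but they generally differ from the working probabilities `probs` used in the independent draws. Relating quantities of this kind, including joint inclusion probabilities of higher order, is the sort of question that expansions such as those in the abstract address analytically.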
In statistical inference for long-range dependent time series, the shape of the limit distribution typically depends on unknown parameters. Therefore, we propose to use subsampling. We show the validity of subsampling for general statistics and long-range dependent subordinated Gaussian processes which satisfy mild regularity conditions. We apply our method to a self-normalized change-point test statistic so that we can test for structural breaks in long-range dependent time series without having to estimate any nuisance parameter. The finite sample properties are investigated in a simulation study. We analyze three data sets and compare our results to the conclusions of other authors.
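The subsampling idea mentioned in this abstract can be sketched generically: the statistic is recomputed on every block of consecutive observations of a given length, and the empirical distribution of these values serves as an approximation to the unknown limit distribution. The Python sketch below is a minimal illustration under the assumption that the statistic needs no further rescaling (as for a self-normalized statistic); the function names are hypothetical, the paper's change-point statistic is not reproduced, and the choice of block length is left to the user.

```python
import math


def subsample_values(series, block_length, statistic):
    """Evaluate `statistic` on all overlapping blocks of consecutive observations."""
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must lie between 1 and len(series)")
    return [
        statistic(series[start:start + block_length])
        for start in range(n - block_length + 1)
    ]


def empirical_quantile(values, level):
    """Smallest observed value whose empirical distribution function is >= level."""
    ordered = sorted(values)
    index = math.ceil(level * len(ordered)) - 1
    index = min(len(ordered) - 1, max(0, index))
    return ordered[index]


def subsampling_test(series, block_length, statistic, level=0.95):
    """Reject when the full-sample statistic exceeds the subsampling quantile."""
    critical_value = empirical_quantile(
        subsample_values(series, block_length, statistic), level
    )
    return statistic(series) > critical_value
```

Using consecutive blocks rather than resampled observations keeps the dependence structure within each block intact, which is the reason subsampling is attractive for dependent data.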