This paper has been temporarily withdrawn, pending a revised version taking into account similarities between this paper and the recent work of del Barrio, Gine and Utzet (Bernoulli, 11 (1), 2005, 131-189).
We propose and study a general method for construction of consistent statistical tests on the basis of possibly indirect, corrupted, or partially available observations. The class of tests devised in the paper contains Neymans smooth tests, data-driven score tests, and some types of multi-sample tests as basic examples. Our tests are data-driven and are additionally incorporated with model selection rules. The method allows to use a wide class of model selection rules that are based on the penalization idea. In particular, many of the optimal penalties, derived in statistical literature, can be used in our tests. We establish the behavior of model selection rules and data-driven tests under both the null hypothesis and the alternative hypothesis, derive an explicit detectability rule for alternative hypotheses, and prove a master consistency theorem for the tests from the class. The paper shows that the tests are applicable to a wide range of problems, including hypothesis testing in statistical inverse problems, multi-sample problems, and nonparametric hypothesis testing.
We study the rate of convergence of the Mallows distance between the empirical distribution of a sample and the underlying population. The surprising feature of our results is that the convergence rate is slower in the discrete case than in the absolutely continuous setting. We show how the hazard function plays a significant role in these calculations. As an application, we recall that the quantity studied provides an upper bound on the distance between the bootstrap distribution of a sample mean and its true sampling distribution. Moreover, the convenient properties of the Mallows metric yield a straightforward lower bound, and therefore a relatively precise description of the asymptotic performance of the bootstrap in this problem.
Let $(Y,(X_i)_{iinmathcal{I}})$ be a zero mean Gaussian vector and $V$ be a subset of $mathcal{I}$. Suppose we are given $n$ i.i.d. replications of the vector $(Y,X)$. We propose a new test for testing that $Y$ is independent of $(X_i)_{iin mathcal{I}backslash V}$ conditionally to $(X_i)_{iin V}$ against the general alternative that it is not. This procedure does not depend on any prior information on the covariance of $X$ or the variance of $Y$ and applies in a high-dimensional setting. It straightforwardly extends to test the neighbourhood of a Gaussian graphical model. The procedure is based on a model of Gaussian regression with random Gaussian covariates. We give non asymptotic properties of the test and we prove that it is rate optimal (up to a possible $log(n)$ factor) over various classes of alternatives under some additional assumptions. Besides, it allows us to derive non asymptotic minimax rates of testing in this setting. Finally, we carry out a simulation study in order to evaluate the performance of our procedure.
The processes of the averaged regression quantiles and of their modifications provide useful tools in the regression models when the covariates are not fully under our control. As an application we mention the probabilistic risk assessment in the situation when the return depends on some exogenous variables. The processes enable to evaluate the expected $alpha$-shortfall ($0leqalphaleq 1$) and other measures of the risk, recently generally accepted in the financial literature, but also help to measure the risk in environment analysis and elsewhere.
The Ising model is one of the simplest and most famous models of interacting systems. It was originally proposed to model ferromagnetic interactions in statistical physics and is now widely used to model spatial processes in many areas such as ecology, sociology, and genetics, usually without testing its goodness of fit. Here, we propose various test statistics and an exact goodness-of-fit test for the finite-lattice Ising model. The theory of Markov bases has been developed in algebraic statistics for exact goodness-of-fit testing using a Monte Carlo approach. However, finding a Markov basis is often computationally intractable. Thus, we develop a Monte Carlo method for exact goodness-of-fit testing for the Ising model which avoids computing a Markov basis and also leads to a better connectivity of the Markov chain and hence to a faster convergence. We show how this method can be applied to analyze the spatial organization of receptors on the cell membrane.