Given independent samples from P and Q, two-sample permutation tests allow one to construct exact level tests when the null hypothesis is P=Q. On the other hand, when comparing or testing particular parameters $\theta$ of P and Q, such as their means or medians, permutation tests need not be level $\alpha$, or even approximately level $\alpha$ in large samples. Under very weak assumptions for comparing estimators, we provide a general test procedure whereby the asymptotic validity of the permutation test holds while retaining the exact rejection probability $\alpha$ in finite samples when the underlying distributions are identical. The ideas are broadly applicable and special attention is given to the k-sample problem of comparing general parameters, whereby a permutation test is constructed which is exact level $\alpha$ under the hypothesis of identical distributions, but has asymptotic rejection probability $\alpha$ under the more general null hypothesis of equality of parameters. A Monte Carlo simulation study is performed as well. A quite general theory is possible based on a coupling construction, as well as a key contiguity argument for the multinomial and multivariate hypergeometric distributions.
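For the difference-of-means case, a common way to obtain this kind of robustness is to permute a studentized statistic instead of the raw difference. The sketch below assumes exactly that: it runs a Monte Carlo two-sample permutation test on Welch's t statistic. The function names `welch_t` and `studentized_perm_test` and the default of 9999 random permutations are illustrative choices, not the paper's construction, which covers general parameters and k samples.

```python
import numpy as np


def welch_t(x, y):
    """Studentized difference of sample means (Welch's t statistic)."""
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return (x.mean() - y.mean()) / se


def studentized_perm_test(x, y, n_perm=9999, seed=0):
    """Two-sided Monte Carlo permutation p-value for H0: mean(P) = mean(Q)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    nx = len(x)
    observed = abs(welch_t(x, y))
    count = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled observations to the two groups.
        perm = rng.permutation(pooled)
        if abs(welch_t(perm[:nx], perm[nx:])) >= observed:
            count += 1
    # Counting the observed statistic as one of the draws keeps the test exact
    # whenever the pooled sample is exchangeable (P = Q).
    return (count + 1) / (n_perm + 1)
```

Because the observed statistic is included among the draws, rejecting when the returned p-value is at most $\alpha$ gives a level-$\alpha$ test under P = Q. Samples with equal means but very different variances and sample sizes are the kind of setting where the choice between a studentized and an unstudentized statistic matters.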
Permutation tests are widely used in statistics, providing a finite-sample guarantee on the type I error rate whenever the distribution of the samples under the null hypothesis is invariant to some rearrangement. Despite their increasing popularity and empirical success, the theoretical properties of permutation tests, especially their power, have not been fully explored beyond simple cases. In this paper, we attempt to fill this gap by presenting a general non-asymptotic framework for analyzing the power of the permutation test. The utility of our proposed framework is illustrated in the context of two-sample and independence testing under both discrete and continuous settings. In each setting, we introduce permutation tests based on U-statistics and study their minimax performance. We also develop exponential concentration bounds for permuted U-statistics based on a novel coupling idea, which may be of independent interest. Building on these exponential bounds, we introduce permutation tests which are adaptive to unknown smoothness parameters without losing much power. The proposed framework is further illustrated using more sophisticated test statistics, including weighted U-statistics for multinomial testing and Gaussian kernel-based statistics for density testing. Finally, we provide some simulation results that further justify the permutation approach.
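To make "permutation test based on a U-statistic" concrete in the discrete two-sample setting, the sketch below assumes the unweighted U-statistic that unbiasedly estimates $\sum_k (p_k - q_k)^2$ from category counts; the abstract does not state its statistics explicitly, and the weighted versions it mentions are not implemented. Unbiasedness follows from $\mathbb E[c_k(c_k-1)] = n(n-1)p_k^2$ and $\mathbb E[c_k d_k] = nm\,p_k q_k$ for multinomial counts. The names `l2_u_statistic` and `perm_pvalue` are hypothetical.

```python
import numpy as np


def l2_u_statistic(x, y, n_cats):
    """Unbiased estimate of sum_k (p_k - q_k)^2 from two categorical samples.

    x, y: integer arrays with values in {0, ..., n_cats - 1}.
    """
    c = np.bincount(x, minlength=n_cats).astype(float)
    d = np.bincount(y, minlength=n_cats).astype(float)
    n, m = len(x), len(y)
    within_x = np.sum(c * (c - 1)) / (n * (n - 1))  # estimates sum_k p_k^2
    within_y = np.sum(d * (d - 1)) / (m * (m - 1))  # estimates sum_k q_k^2
    between = 2.0 * np.sum(c * d) / (n * m)  # estimates 2 sum_k p_k q_k
    return within_x + within_y - between


def perm_pvalue(x, y, n_cats, n_perm=999, seed=0):
    """One-sided Monte Carlo permutation p-value (large values reject P = Q)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n = len(x)
    observed = l2_u_statistic(x, y, n_cats)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if l2_u_statistic(perm[:n], perm[n:], n_cats) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Being unbiased, the statistic can be negative for finite samples; only its permutation distribution matters for the test.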
Cokriging is the standard method of spatial interpolation (best linear unbiased prediction) in multivariate geostatistics. While best linear prediction is well understood in univariate spatial statistics, results for the multivariate case have been elusive so far. Modern spatial datasets, which are typically multivariate, pose new challenges that call for a deeper study of cokriging. In particular, we deal with the problem of misspecified cokriging prediction within the framework of fixed domain asymptotics. Specifically, we provide conditions for equivalence of measures associated with multivariate Gaussian random fields whose index set is a compact subset of a d-dimensional Euclidean space. Such conditions have been elusive for about 50 years of spatial statistics. We then focus on the multivariate Matérn and Generalized Wendland classes of matrix-valued covariance functions, which have been very popular because their parameters are crucial to spatial interpolation and control the mean square differentiability of the associated Gaussian process. We provide sufficient conditions for the equivalence of Gaussian measures in terms of the covariance parameters of these two classes. This enables us to identify the parameters that are crucial to asymptotically equivalent interpolation in multivariate geostatistics. Our findings are then illustrated through simulation studies.
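To make the object of study concrete, the sketch below computes a simple (known zero mean) cokriging predictor for a bivariate field. It assumes a separable model $C_{ij}(h) = \sigma_i\sigma_j R_{ij} M(h)$ with a Matérn correlation of smoothness $\nu = 3/2$, in the parametrization $M(h) = (1 + \sqrt3 h/a)e^{-\sqrt3 h/a}$, chosen only because it has a closed form; the paper treats general multivariate Matérn and Generalized Wendland classes. All function names are hypothetical.

```python
import numpy as np


def matern32(h, scale):
    """Matérn correlation with smoothness nu = 3/2 (closed form)."""
    r = np.sqrt(3.0) * h / scale
    return (1.0 + r) * np.exp(-r)


def separable_cov(s1, s2, sigma, rho, scale):
    """Cross-covariance of a separable bivariate Matérn field.

    Rows index (component, location in s1); columns index (component, location in s2).
    """
    h = np.linalg.norm(s1[:, None, :] - s2[None, :, :], axis=-1)
    coreg = np.outer(sigma, sigma) * np.array([[1.0, rho], [rho, 1.0]])
    return np.kron(coreg, matern32(h, scale))


def simple_cokrige(locs, z, s0, sigma, rho, scale):
    """Zero-mean cokriging of component 0 at s0 using both components.

    z stacks component 0 at all locations, followed by component 1.
    Returns the predictor and its mean squared prediction error.
    """
    C = separable_cov(locs, locs, sigma, rho, scale)
    c0 = separable_cov(locs, s0[None, :], sigma, rho, scale)[:, 0]
    w = np.linalg.solve(C, c0)  # cokriging weights
    return w @ z, sigma[0] ** 2 - c0 @ w
```

Calling `simple_cokrige` with parameters different from those that generated `z` is one informal way to explore the misspecification question the abstract studies; the equivalence-of-measures conditions themselves are not encoded here.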
We introduce estimation and test procedures through divergence optimization for discrete or continuous parametric models. This approach is based on a new dual representation for divergences. We treat point estimation and tests for simple and composite hypotheses, extending the maximum likelihood technique. Another view of the maximum likelihood approach, for estimation and testing, is given. We prove existence and consistency of the proposed estimates. The limit laws of the estimates and test statistics (including the generalized likelihood ratio statistic) are given under both the null and the alternative hypotheses, and approximations of the power functions are deduced. A new procedure for constructing confidence regions, when the parameter may be a boundary value of the parameter space, is proposed. Also, a solution to the irregularity problem of the generalized likelihood ratio test pertaining to the number of components in a mixture is given, and a new test is proposed, based on the $\chi^{2}$-divergence on signed finite measures and the duality technique.
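A dual representation of this kind, and how it links back to maximum likelihood, can be sketched as follows. This is a derivation under stated assumptions, not the paper's full statement: we assume a convex $\phi$ with $\phi(1)=0$, densities $p_\alpha$ with common support, a true parameter $\theta_T \in \Theta$, and the divergence $\phi(P_\theta, P_{\theta_T}) = \int \phi(dP_\theta/dP_{\theta_T})\,dP_{\theta_T}$. The representation follows from the Fenchel-Young inequality by restricting the dual functions to $\phi'(p_\theta/p_\alpha)$.

```latex
% Fenchel-Young: for any f, \int f\,dP_\theta - \int \phi^*(f)\,dP_{\theta_T} \le \phi(P_\theta, P_{\theta_T}),
% where \phi^*(\phi'(x)) = x\phi'(x) - \phi(x); equality holds at \alpha = \theta_T.
\phi(P_\theta, P_{\theta_T})
  = \sup_{\alpha\in\Theta}\left\{
      \int \phi'\!\Big(\frac{p_\theta}{p_\alpha}\Big)\,dP_\theta
      - \int \Big[\frac{p_\theta}{p_\alpha}\,\phi'\!\Big(\frac{p_\theta}{p_\alpha}\Big)
               - \phi\Big(\frac{p_\theta}{p_\alpha}\Big)\Big]\,dP_{\theta_T}
    \right\}

% Modified Kullback-Leibler case: \phi(x) = x - 1 - \log x, so \phi'(x) = 1 - 1/x
% and x\phi'(x) - \phi(x) = \log x. The first integral equals \int (p_\theta - p_\alpha) = 0, hence
\phi(P_\theta, P_{\theta_T}) = \sup_{\alpha\in\Theta} \int \log\frac{p_\alpha}{p_\theta}\,dP_{\theta_T},
\qquad
\widehat{\phi}_n(\theta) = \sup_{\alpha\in\Theta} \frac{1}{n}\sum_{i=1}^{n} \log\frac{p_\alpha(X_i)}{p_\theta(X_i)}.
```

In the modified Kullback-Leibler case the supremum over $\alpha$ is attained at the maximum likelihood estimate, and minimizing $\widehat{\phi}_n(\theta)$ over $\theta$ also returns the maximum likelihood estimate, which is one way to read the maximum likelihood approach as a special case of dual divergence estimation.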
Let $X$ be a centered Gaussian random variable in a separable Hilbert space ${\mathbb H}$ with covariance operator $\Sigma.$ We study the problem of estimating a smooth functional of $\Sigma$ based on a sample $X_1,\dots,X_n$ of $n$ independent observations of $X.$ More specifically, we are interested in functionals of the form $\langle f(\Sigma), B\rangle,$ where $f:{\mathbb R}\mapsto{\mathbb R}$ is a smooth function and $B$ is a nuclear operator in ${\mathbb H}.$ We prove concentration and normal approximation bounds for the plug-in estimator $\langle f(\hat\Sigma),B\rangle,$ $\hat\Sigma:=n^{-1}\sum_{j=1}^n X_j\otimes X_j$ being the sample covariance based on $X_1,\dots,X_n.$ These bounds show that $\langle f(\hat\Sigma),B\rangle$ is an asymptotically normal estimator of its expectation ${\mathbb E}_{\Sigma}\langle f(\hat\Sigma),B\rangle$ (rather than of the parameter of interest $\langle f(\Sigma),B\rangle$) with a parametric convergence rate $O(n^{-1/2})$ provided that the effective rank ${\bf r}(\Sigma):=\frac{{\rm tr}(\Sigma)}{\|\Sigma\|}$ (${\rm tr}(\Sigma)$ being the trace and $\|\Sigma\|$ being the operator norm of $\Sigma$) satisfies the assumption ${\bf r}(\Sigma)=o(n).$ At the same time, we show that the bias of this estimator is typically as large as $\frac{{\bf r}(\Sigma)}{n}$ (which is at least $n^{-1/2}$ if ${\bf r}(\Sigma)\geq n^{1/2}$). In the case when ${\mathbb H}$ is a finite-dimensional space of dimension $d=o(n),$ we develop a method of bias reduction and construct an estimator $\langle h(\hat\Sigma),B\rangle$ of $\langle f(\Sigma),B\rangle$ that is asymptotically normal with convergence rate $O(n^{-1/2}).$ Moreover, we study asymptotic properties of the risk of this estimator and prove minimax lower bounds for arbitrary estimators showing the asymptotic efficiency of $\langle h(\hat\Sigma),B\rangle$ in a semi-parametric sense.
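In the finite-dimensional case ${\mathbb H} = {\mathbb R}^d$, the plug-in estimator and the effective rank can be computed directly. The sketch below assumes the Hilbert-Schmidt pairing $\langle A, B\rangle = {\rm tr}(AB^T)$ and defines $f(\hat\Sigma)$ through the eigendecomposition of $\hat\Sigma$; it does not implement the paper's bias-reduced estimator $h$. The function names are hypothetical.

```python
import numpy as np


def effective_rank(sigma):
    """r(Sigma) = tr(Sigma) / ||Sigma||, for a symmetric PSD matrix Sigma."""
    eigvals = np.linalg.eigvalsh(sigma)
    return eigvals.sum() / eigvals.max()  # operator norm = largest eigenvalue


def spectral_apply(f, sigma):
    """f(Sigma) defined through the eigendecomposition Sigma = V diag(w) V^T."""
    w, v = np.linalg.eigh(sigma)
    return (v * f(w)) @ v.T


def plugin_functional(X, f, B):
    """Plug-in estimate <f(Sigma_hat), B> = tr(f(Sigma_hat) B^T).

    X: (n, d) array whose rows are centered observations X_1, ..., X_n.
    """
    n = X.shape[0]
    sigma_hat = X.T @ X / n  # n^{-1} sum_j X_j (x) X_j, with no mean subtraction
    return np.trace(spectral_apply(f, sigma_hat) @ B.T)
```

As a worked check of the bias order, take $\Sigma = I_d$, $f(x) = x^2$ and $B = I_d/d$ (nuclear norm 1). The Wishart moment identity ${\mathbb E}\hat\Sigma^2 = (1 + 1/n)\Sigma^2 + {\rm tr}(\Sigma)\Sigma/n$ gives ${\mathbb E}\langle \hat\Sigma^2, B\rangle - \langle \Sigma^2, B\rangle = (d+1)/n$, which is of order ${\bf r}(\Sigma)/n$ since ${\bf r}(I_d) = d$.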
We provide sufficient conditions for the asymptotic normality of the generalized correlation coefficient $\sum a_{ij}b_{ij}$ under the permutation null distribution when both $(a_{ij})$ and $(b_{ij})$ are symmetric.
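Under the permutation null, a statistic of this type is commonly written as $\Gamma = \sum_{i,j} a_{ij} b_{\pi(i)\pi(j)}$ with $\pi$ uniform over all permutations; the sketch below assumes that form. Its exact mean is elementary: for $i \neq j$ the pair $(\pi(i), \pi(j))$ is uniform over ordered pairs of distinct indices, and for $i = j$ the index $\pi(i)$ is uniform. The function names are hypothetical, and the code only simulates the permutation distribution; it does not check the paper's sufficient conditions.

```python
import itertools

import numpy as np


def gamma_stat(a, b, perm):
    """Generalized correlation sum_{i,j} a_ij * b_{perm(i), perm(j)}."""
    return np.sum(a * b[np.ix_(perm, perm)])


def permutation_mean(a, b):
    """Exact mean of gamma_stat when perm is uniform over all n! permutations."""
    n = a.shape[0]
    a_diag, b_diag = np.trace(a), np.trace(b)
    a_off, b_off = a.sum() - a_diag, b.sum() - b_diag
    return a_off * b_off / (n * (n - 1)) + a_diag * b_diag / n


def exact_distribution(a, b):
    """All n! permutation values of gamma_stat (feasible only for small n)."""
    n = a.shape[0]
    return np.array([gamma_stat(a, b, list(p)) for p in itertools.permutations(range(n))])


def mc_standardized(a, b, n_perm=2000, seed=0):
    """Monte Carlo permutation draws of gamma_stat, centered and scaled by their sample sd."""
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    draws = np.array([gamma_stat(a, b, rng.permutation(n)) for _ in range(n_perm)])
    return (draws - draws.mean()) / draws.std(ddof=1)
```

Comparing a histogram of `mc_standardized(a, b)` with the standard normal density is an informal way to see whether the normal approximation is adequate for a given pair of symmetric matrices.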