A new goodness-of-fit test for normality in high dimension (and in a Reproducing Kernel Hilbert Space) is proposed. It shares common ideas with the Maximum Mean Discrepancy (MMD), which it outperforms both in computation time and in applicability to a wider range of data. Theoretical results are derived for the Type-I and Type-II errors. They guarantee control of the Type-I error at a prescribed level and an exponentially fast decrease of the Type-II error. Synthetic and real data also illustrate the practical improvement allowed by our test compared with other leading approaches in high-dimensional settings.
We propose a new one-sample test for normality in a Reproducing Kernel Hilbert Space (RKHS). Namely, we test the null hypothesis that the data distribution belongs to a given family of Gaussian distributions. Hence our procedure may be applied either to test data for normality or to test parameters (mean and covariance) if the data are assumed Gaussian. Our test is based on the same principle as the MMD (Maximum Mean Discrepancy), which is usually used for two-sample tests such as homogeneity or independence testing. Our method makes use of a special kind of parametric bootstrap (typical of goodness-of-fit tests) which is computationally more efficient than the standard parametric bootstrap. Moreover, an upper bound for the Type-II error highlights the dependence on influential quantities. Experiments illustrate the practical improvement allowed by our test in high-dimensional settings where common normality tests are known to fail. We also consider an application to covariance rank selection through a sequential procedure.
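As a rough illustration of the MMD principle mentioned above, the following sketch computes the biased (V-statistic) estimate of the squared MMD between two univariate samples using a Gaussian kernel. It is not the test proposed in the abstract: the function names, the fixed bandwidth of 1, the sample sizes, and the two-sample setting are all assumptions chosen for illustration, and no bootstrap calibration is performed.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two real numbers."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples.

    This is the squared RKHS distance between the two empirical
    kernel mean embeddings, so it is non-negative up to rounding.
    """
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


# Illustrative data (assumed, not from the source): a reference Gaussian
# sample, a second sample from the same law, and a mean-shifted sample.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(100)]
same_law = [rng.gauss(0.0, 1.0) for _ in range(100)]
shifted = [rng.gauss(3.0, 1.0) for _ in range(100)]

mmd_same = mmd2_biased(reference, same_law)
mmd_shifted = mmd2_biased(reference, shifted)
print(mmd_same, mmd_shifted)
```

The statistic should be close to zero when both samples come from the same distribution and clearly larger under the mean shift. A one-sample normality test in this spirit would compare the data embedding with the embedding of a fitted Gaussian and calibrate the threshold, for instance by a parametric bootstrap.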
We derive asymptotic normality of kernel-type deconvolution estimators of the density, the distribution function at a fixed point, and the probability of an interval. We consider the so-called super smooth case, where the characteristic function of the known distribution decreases exponentially. It turns out that the limit behavior of the pointwise estimators of the density and distribution function is relatively straightforward, while the asymptotics of the estimator of the probability of an interval depend in a complicated way on the sequence of bandwidths.
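For orientation, the usual form of a kernel-type deconvolution density estimator can be written as follows. The notation is an assumption, since the abstract fixes none: observations $Y_j = X_j + Z_j$, $j = 1, \dots, n$, where $Z_j$ has known characteristic function $\phi_Z$; a kernel $w$ with characteristic function $\phi_w$; a bandwidth $h > 0$; and the empirical characteristic function $\phi_{\mathrm{emp}}$ of the observations.

```latex
\phi_{\mathrm{emp}}(t) = \frac{1}{n} \sum_{j=1}^{n} e^{\mathrm{i} t Y_j},
\qquad
f_{nh}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty}
  e^{-\mathrm{i} t x} \,
  \frac{\phi_w(ht)\, \phi_{\mathrm{emp}}(t)}{\phi_Z(t)} \, \mathrm{d}t .
```

In the super smooth case $|\phi_Z(t)|$ decays exponentially as $|t| \to \infty$, so the division by $\phi_Z(t)$ strongly amplifies high-frequency noise. This is one way to see why the choice of the bandwidth sequence matters so much for the asymptotics.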
In this paper, a novel Bayesian nonparametric test for assessing multivariate normal models is presented. While there are extensive frequentist and graphical methods for testing multivariate normality, it is challenging to find Bayesian counterparts. The proposed approach is based on the use of the Dirichlet process and Mahalanobis distance. More precisely, the Mahalanobis distance is employed as a good technique to transform the $m$-variate problem into a univariate problem. Then the Dirichlet process is used as a prior on the distribution of the Mahalanobis distance. The concentration of the distribution of the distance between the posterior process and the chi-square distribution with $m$ degrees of freedom is compared to the concentration of the distribution of the distance between the prior process and the chi-square distribution with $m$ degrees of freedom via a relative belief ratio. The distance between the Dirichlet process and the chi-square distribution is established based on the Anderson-Darling distance. Key theoretical results of the approach are derived. The procedure is illustrated through several examples, in which the proposed approach shows excellent performance.
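The reduction from an $m$-variate problem to a univariate one can be sketched in a few lines for $m = 2$. The code below computes squared Mahalanobis distances to the sample mean using the sample covariance. It illustrates only the transformation step, not the Dirichlet process prior or the relative belief ratio, and the function name and simulated data are assumptions. Under bivariate normality the squared distances are approximately chi-square distributed with 2 degrees of freedom, so their average should be close to 2.

```python
import random


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point to the sample mean.

    Uses the unbiased sample covariance (denominator n - 1) and the
    closed-form inverse of a 2x2 matrix.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    s_xx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    s_yy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    s_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = s_xx * s_yy - s_xy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # (dx, dy) S^{-1} (dx, dy)^T with S^{-1} = [[s_yy, -s_xy], [-s_xy, s_xx]] / det
        distances.append((s_yy * dx * dx - 2.0 * s_xy * dx * dy + s_xx * dy * dy) / det)
    return distances


# Illustrative data (assumed): 500 draws from a standard bivariate normal.
rng = random.Random(0)
sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
d2 = mahalanobis_sq_2d(sample)
mean_d2 = sum(d2) / len(d2)
print(mean_d2)
```

With estimated mean and covariance, the squared distances always sum to $(n-1)m$, so their average here is $2 \cdot 499/500$ regardless of the data. Only the shape of their empirical distribution, compared with the chi-square distribution, carries information about normality.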
The paper discusses the estimation of a continuous density function of the target random field $X_{\mathbf{i}}$, $\mathbf{i}\in \mathbb{Z}^N$, which is contaminated by measurement errors. In particular, the observed random field $Y_{\mathbf{i}}$, $\mathbf{i}\in \mathbb{Z}^N$, is such that $Y_{\mathbf{i}}=X_{\mathbf{i}}+\epsilon_{\mathbf{i}}$, where the random error $\epsilon_{\mathbf{i}}$ is from a known distribution and independent of the target random field. The paper improves on existing results in two directions. First, random vectors are investigated rather than univariate random variables. Second, a random field with certain spatial interactions is studied instead of i.i.d. random variables. Asymptotic normality of the proposed estimator is established under appropriate conditions.
Multivariate linear regressions are widely used statistical tools in many applications to model the associations between multiple related responses and a set of predictors. To infer such associations, it is often of interest to test the structure of the regression coefficient matrix, and the likelihood ratio test (LRT) is one of the most popular approaches in practice. Despite its popularity, it is known that the classical $\chi^2$ approximations for LRTs often fail in high-dimensional settings, where the dimensions of responses and predictors $(m,p)$ are allowed to grow with the sample size $n$. Though various corrected LRTs and other test statistics have been proposed in the literature, the fundamental question of when the classical LRT starts to fail is less studied; an answer would provide insights for practitioners, especially when analyzing data with $m/n$ and $p/n$ small but not negligible. Moreover, the power performance of the LRT in high-dimensional data analysis remains underexplored. To address these issues, the first part of this work gives the asymptotic boundary where the classical LRT fails and develops the corrected limiting distribution of the LRT for a general asymptotic regime. The second part of this work further studies the power of the LRT in the high-dimensional setting. The result not only advances the current understanding of the asymptotic behavior of the LRT under the alternative hypothesis, but also motivates the development of a power-enhanced LRT. The third part of this work considers the setting with $p>n$, where the LRT is not well-defined. We propose a two-step testing procedure that first performs dimension reduction and then applies the proposed LRT. Theoretical properties are developed to ensure the validity of the proposed method. Numerical studies are also presented to demonstrate its good performance.