Topological data analysis (TDA) allows us to explore the topological features of a dataset. Among topological features, lower dimensional ones have recently drawn the attention of practitioners in mathematics and statistics due to their potential to aid the discovery of low dimensional structure in a dataset. However, lower dimensional features are usually challenging to detect from a probabilistic perspective. In this paper, lower dimensional topological features occurring as zero-density regions of density functions are introduced and thoroughly investigated. Specifically, we consider sequences of coverings for the support of a density function in which the coverings are composed of balls with shrinking radii. We show that, when these coverings satisfy certain sufficient conditions as the sample size goes to infinity, we can detect lower dimensional, zero-density regions with increasingly higher probability while guarding against false detection. We supplement the theoretical developments with a discussion of simulated experiments that elucidate the behavior of the methodology for different choices of the tuning parameters that govern the construction of the covering sequences and characterize the asymptotic results.
In model selection, several types of cross-validation are commonly used and many variants have been introduced. While consistency of some of these methods has been proven, their rate of convergence to the oracle is generally still unknown. Until now, an asymptotic analysis of cross-validation able to answer this question has been lacking. Existing results focus on the pointwise estimation of the risk of a single estimator, whereas analysing model selection requires understanding how the CV risk varies with the model. In this article, we investigate the asymptotics of the CV risk in the neighbourhood of the optimal model, for trigonometric series estimators in density estimation. Asymptotically, simple validation and incomplete V-fold CV behave like the sum of a convex function f_n and a time-changed symmetrized Brownian motion W_{g_n/V}. We argue that this is the right asymptotic framework for studying model selection.
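As a rough illustration of how a CV risk varies with the model for trigonometric series density estimators, the sketch below computes an ordinary V-fold hold-out risk for nested models on [0, 1] using the least-squares contrast. The function names, the choice of ordinary (rather than incomplete) V-fold CV, and the basis ordering are assumptions made for this example, not details taken from the article.

```python
import numpy as np

def trig_basis(x, dim):
    """First `dim` functions of the orthonormal trigonometric basis on [0, 1]
    (ordering 1, cos, sin, cos, sin, ... is an assumption of this sketch)."""
    cols = [np.ones_like(x)]
    k = 1
    while len(cols) < dim:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * x))
        if len(cols) < dim:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * x))
        k += 1
    return np.column_stack(cols)

def vfold_cv_risk(x, max_dim, V=5, seed=0):
    """V-fold CV estimate of the least-squares risk for models of dimension 1..max_dim."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), V)
    phi = trig_basis(x, max_dim)
    risks = np.zeros(max_dim)
    for val_idx in folds:
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[val_idx] = False
        coef = phi[train_mask].mean(axis=0)   # projection coefficients from the training part
        val_mean = phi[val_idx].mean(axis=0)  # empirical means on the validation part
        # Contrast ||t||^2 - 2 * mean t(X_val); models are nested, so a cumulative
        # sum over basis functions gives the risk for every dimension at once.
        risks += np.cumsum(coef**2 - 2 * coef * val_mean)
    return risks / V

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.beta(2.0, 2.0, size=2000)  # hypothetical data supported on [0, 1]
    risks = vfold_cv_risk(sample, max_dim=21)
    print("selected dimension:", int(np.argmin(risks)) + 1)
```

Plotting `risks` against the dimension gives one noisy realisation of the CV risk curve around its minimiser, which is the object whose asymptotics the abstract describes.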
Sample correlation matrices are employed ubiquitously in statistics. However, quite surprisingly, little is known about their asymptotic spectral properties for high-dimensional data, particularly beyond the case of null models for which the data are assumed independent. Here, considering the popular class of spiked models, we apply random matrix theory to derive asymptotic first-order and distributional results for both the leading eigenvalues and eigenvectors of sample correlation matrices. These results are obtained under high-dimensional settings for which the number of samples n and variables p approach infinity, with p/n tending to a constant. To first order, the spectral properties of sample correlation matrices are seen to coincide with those of sample covariance matrices; however, their asymptotic distributions can differ significantly, with fluctuations of both the sample eigenvalues and eigenvectors often being remarkably smaller than those of their sample covariance counterparts.
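The comparison between sample covariance and sample correlation spectra can be explored numerically. The following is a minimal simulation sketch, assuming a rank-one spiked population covariance of the form I + spike * v v^T with Gaussian data; the function name, the spike direction and the parameter values are all illustrative choices, not taken from the paper.

```python
import numpy as np

def leading_eigenvalues(n=400, p=200, spike=5.0, seed=0):
    """Largest eigenvalue of the sample covariance and sample correlation matrices
    for one draw from an assumed rank-one spiked Gaussian model."""
    rng = np.random.default_rng(seed)
    v = np.ones(p) / np.sqrt(p)            # unit-norm spike direction (assumption)
    noise = rng.standard_normal((n, p))
    factor = rng.standard_normal((n, 1))
    # Rows have population covariance I + spike * v v^T.
    X = noise + np.sqrt(spike) * factor * v
    S = np.cov(X, rowvar=False)            # sample covariance, p x p
    R = np.corrcoef(X, rowvar=False)       # sample correlation, p x p
    # eigvalsh returns eigenvalues in ascending order, so [-1] is the largest.
    return np.linalg.eigvalsh(S)[-1], np.linalg.eigvalsh(R)[-1]

if __name__ == "__main__":
    draws = np.array([leading_eigenvalues(seed=s) for s in range(50)])
    print("mean of leading eigenvalues (cov, corr):", draws.mean(axis=0))
    print("std  of leading eigenvalues (cov, corr):", draws.std(axis=0))
```

Repeating the draw over many seeds, as in the usage lines, gives empirical means and standard deviations of the two leading eigenvalues, which can then be set against the first-order and fluctuation statements in the abstract.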
We obtain an asymptotic expansion for the null distribution function of the gradient statistic for testing composite null hypotheses in the presence of nuisance parameters. The expansion is derived using a Bayesian route based on the shrinkage argument described in Ghosh and Mukerjee (1991). Using this expansion, we propose a Bartlett-type corrected gradient statistic with chi-square distribution up to an error of order o(n^{-1}) under the null hypothesis. Further, we also use the expansion to modify the percentage points of the large sample reference chi-square distribution. A small Monte Carlo experiment and various examples are presented and discussed.
In this paper, we use the class of Wasserstein metrics to study asymptotic properties of posterior distributions. Our first goal is to provide sufficient conditions for posterior consistency. In addition to the well-known Schwartz's Kullback--Leibler condition on the prior, the true distribution and most probability measures in the support of the prior are required to possess moments up to an order which is determined by the order of the Wasserstein metric. We further investigate convergence rates of the posterior distributions, for which we need stronger moment conditions. The required tail conditions are sharp in the sense that the posterior distribution may be inconsistent or contract slowly to the true distribution without these conditions. Our study involves techniques that build on recent advances in Wasserstein convergence of empirical measures. We apply the results to density estimation with a Dirichlet process mixture prior and conduct a simulation study for further illustration.
In this paper, we apply a doubly robust approach to estimate the conditional average treatment effect given some covariates, under parametric, semiparametric and nonparametric structures for the nuisance propensity score and outcome regression models. We then conduct a systematic study of the asymptotic distributions of nine estimators with different combinations of estimated propensity score and outcome regressions. The study covers the asymptotic properties with all models correctly specified; with either the propensity score or the outcome regressions locally/globally misspecified; and with all models locally/globally misspecified. The asymptotic variances are compared and the asymptotic bias correction under model misspecification is discussed. The phenomenon that the asymptotic variance under model misspecification can sometimes be even smaller than that with all models correctly specified is explored. We also conduct a numerical study to examine the theoretical results.
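To make the doubly robust construction concrete, here is a minimal sketch of the standard augmented inverse-probability-weighted (AIPW) pseudo-outcome, followed by a regression of that pseudo-outcome on a covariate. The simulated design, the use of the true propensity score, the linear outcome models and the final linear projection are all assumptions chosen to keep the example short; they are not the nine estimators studied in the paper.

```python
import numpy as np

def dr_pseudo_outcome(y, d, ps, m1, m0):
    """AIPW pseudo-outcome: its conditional mean given covariates equals the CATE
    when the propensity score or the outcome regressions are correctly specified."""
    return m1 - m0 + d * (y - m1) / ps - (1 - d) * (y - m0) / (1 - ps)

def ols_fit(X, y):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def simulate_and_estimate(n=5000, seed=0):
    """Hypothetical design with true CATE tau(x) = 1 + x; returns the fitted
    (intercept, slope) of the pseudo-outcome regressed on x."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    ps = 1.0 / (1.0 + np.exp(-0.5 * x))            # true propensity score (assumed known here)
    d = (rng.uniform(size=n) < ps).astype(float)   # treatment indicator
    y = x + d * (1.0 + x) + rng.standard_normal(n)
    design = np.column_stack([np.ones(n), x])
    # Parametric outcome regressions fitted separately in each treatment arm.
    b1 = ols_fit(design[d == 1], y[d == 1])
    b0 = ols_fit(design[d == 0], y[d == 0])
    psi = dr_pseudo_outcome(y, d, ps, design @ b1, design @ b0)
    # Linear projection of the pseudo-outcome on x; a kernel smoother could be used instead.
    return ols_fit(design, psi)

if __name__ == "__main__":
    print("estimated CATE coefficients (intercept, slope):", simulate_and_estimate())
```

Replacing the propensity score or the outcome fits with deliberately wrong models in this sketch is one simple way to see the misspecification scenarios listed in the abstract.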