No Arabic abstract
This paper studies how to construct confidence regions for principal component analysis (PCA) in high dimension, a problem that has been vastly under-explored. While computing measures of uncertainty for nonlinear/nonconvex estimators is in general difficult in high dimension, the challenge is further compounded by the prevalent presence of missing data and heteroskedastic noise. We propose a suite of solutions to perform valid inference on the principal subspace based on two estimators: a vanilla SVD-based approach, and a more refined iterative scheme called $textsf{HeteroPCA}$ (Zhang et al., 2018). We develop non-asymptotic distributional guarantees for both estimators, and demonstrate how these can be invoked to compute both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix. Particularly worth highlighting is the inference procedure built on top of $textsf{HeteroPCA}$, which is not only valid but also statistically efficient for broader scenarios (e.g., it covers a wider range of missing rates and signal-to-noise ratios). Our solutions are fully data-driven and adaptive to heteroskedastic random noise, without requiring prior knowledge about the noise levels and noise distributions.
We introduce uncertainty regions to perform inference on partial correlations when data are missing not at random. These uncertainty regions are shown to have a desired asymptotic coverage. Their finite sample performance is illustrated via simulations and real data example.
This paper studies the problem of accurately recovering a sparse vector $beta^{star}$ from highly corrupted linear measurements $y = X beta^{star} + e^{star} + w$ where $e^{star}$ is a sparse error vector whose nonzero entries may be unbounded and $w$ is a bounded noise. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information of both $beta^{star}$ and $e^{star}$. Our first result shows that the extended Lasso can faithfully recover both the regression as well as the corruption vector. Our analysis relies on the notion of extended restricted eigenvalue for the design matrix $X$. Our second set of results applies to a general class of Gaussian design matrix $X$ with i.i.d rows $oper N(0, Sigma)$, for which we can establish a surprising result: the extended Lasso can recover exact signed supports of both $beta^{star}$ and $e^{star}$ from only $Omega(k log p log n)$ observations, even when the fraction of corruption is arbitrarily close to one. Our analysis also shows that this amount of observations required to achieve exact signed support is indeed optimal.
Classical semiparametric inference with missing outcome data is not robust to contamination of the observed data and a single observation can have arbitrarily large influence on estimation of a parameter of interest. This sensitivity is exacerbated when inverse probability weighting methods are used, which may overweight contaminated observations. We introduce inverse probability weighted, double robust and outcome regression estimators of location and scale parameters, which are robust to contamination in the sense that their influence function is bounded. We give asymptotic properties and study finite sample behaviour. Our simulated experiments show that contamination can be more serious a threat to the quality of inference than model misspecification. An interesting aspect of our results is that the auxiliary outcome model used to adjust for ignorable missingness by some of the estimators, is also useful to protect against contamination. We also illustrate through a case study how both adjustment to ignorable missingness and protection against contamination are achieved through weighting schemes, which can be contrasted to gain further insights.
This paper presents and analyzes an approach to cluster-based inference for dependent data. The primary setting considered here is with spatially indexed data in which the dependence structure of observed random variables is characterized by a known, observed dissimilarity measure over spatial indices. Observations are partitioned into clusters with the use of an unsupervised clustering algorithm applied to the dissimilarity measure. Once the partition into clusters is learned, a cluster-based inference procedure is applied to a statistical hypothesis testing procedure. The procedure proposed in the paper allows the number of clusters to depend on the data, which gives researchers a principled method for choosing an appropriate clustering level. The paper gives conditions under which the proposed procedure asymptotically attains correct size. A simulation study shows that the proposed procedure attains near nominal size in finite samples in a variety of statistical testing problems with dependent data.
We study the statistical problem of estimating a rank-one sparse tensor corrupted by additive Gaussian noise, a model also known as sparse tensor PCA. We show that for Bernoulli and Bernoulli-Rademacher distributed signals and emph{for all} sparsity levels which are sublinear in the dimension of the signal, the sparse tensor PCA model exhibits a phase transition called the emph{all-or-nothing phenomenon}. This is the property that for some signal-to-noise ratio (SNR) $mathrm{SNR_c}$ and any fixed $epsilon>0$, if the SNR of the model is below $left(1-epsilonright)mathrm{SNR_c}$, then it is impossible to achieve any arbitrarily small constant correlation with the hidden signal, while if the SNR is above $left(1+epsilon right)mathrm{SNR_c}$, then it is possible to achieve almost perfect correlation with the hidden signal. The all-or-nothing phenomenon was initially established in the context of sparse linear regression, and over the last year also in the context of sparse 2-tensor (matrix) PCA, Bernoulli group testing, and generalized linear models. Our results follow from a more general result showing that for any Gaussian additive model with a discrete uniform prior, the all-or-nothing phenomenon follows as a direct outcome of an appropriately defined near-orthogonality property of the support of the prior distribution.