
Bootstrapping the Operator Norm in High Dimensions: Error Estimation for Covariance Matrices and Sketching

Published by Miles Lopes
Publication date: 2019
Research field: Informatics Engineering
Research language: English





Although the operator (spectral) norm is one of the most widely used metrics for covariance estimation, comparatively little is known about the fluctuations of error in this norm. To be specific, let $\hat\Sigma$ denote the sample covariance matrix of $n$ observations in $\mathbb{R}^p$ that arise from a population matrix $\Sigma$, and let $T_n=\sqrt{n}\|\hat\Sigma-\Sigma\|_{\text{op}}$. In the setting where the eigenvalues of $\Sigma$ have a decay profile of the form $\lambda_j(\Sigma)\asymp j^{-2\beta}$, we analyze how well the bootstrap can approximate the distribution of $T_n$. Our main result shows that up to factors of $\log(n)$, the bootstrap can approximate the distribution of $T_n$ at the dimension-free rate of $n^{-\frac{\beta-1/2}{6\beta+4}}$, with respect to the Kolmogorov metric. Perhaps surprisingly, a result of this type appears to be new even in settings where $p<n$. More generally, we discuss the consequences of this result beyond covariance matrices and show how the bootstrap can be used to estimate the errors of sketching algorithms in randomized numerical linear algebra (RandNLA). An illustration of these ideas is also provided with a climate data example.
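To make the quantities in the abstract concrete, the following is a minimal sketch of a row-resampling bootstrap for the operator-norm error. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not specify the resampling scheme, so a standard nonparametric bootstrap is assumed, the sample covariance is assumed to be mean-centered, and the function names, the synthetic Gaussian data, and the values of `n`, `p`, and `beta` are all hypothetical.

```python
import numpy as np

def operator_norm(a):
    # Spectral norm: the largest singular value of the matrix.
    return np.linalg.norm(a, 2)

def sample_covariance(x):
    # Mean-centered sample covariance (centering is an assumption of this sketch).
    xc = x - x.mean(axis=0)
    return xc.T @ xc / x.shape[0]

def bootstrap_operator_norm(x, n_boot=200, seed=0):
    # Bootstrap analogue of T_n: sqrt(n) * ||Sigma_hat_star - Sigma_hat||_op,
    # where Sigma_hat_star is computed from rows resampled with replacement.
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    sigma_hat = sample_covariance(x)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[b] = np.sqrt(n) * operator_norm(sample_covariance(x[idx]) - sigma_hat)
    return stats

# Synthetic data with eigenvalue decay lambda_j = j^(-2*beta) (hypothetical values).
rng = np.random.default_rng(1)
n, p, beta = 300, 50, 1.0
eigvals = np.arange(1, p + 1, dtype=float) ** (-2.0 * beta)
x = rng.standard_normal((n, p)) * np.sqrt(eigvals)

t_star = bootstrap_operator_norm(x)
q95 = np.quantile(t_star, 0.95)  # bootstrap estimate of the 95% quantile of T_n

# With synthetic data the population matrix is known, so T_n itself can be computed.
t_n = np.sqrt(n) * operator_norm(sample_covariance(x) - np.diag(eigvals))
```

In practice $\Sigma$ is unknown, so only `q95` would be available; comparing it with `t_n` is possible here only because the data are simulated.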


Read also

Centered Gaussian random fields (GRFs) indexed by compacta such as smooth, bounded Euclidean domains or smooth, compact and orientable manifolds are determined by their covariance operators. We consider centered GRFs given as variational solutions to coloring operator equations driven by spatial white noise, with an elliptic self-adjoint pseudodifferential coloring operator from the Hormander class. This includes the Matern class of GRFs as a special case. Using biorthogonal multiresolution analyses on the manifold, we prove that the precision and covariance operators, respectively, may be identified with bi-infinite matrices and finite sections may be diagonally preconditioned rendering the condition number independent of the dimension $p$ of this section. We prove that a tapering strategy by thresholding applied on finite sections of the bi-infinite precision and covariance matrices results in optimally numerically sparse approximations. That is, asymptotically only linearly many nonzero matrix entries are sufficient to approximate the original section of the bi-infinite covariance or precision matrix using this tapering strategy to arbitrary precision. The locations of these nonzero matrix entries are known a priori. The tapered covariance or precision matrices may also be optimally diagonally preconditioned. Analysis of the relative size of the entries of the tapered covariance matrices motivates novel, multilevel Monte Carlo (MLMC) oracles for covariance estimation, in sample complexity that scales log-linearly with respect to the number $p$ of parameters. In addition, we propose and analyze a novel compressive algorithm for simulating and kriging of GRFs. The complexity (work and memory vs. accuracy) of these three algorithms scales near-optimally in terms of the number of parameters $p$ of the sample-wise approximation of the GRF in Sobolev scales.
We propose a Bayesian methodology for estimating spiked covariance matrices with jointly sparse structure in high dimensions. The spiked covariance matrix is reparametrized in terms of the latent factor model, where the loading matrix is equipped with a novel matrix spike-and-slab LASSO prior, which is a continuous shrinkage prior for modeling jointly sparse matrices. We establish the rate-optimal posterior contraction for the covariance matrix with respect to the operator norm as well as that for the principal subspace with respect to the projection operator norm loss. We also study the posterior contraction rate of the principal subspace with respect to the two-to-infinity norm loss, a novel loss function measuring the distance between subspaces that is able to capture element-wise eigenvector perturbations. We show that the posterior contraction rate with respect to the two-to-infinity norm loss is tighter than that with respect to the routinely used projection operator norm loss under certain low-rank and bounded coherence conditions. In addition, a point estimator for the principal subspace is proposed with the rate-optimal risk bound with respect to the projection operator norm loss. These results are based on a collection of concentration and large deviation inequalities for the matrix spike-and-slab LASSO prior. The numerical performance of the proposed methodology is assessed through synthetic examples and the analysis of a real-world face data example.
We consider high-dimensional multivariate linear regression models, where the joint distribution of covariates and response variables is a multivariate normal distribution with a bandable covariance matrix. The main goal of this paper is to estimate the regression coefficient matrix, which is a function of the bandable covariance matrix. Although the tapering estimator of covariance has the minimax optimal convergence rate for the class of bandable covariances, we show that it has a sub-optimal convergence rate for the regression coefficient; that is, a minimax estimator for the class of bandable covariances may not be a minimax estimator for its functionals. We propose the blockwise tapering estimator of the regression coefficient, which has the minimax optimal convergence rate for the regression coefficient under the bandable covariance assumption. We also propose a Bayesian procedure called the blockwise tapering post-processed posterior of the regression coefficient and show that the proposed Bayesian procedure has the minimax optimal convergence rate for the regression coefficient under the bandable covariance assumption. We show that the proposed methods outperform the existing methods via numerical studies.
Song Xi Chen, Bin Guo, Yumou Qiu, 2019
We consider testing the equality of two high-dimensional covariance matrices by carrying out a multi-level thresholding procedure, which is designed to detect sparse and faint differences between the covariances. A novel U-statistic composition is developed to establish the asymptotic distribution of the thresholding statistics in conjunction with the matrix blocking and the coupling techniques. We propose a multi-thresholding test that is shown to be powerful in detecting sparse and weak differences between two covariance matrices. The test is shown to have an attractive detection boundary and to attain the optimal minimax rate in the signal strength under different regimes of high dimensionality and the sparsity of the signal. Simulation studies are conducted to demonstrate the utility of the proposed test.
Xiufan Yu, Danning Li, 2020
Testing large covariance matrices is of fundamental importance in statistical analysis with high-dimensional data. In the past decade, three types of test statistics have been studied in the literature: quadratic form statistics, maximum form statistics, and their weighted combination. It is known that quadratic form statistics would suffer from low power against sparse alternatives and maximum form statistics would suffer from low power against dense alternatives. The weighted combination methods were introduced to enhance the power of quadratic form statistics or maximum form statistics when the weights are appropriately chosen. In this paper, we provide a new perspective to exploit the full potential of quadratic form statistics and maximum form statistics for testing high-dimensional covariance matrices. We propose a scale-invariant power enhancement test based on Fisher's method to combine the p-values of quadratic form statistics and maximum form statistics. After carefully studying the asymptotic joint distribution of quadratic form statistics and maximum form statistics, we prove that the proposed combination method retains the correct asymptotic size and boosts the power against more general alternatives. Moreover, we demonstrate the finite-sample performance in simulation studies and a real application.
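As a small illustration of the combination step mentioned above, Fisher's method merges $k$ p-values through the statistic $-2\sum_i \log p_i$, which follows a chi-square distribution with $2k$ degrees of freedom when the p-values are independent under the null. The sketch below covers only the generic two-p-value case and assumes that independence; the function name and the inputs are hypothetical, and it does not reproduce the paper's test statistics.

```python
import math

def fisher_combine_two(p_quad, p_max):
    # Fisher's combination statistic for two p-values.
    stat = -2.0 * (math.log(p_quad) + math.log(p_max))
    # Survival function of a chi-square with 4 degrees of freedom:
    # P(X > x) = exp(-x/2) * (1 + x/2).
    p_comb = math.exp(-stat / 2.0) * (1.0 + stat / 2.0)
    return stat, p_comb

# Hypothetical p-values from a quadratic-form test and a maximum-form test.
stat, p_comb = fisher_combine_two(0.05, 0.05)
```

Two moderately small p-values combine into a smaller one here, which is the sense in which the combination can gain power when both component tests carry some evidence.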