
Non-asymptotic upper bounds for the reconstruction error of PCA

Posted by Martin Wahl
Publication date: 2016
Research field: Mathematical Statistics
Paper language: English





We analyse the reconstruction error of principal component analysis (PCA) and prove non-asymptotic upper bounds for the corresponding excess risk. These bounds unify and improve existing upper bounds from the literature. In particular, they give oracle inequalities under mild eigenvalue conditions. The bounds reveal that the excess risk differs significantly from usually considered subspace distances based on canonical angles. Our approach relies on the analysis of empirical spectral projectors combined with concentration inequalities for weighted empirical covariance operators and empirical eigenvalues.
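For readers who want a concrete handle on these quantities, the following Python sketch computes the empirical rank-k projector, the population projector, and the resulting excess risk on simulated data. It is only a toy illustration under assumptions not made in the paper (Gaussian data, a known diagonal population covariance), not the construction or bounds analysed there.

# Toy illustration of PCA reconstruction error and excess risk.
# Assumptions (not from the paper): Gaussian data, known population covariance.
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 20, 500, 3

# Population covariance with decaying eigenvalues.
eigvals = 1.0 / np.arange(1, d + 1) ** 2
Sigma = np.diag(eigvals)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

def rank_k_projector(S, k):
    """Orthogonal projector onto the top-k eigenvectors of S."""
    w, V = np.linalg.eigh(S)
    idx = np.argsort(w)[::-1][:k]
    U = V[:, idx]
    return U @ U.T

P_hat = rank_k_projector(X.T @ X / n, k)   # empirical projector
P_star = rank_k_projector(Sigma, k)        # population projector

# Reconstruction risk R(P) = E||X - P X||^2 = trace((I - P) Sigma).
def risk(P):
    return np.trace((np.eye(d) - P) @ Sigma)

excess_risk = risk(P_hat) - risk(P_star)
print(f"excess risk: {excess_risk:.2e}")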




Read also

77 - Yufei Yi, Matey Neykov 2021
The Chebyshev or $\ell_{\infty}$ estimator is an unconventional alternative to ordinary least squares for solving linear regressions. It is defined as the minimizer of the $\ell_{\infty}$ objective function
\begin{align*}
\hat{\boldsymbol{\beta}} := \arg\min_{\boldsymbol{\beta}} \|\boldsymbol{Y} - \mathbf{X}\boldsymbol{\beta}\|_{\infty}.
\end{align*}
The asymptotic distribution of the Chebyshev estimator under a fixed number of covariates was recently studied (Knight, 2020), yet finite-sample guarantees and generalizations to high-dimensional settings remain open. In this paper, we develop non-asymptotic upper bounds on the estimation error $\|\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}^*\|_2$ for a Chebyshev estimator $\hat{\boldsymbol{\beta}}$, in a regression setting with uniformly distributed noise $\varepsilon_i \sim U([-a,a])$ where $a$ is either known or unknown. With relatively mild assumptions on the (random) design matrix $\mathbf{X}$, we can bound the error rate by $\frac{C_p}{n}$ with high probability, for some constant $C_p$ depending on the dimension $p$ and the law of the design. Furthermore, we illustrate that there exist designs for which the Chebyshev estimator is (nearly) minimax optimal. In addition, we show that Chebyshev's LASSO has advantages over the regular LASSO in high-dimensional situations, provided that the noise is uniform. Specifically, we argue that it achieves a much faster rate of estimation under certain assumptions on the growth rate of the sparsity level and the ambient dimension with respect to the sample size.
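For concreteness, the $\ell_{\infty}$ minimization above can be rewritten as a linear program (minimize $t$ subject to $-t \le Y_i - x_i^\top\boldsymbol{\beta} \le t$ for all $i$), which the following Python sketch solves with scipy.optimize.linprog; the design, noise half-width, and sample sizes are arbitrary toy choices, not the authors' setting.

# Chebyshev (l_infinity) regression as a linear program:
# minimize t subject to -t <= Y_i - x_i' beta <= t for all i.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, p, a = 200, 5, 0.5                       # toy sizes and noise half-width (assumed)
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
Y = X @ beta_true + rng.uniform(-a, a, size=n)

# Decision variables z = (beta_1, ..., beta_p, t); the objective picks out t.
c = np.zeros(p + 1)
c[-1] = 1.0

#  X beta - t <= Y   and   -X beta - t <= -Y
A_ub = np.block([[X, -np.ones((n, 1))],
                 [-X, -np.ones((n, 1))]])
b_ub = np.concatenate([Y, -Y])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * p + [(0, None)])
beta_hat = res.x[:p]
print("estimation error:", np.linalg.norm(beta_hat - beta_true))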
We consider the problem of finding confidence intervals for the risk of forecasting the future of a stationary, ergodic stochastic process, using a model estimated from the past of the process. We show that a bootstrap procedure provides valid confidence intervals for the risk, when the data source is sufficiently mixing and the loss function and the estimator are suitably smooth. Autoregressive (AR(d)) models estimated by least squares obey the necessary regularity conditions, even when mis-specified, and simulations show that the finite-sample coverage of our bounds quickly converges to the theoretical, asymptotic level. As an intermediate step, we derive sufficient conditions for asymptotic independence between empirical distribution functions formed by splitting a realization of a stochastic process, a result of independent interest.
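A minimal sketch of the general idea, not the paper's actual procedure or regularity conditions: fit an AR(1) model by least squares on the first half of a simulated series, evaluate one-step squared-error losses on the second half, and form a percentile interval from a circular block bootstrap of those losses. The AR order, block length, and loss are illustrative assumptions.

# Illustrative block-bootstrap confidence interval for one-step forecast risk.
# All modelling choices here (AR(1), squared loss, block length) are assumptions.
import numpy as np

rng = np.random.default_rng(2)
T = 1000
x = np.zeros(T)
for t in range(1, T):                      # simulate an AR(1) with phi = 0.6
    x[t] = 0.6 * x[t - 1] + rng.normal()

half = T // 2
# Least-squares AR(1) fit on the first half.
phi_hat = np.dot(x[1:half], x[:half - 1]) / np.dot(x[:half - 1], x[:half - 1])
# One-step squared losses on the second half.
losses = (x[half + 1:] - phi_hat * x[half:-1]) ** 2

def block_bootstrap_mean(v, block, rng):
    """Mean of one circular block-bootstrap replicate of v."""
    m = len(v)
    starts = rng.integers(0, m, size=int(np.ceil(m / block)))
    idx = np.concatenate([(s + np.arange(block)) % m for s in starts])[:m]
    return v[idx].mean()

reps = np.array([block_bootstrap_mean(losses, 25, rng) for _ in range(2000)])
lo, hi = np.percentile(reps, [2.5, 97.5])
print(f"95% CI for forecast risk: [{lo:.3f}, {hi:.3f}]")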
78 - Wenjia Wang, Rui Tuo, 2017
Kriging based on Gaussian random fields is widely used in reconstructing unknown functions. The kriging method has pointwise predictive distributions which are computationally simple. However, in many applications one would like to predict for a range of untried points simultaneously. In this work we obtain some error bounds for the (simple) kriging predictor under the uniform metric. It works for a scattered set of input points in an arbitrary dimension, and also covers the case where the covariance function of the Gaussian process is misspecified. These results lead to a better understanding of the rate of convergence of kriging under the Gaussian or the Matern correlation functions, the relationship between space-filling designs and kriging models, and the robustness of the Matern correlation functions.
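To make the object of study concrete, here is a small sketch of the simple kriging predictor $\hat f(x) = k(x)^\top K^{-1} y$ for a zero-mean Gaussian process; the Gaussian correlation function, length-scale, and test function below are illustrative assumptions, not the paper's setting or its uniform-metric bounds.

# Simple kriging: predict f(x) by k(x)' K^{-1} y under a zero-mean GP prior.
# The Gaussian correlation and length-scale below are illustrative assumptions.
import numpy as np

def corr(A, B, ell=0.2):
    """Gaussian correlation matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

rng = np.random.default_rng(3)
Xtr = rng.uniform(size=(30, 1))                  # scattered design points
ytr = np.sin(6 * Xtr[:, 0])                      # unknown function, observed without noise
Xte = np.linspace(0, 1, 200)[:, None]

K = corr(Xtr, Xtr) + 1e-10 * np.eye(len(Xtr))    # small jitter for numerical stability
pred = corr(Xte, Xtr) @ np.linalg.solve(K, ytr)  # simple kriging predictor
print("max abs error on the grid:", np.abs(pred - np.sin(6 * Xte[:, 0])).max())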
183 - Xinyi Xu, Feng Liang 2010
We consider the problem of estimating the predictive density of future observations from a non-parametric regression model. The density estimators are evaluated under Kullback-Leibler divergence and our focus is on establishing the exact asymptotics of minimax risk in the case of Gaussian errors. We derive the convergence rate and constant for minimax risk among Bayesian predictive densities under Gaussian priors and we show that this minimax risk is asymptotically equivalent to that among all density estimators.
We establish exponential bounds for the hypergeometric distribution which include a finite sampling correction factor, but are otherwise analogous to bounds for the binomial distribution due to Leon and Perron (2003) and Talagrand (1994). We also establish a convex ordering for sampling without replacement from populations of real numbers between zero and one: a population of all zeros or ones (and hence yielding a hypergeometric distribution in the upper bound) gives the extreme case.
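For flavour, the sketch below compares an exact hypergeometric upper-tail probability with the classical Hoeffding exponential bound for sampling without replacement; this is not the refined bound with the finite sampling correction factor discussed in the abstract, and the population and sample sizes are arbitrary.

# Exact hypergeometric upper tail vs the classical Hoeffding bound
# P(X/n - K/N >= t) <= exp(-2 n t^2), which also holds without replacement.
# (This is not the refined bound with the finite sampling correction factor.)
import numpy as np
from scipy.stats import hypergeom

N, K, n = 1000, 400, 100        # population size, number of "ones", sample size (arbitrary)
p = K / N
for t in (0.05, 0.10, 0.15):
    k = int(np.ceil(n * (p + t)))
    exact = hypergeom.sf(k - 1, N, K, n)    # P(X >= k)
    bound = np.exp(-2 * n * t ** 2)
    print(f"t={t:.2f}: exact tail {exact:.3e} <= Hoeffding bound {bound:.3e}")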