بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Robust Lasso with missing and grossly corrupted observations

456 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Nam Nguyen Hoai

تاريخ النشر 2011

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Nam H. Nguyen - Trac D. Tran

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper studies the problem of accurately recovering a sparse vector $beta^{star}$ from highly corrupted linear measurements $y = X beta^{star} + e^{star} + w$ where $e^{star}$ is a sparse error vector whose nonzero entries may be unbounded and $w$ is a bounded noise. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information of both $beta^{star}$ and $e^{star}$. Our first result shows that the extended Lasso can faithfully recover both the regression as well as the corruption vector. Our analysis relies on the notion of extended restricted eigenvalue for the design matrix $X$. Our second set of results applies to a general class of Gaussian design matrix $X$ with i.i.d rows $oper N(0, Sigma)$, for which we can establish a surprising result: the extended Lasso can recover exact signed supports of both $beta^{star}$ and $e^{star}$ from only $Omega(k log p log n)$ observations, even when the fraction of corruption is arbitrarily close to one. Our analysis also shows that this amount of observations required to achieve exact signed support is indeed optimal.

قيم البحث

اقرأ أيضاً

The LASSO risk for gaussian matrices

396 - Mohsen Bayati , Andrea Montanari 2010

We consider the problem of learning a coefficient vector x_0in R^N from noisy linear observation y=Ax_0+w in R^n. In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator x. In this case, a p opular approach consists in solving an L1-penalized least squares problem known as the LASSO or Basis Pursuit DeNoising (BPDN). For sequences of matrices A of increasing dimensions, with independent gaussian entries, we prove that the normalized risk of the LASSO converges to a limit, and we obtain an explicit expression for this limit. Our result is the first rigorous derivation of an explicit formula for the asymptotic mean square error of the LASSO for random instances. The proof technique is based on the analysis of AMP, a recently developed efficient algorithm, that is inspired from graphical models ideas. Simulations on real data matrices suggest that our results can be relevant in a broad array of practical applications.

نظرية الإحصاء نظرية المعلومات نظرية المعلومات

Large Dimensional Analysis of Robust M-Estimators of Covariance with Outliers

420 - David Morales-Jimenez , Romain Couillet , Matthew R. McKay 2015

A large dimensional characterization of robust M-estimators of covariance (or scatter) is provided under the assumption that the dataset comprises independent (essentially Gaussian) legitimate samples as well as arbitrary deterministic samples, refer red to as outliers. Building upon recent random matrix advances in the area of robust statistics, we specifically show that the so-called Maronna M-estimator of scatter asymptotically behaves similar to well-known random matrices when the population and sample sizes grow together to infinity. The introduction of outliers leads the robust estimator to behave asymptotically as the weighted sum of the sample outer products, with a constant weight for all legitimate samples and different weights for the outliers. A fine analysis of this structure reveals importantly that the propensity of the M-estimator to attenuate (or enhance) the impact of outliers is mostly dictated by the alignment of the outliers with the inverse population covariance matrix of the legitimate samples. Thus, robust M-estimators can bring substantial benefits over more simplistic estimators such as the per-sample normalized version of the sample covariance matrix, which is not capable of differentiating the outlying samples. The analysis shows that, within the class of Maronnas estimators of scatter, the Huber estimator is most favorable for rejecting outliers. On the contrary, estimators more similar to Tylers scale invariant estimator (often preferred in the literature) run the risk of inadvertently enhancing some outliers.

نظرية الإحصاء نظرية المعلومات نظرية المعلومات

The Lasso with general Gaussian designs with applications to hypothesis testing

289 - Michael Celentano , Andrea Montanari , Yuting Wei 2020

The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order or larger than the number of observations $n$. Classical asymptotic normality theory is not applicable for this m odel due to two fundamental reasons: $(1)$ The regularized risk is non-smooth; $(2)$ The distance between the estimator $bf widehat{theta}$ and the true parameters vector $bf theta^star$ cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large, while $n/p$ is of order one. This characterization was first obtained in the case of standard Gaussian designs, and subsequently generalized to other high-dimensional estimation procedures. Here we extend the same characterization to Gaussian correlated designs with non-singular covariance structure. This characterization is expressed in terms of a simpler ``fixed design model. We establish non-asymptotic bounds on the distance between distributions of various quantities in the two models, which hold uniformly over signals $bf theta^star$ in a suitable sparsity class, and values of the regularization parameter. As applications, we study the distribution of the debiased Lasso, and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.

نظرية الإحصاء نظرية المعلومات التعلم الآلي

Inference for Heteroskedastic PCA with Missing Data

94 - Yuling Yan , Yuxin Chen , Jianqing Fan 2021

This paper studies how to construct confidence regions for principal component analysis (PCA) in high dimension, a problem that has been vastly under-explored. While computing measures of uncertainty for nonlinear/nonconvex estimators is in general d ifficult in high dimension, the challenge is further compounded by the prevalent presence of missing data and heteroskedastic noise. We propose a suite of solutions to perform valid inference on the principal subspace based on two estimators: a vanilla SVD-based approach, and a more refined iterative scheme called $textsf{HeteroPCA}$ (Zhang et al., 2018). We develop non-asymptotic distributional guarantees for both estimators, and demonstrate how these can be invoked to compute both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix. Particularly worth highlighting is the inference procedure built on top of $textsf{HeteroPCA}$, which is not only valid but also statistically efficient for broader scenarios (e.g., it covers a wider range of missing rates and signal-to-noise ratios). Our solutions are fully data-driven and adaptive to heteroskedastic random noise, without requiring prior knowledge about the noise levels and noise distributions.

نظرية الإحصاء نظرية المعلومات التعلم الآلي

Robust Lasso-Zero for sparse corruption and model selection with missing covariates

126 - Pascaline Descloux 2020

We propose Robust Lasso-Zero, an extension of the Lasso-Zero methodology [Descloux and Sardy, 2018], initially introduced for sparse linear models, to the sparse corruptions problem. We give theoretical guarantees on the sign recovery of the paramete rs for a slightly simplified version of the estimator, called Thresholded Justice Pursuit. The use of Robust Lasso-Zero is showcased for variable selection with missing values in the covariates. In addition to not requiring the specification of a model for the covariates, nor estimating their covariance matrix or the noise variance, the method has the great advantage of handling missing not-at random values without specifying a parametric model. Numerical experiments and a medical application underline the relevance of Robust Lasso-Zero in such a context with few available competitors. The method is easy to use and implemented in the R library lass0.

تطبيقات الإحصاء المنهجية التعلم الالي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة حماه

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Robust Lasso with missing and grossly corrupted observations

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً