
LASSO risk and phase transition under dependence


Published by: Hanwen Huang

Publication date: 2021

Research field: Mathematical Statistics

Language: English

Author: Hanwen Huang


We consider the problem of recovering a $k$-sparse signal $\boldsymbol{\beta}_0\in\mathbb{R}^p$ from noisy observations $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}_0+\mathbf{w}\in\mathbb{R}^n$. One of the most popular approaches is $\ell_1$-regularized least squares, also known as LASSO. We analyze the mean square error of LASSO in the case of random designs in which each row of $\mathbf{X}$ is drawn from the distribution $N(0,\boldsymbol{\Sigma})$ with general $\boldsymbol{\Sigma}$. We first derive the asymptotic risk of LASSO in the limit of $n,p\rightarrow\infty$ with $n/p\rightarrow\delta$. We then examine conditions on $n$, $p$, and $k$ for LASSO to exactly reconstruct $\boldsymbol{\beta}_0$ in the noiseless case $\mathbf{w}=0$. A phase boundary $\delta_c=\delta(\epsilon)$ is precisely established in the phase space defined by $0\le\delta,\epsilon\le 1$, where $\epsilon=k/p$. Above this boundary, LASSO perfectly recovers $\boldsymbol{\beta}_0$ with high probability. Below this boundary, LASSO fails to recover $\boldsymbol{\beta}_0$ with high probability. While the values of the nonzero elements of $\boldsymbol{\beta}_0$ do not have any effect on the phase transition curve, our analysis shows that $\delta_c$ does depend on the sign pattern of the nonzero values of $\boldsymbol{\beta}_0$ for general $\boldsymbol{\Sigma}\neq\mathbf{I}_p$. This is in sharp contrast to the previous phase transition results derived in the i.i.d. case with $\boldsymbol{\Sigma}=\mathbf{I}_p$, where $\delta_c$ is completely determined by $\epsilon$ regardless of the distribution of $\boldsymbol{\beta}_0$. Underlying our formalism is a recently developed efficient algorithm called approximate message passing (AMP). We generalize the state evolution of AMP from the i.i.d. case to the general case with $\boldsymbol{\Sigma}\neq\mathbf{I}_p$. Extensive computational experiments confirm that our theoretical predictions are consistent with simulation results on moderate-size systems.
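To make the setup in the abstract concrete, here is a minimal simulation sketch. It is not the paper's AMP-based analysis: it draws each row of $\mathbf{X}$ from $N(0,\boldsymbol{\Sigma})$, solves LASSO in the noiseless case with plain proximal gradient descent (ISTA), and reports the mean square error. Everything specific is an assumption made for illustration: the AR(1) covariance $\Sigma_{ij}=\rho^{|i-j|}$, the $1/\sqrt{n}$ scaling of the design, the random $\pm 1$ nonzero entries, the small penalty `lam` standing in for the noiseless limit, and all function names.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize 0.5 * ||y - X b||_2^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - step * grad, step * lam)
    return b


def simulate_mse(n, p, k, rho, lam, seed=0):
    """Mean square error of LASSO for one noiseless draw with a correlated design."""
    rng = np.random.default_rng(seed)
    # Assumed covariance for illustration: AR(1), Sigma_ij = rho^|i-j|.
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(Sigma)
    # Rows of Z @ chol.T are N(0, Sigma); the 1/sqrt(n) scaling is an assumption.
    X = rng.standard_normal((n, p)) @ chol.T / np.sqrt(n)
    # k-sparse signal with random +/-1 nonzero entries (illustrative choice).
    beta0 = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    beta0[support] = rng.choice([-1.0, 1.0], size=k)
    y = X @ beta0  # noiseless case, w = 0
    beta_hat = lasso_ista(X, y, lam)
    return float(np.mean((beta_hat - beta0) ** 2))


if __name__ == "__main__":
    # Same sparsity epsilon = k/p = 0.1, two different ratios n/p.
    for n in (150, 30):
        mse = simulate_mse(n=n, p=200, k=20, rho=0.5, lam=0.01)
        print(f"n/p = {n / 200:.2f}  MSE = {mse:.4g}")
```

Comparing a large ratio $n/p$ with a small one at fixed $\epsilon=k/p$ gives a rough feel for the two sides of a phase boundary. It does not locate $\delta_c$, and a single random sign pattern says nothing about how the boundary depends on the signs.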


Zijian Guo, Domagoj Cevid, Peter Buhlmann, 2020

Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting, where the measured covariates are affected by hidden confounding, and propose the Doubly Debiased Lasso estimator for individual components of the regression coefficient vector. Our advocated method simultaneously corrects both the bias due to estimation of high-dimensional parameters and the bias caused by the hidden confounding. We establish its asymptotic normality and also prove that it is efficient in the Gauss-Markov sense. The validity of our methodology relies on a dense confounding assumption, i.e. that every confounding variable affects many covariates. The finite sample performance is illustrated with an extensive simulation study and a genomic application.

Methodology, Statistics Theory

False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation

Lilun Du, Xu Guo, Wenguang Sun, 2020

We develop a new class of distribution-free multiple testing rules for false discovery rate (FDR) control under general dependence. A key element in our proposal is a symmetrized data aggregation (SDA) approach to incorporating the dependence structure via sample splitting, data screening and information pooling. The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data-driven threshold along the ranking to control the FDR. The SDA filter substantially outperforms the knockoff method in power under moderate to strong dependence, and is more robust than existing methods based on asymptotic $p$-values. We first develop finite-sample theory to provide an upper bound for the actual FDR under general dependence, and then establish the asymptotic validity of SDA for both FDR and false discovery proportion (FDP) control under mild regularity conditions. The procedure is implemented in the R package SDA. Numerical results confirm the effectiveness and robustness of SDA in FDR control and show that it achieves substantial power gain over existing methods in many settings.

Methodology, Statistics Theory

High order chaotic limits of wavelet scalograms under long-range dependence

Marianne Clausel, Murad S. Taqqu, 2012

Let $G$ be a non-linear function of a Gaussian process $\{X_t\}_{t\in\mathbb{Z}}$ with long-range dependence. The resulting process $\{G(X_t)\}_{t\in\mathbb{Z}}$ is not Gaussian when $G$ is not linear. We consider random wavelet coefficients associated with $\{G(X_t)\}_{t\in\mathbb{Z}}$ and the corresponding wavelet scalogram, which is the average of squares of wavelet coefficients over locations. We obtain the asymptotic behavior of the scalogram as the number of observations and scales tend to infinity. It is known that when $G$ is a Hermite polynomial of any order, the limit is either the Gaussian or the Rosenblatt distribution, that is, the limit can be represented by a multiple Wiener-Itô integral of order one or two. We show, however, that there are large classes of functions $G$ which yield a higher order Hermite distribution, that is, the limit can be represented by a multiple Wiener-Itô integral of order greater than two.

Probability, Statistics Theory

Distributional Consistency of Lasso by Perturbation Bootstrap

Debraj Das, S. N. Lahiri, 2017

The Least Absolute Shrinkage and Selection Operator, or Lasso, introduced by Tibshirani (1996), is a popular estimation procedure in multiple linear regression when the underlying design has a sparse structure, because it sets some regression coefficients exactly equal to 0. In this article, we develop a perturbation bootstrap method and establish its validity in approximating the distribution of the Lasso in heteroscedastic linear regression. We allow the underlying covariates to be either random or non-random. We show that the proposed bootstrap method works irrespective of the nature of the covariates, unlike the resample-based bootstrap of Freedman (1981), which must be tailored to the nature (random vs non-random) of the covariates. A simulation study also supports our method in finite samples.

Methodology, Statistics Theory

Risk-consistency of cross-validation with lasso-type procedures

Darren Homrighausen, Daniel J. McDonald, 2013

The lasso and related sparsity-inducing algorithms have been the target of substantial theoretical and applied research. Correspondingly, many results are known about their behavior for a fixed or optimally chosen tuning parameter specified up to unknown constants. In practice, however, this oracle tuning parameter is inaccessible, so one must use the data to select one. Common statistical practice is to use a variant of cross-validation for this task. However, little is known about the theoretical properties of the resulting predictions with such data-dependent methods. We consider the high-dimensional setting with random design wherein the number of predictors $p$ grows with the number of observations $n$. Under typical assumptions on the data generating process, similar to those in the literature, we recover oracle rates up to a log factor when choosing the tuning parameter with cross-validation. Under weaker conditions, when the true model is not necessarily linear, we show that the lasso remains risk consistent relative to its linear oracle. We also generalize these results to the group lasso and square-root lasso and investigate the predictive and model selection performance of cross-validation via simulation.

Statistics Theory, Machine Learning
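The data-driven choice of tuning parameter discussed in the Homrighausen and McDonald abstract can be illustrated with a small K-fold cross-validation sketch. This is only a generic illustration under assumed choices, not the procedure analyzed in that paper: the ISTA solver, the $1/(2n)$ scaling of the squared loss, the candidate grid `lambdas`, the number of folds, and the function names are all hypothetical.

```python
import numpy as np


def lasso_ista(X, y, lam, n_iter=2000):
    """ISTA for (1 / (2n)) * ||y - X b||_2^2 + lam * ||b||_1 (assumed scaling)."""
    n = X.shape[0]
    # Step size 1/L, with L = ||X||_2^2 / n the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b = b - step * (X.T @ (X @ b - y)) / n
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)
    return b


def cv_select_lambda(X, y, lambdas, n_folds=5, seed=0):
    """Return (best_lambda, cv_errors) using K-fold held-out squared error."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Randomly partition the observations into n_folds disjoint folds.
    folds = np.array_split(rng.permutation(n), n_folds)
    cv_errors = np.zeros(len(lambdas))
    for j, lam in enumerate(lambdas):
        for held_out in folds:
            train = np.setdiff1d(np.arange(n), held_out)
            b = lasso_ista(X[train], y[train], lam)
            cv_errors[j] += np.sum((y[held_out] - X[held_out] @ b) ** 2)
    cv_errors /= n  # average held-out squared error per observation
    return lambdas[int(np.argmin(cv_errors))], cv_errors


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 50))
    beta = np.zeros(50)
    beta[:5] = 2.0
    y = X @ beta + 0.5 * rng.standard_normal(100)
    best, errs = cv_select_lambda(X, y, [0.001, 0.05, 0.2, 5.0])
    print("selected lambda:", best)
```

The selected value depends on the grid and on the random fold assignment, which is why the sketch exposes `seed` as a parameter.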
