LASSO risk and phase transition under dependence

Added by Hanwen Huang

Publication date 2021

Fields Mathematical Statistics

Language English

Authors Hanwen Huang

We consider the problem of recovering a $k$-sparse signal $\boldsymbol{\beta}_0\in\mathbb{R}^p$ from noisy observations $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}_0+\mathbf{w}\in\mathbb{R}^n$. One of the most popular approaches is $\ell_1$-regularized least squares, also known as LASSO. We analyze the mean square error of LASSO in the case of random designs in which each row of $\mathbf{X}$ is drawn from the distribution $N(0,\boldsymbol{\Sigma})$ with general $\boldsymbol{\Sigma}$. We first derive the asymptotic risk of LASSO in the limit $n,p\rightarrow\infty$ with $n/p\rightarrow\delta$. We then examine conditions on $n$, $p$, and $k$ for LASSO to exactly reconstruct $\boldsymbol{\beta}_0$ in the noiseless case $\mathbf{w}=0$. A phase boundary $\delta_c=\delta(\epsilon)$ is precisely established in the phase space defined by $0\le\delta,\epsilon\le 1$, where $\epsilon=k/p$. Above this boundary, LASSO perfectly recovers $\boldsymbol{\beta}_0$ with high probability. Below this boundary, LASSO fails to recover $\boldsymbol{\beta}_0$ with high probability. While the values of the non-zero elements of $\boldsymbol{\beta}_0$ do not have any effect on the phase transition curve, our analysis shows that $\delta_c$ does depend on the sign pattern of the non-zero values of $\boldsymbol{\beta}_0$ for general $\boldsymbol{\Sigma}\ne\mathbf{I}_p$. This is in sharp contrast to previous phase transition results derived in the i.i.d. case with $\boldsymbol{\Sigma}=\mathbf{I}_p$, where $\delta_c$ is completely determined by $\epsilon$ regardless of the distribution of $\boldsymbol{\beta}_0$. Underlying our formalism is a recently developed efficient algorithm called approximate message passing (AMP). We generalize the state evolution of AMP from the i.i.d. case to the general case with $\boldsymbol{\Sigma}\ne\mathbf{I}_p$. Extensive computational experiments confirm that our theoretical predictions are consistent with simulation results on moderate-size systems.
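The setting in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's AMP algorithm or its experiments: it draws a design whose rows follow $N(0,\boldsymbol{\Sigma})$ with an assumed AR(1) covariance $\Sigma_{ij}=\rho^{|i-j|}$, builds a $k$-sparse signal with random signs, and solves LASSO in the noiseless case by plain coordinate descent. The function names, the choice of covariance, the penalty `lam`, and all sizes are assumptions made for illustration only.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ar1_design(n, p, rho, rng):
    """Draw n rows from N(0, Sigma) with Sigma[i][j] = rho ** |i - j| (assumed covariance)."""
    scale = math.sqrt(1.0 - rho * rho)
    rows = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(p - 1):
            row.append(rho * row[-1] + scale * rng.gauss(0.0, 1.0))
        rows.append(row)
    return rows


def lasso_cd(X, y, lam, sweeps=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    for _ in range(sweeps):
        for j in range(p):
            c = cols[j]
            # correlation of column j with the partial residual (coordinate j removed)
            g = sum(ci * ri for ci, ri in zip(c, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(g, lam) / col_sq[j]
            if new != beta[j]:
                d = new - beta[j]
                resid = [ri - d * ci for ri, ci in zip(resid, c)]
                beta[j] = new
    return beta


def run_trial(n=60, p=80, k=4, rho=0.3, lam=0.01, seed=0):
    """Relative squared error of LASSO in one noiseless trial (w = 0)."""
    rng = random.Random(seed)
    X = ar1_design(n, p, rho, rng)
    beta0 = [0.0] * p
    for j in rng.sample(range(p), k):
        beta0[j] = rng.choice([-1.0, 1.0])  # random sign pattern
    y = [sum(xij * bj for xij, bj in zip(row, beta0)) for row in X]
    beta_hat = lasso_cd(X, y, lam)
    err = sum((a - b) ** 2 for a, b in zip(beta_hat, beta0))
    return err / sum(b * b for b in beta0)


rel_err = run_trial()
print(f"delta = {60 / 80:.2f}, epsilon = {4 / 80:.2f}, relative squared error = {rel_err:.2e}")
```

Repeating such trials over a grid of $(\delta,\epsilon)$ values and recording how often the error is near zero is one way to trace an empirical phase boundary. A small positive `lam` only approximates exact recovery, so the error is small rather than exactly zero.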

Doubly Debiased Lasso: High-Dimensional Inference under Hidden Confounding

Zijian Guo, Domagoj Cevid, Peter Buhlmann (2020)

Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting, where the measured covariates are affected by hidden confounding, and propose the Doubly Debiased Lasso estimator for individual components of the regression coefficient vector. Our advocated method simultaneously corrects both the bias due to estimation of high-dimensional parameters and the bias caused by the hidden confounding. We establish its asymptotic normality and also prove that it is efficient in the Gauss-Markov sense. The validity of our methodology relies on a dense confounding assumption, i.e. that every confounding variable affects many covariates. The finite sample performance is illustrated with an extensive simulation study and a genomic application.

Methodology, Statistics Theory

False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation

Lilun Du, Xu Guo, Wenguang Sun (2020)

We develop a new class of distribution-free multiple testing rules for false discovery rate (FDR) control under general dependence. A key element in our proposal is a symmetrized data aggregation (SDA) approach to incorporating the dependence structure via sample splitting, data screening and information pooling. The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data-driven threshold along the ranking to control the FDR. The SDA filter substantially outperforms the knockoff method in power under moderate to strong dependence, and is more robust than existing methods based on asymptotic $p$-values. We first develop finite-sample theory to provide an upper bound for the actual FDR under general dependence, and then establish the asymptotic validity of SDA for both the FDR and false discovery proportion (FDP) control under mild regularity conditions. The procedure is implemented in the R package SDA. Numerical results confirm the effectiveness and robustness of SDA in FDR control and show that it achieves substantial power gain over existing methods in many settings.
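The idea of choosing a data-driven threshold along a ranking of symmetric statistics can be sketched as follows. This is a generic symmetry-based rule of the kind used by the knockoff filter (in its conservative "+1" form), not the SDA package's implementation; the function names and the example statistics `w` are made up for illustration. Under symmetry, the count of statistics below $-t$ serves as an estimate of the number of false discoveries above $t$.

```python
import math


def symmetric_threshold(w, alpha):
    """Smallest t with (1 + #{w_j <= -t}) / max(#{w_j >= t}, 1) <= alpha."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        neg = sum(1 for x in w if x <= -t)  # proxy for false discoveries
        pos = sum(1 for x in w if x >= t)   # discoveries at threshold t
        if (1 + neg) / max(pos, 1) <= alpha:
            return t
    return math.inf  # no threshold meets the target level


def select(w, alpha):
    """Indices whose statistic is at or above the data-driven threshold."""
    t = symmetric_threshold(w, alpha)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical ranking statistics: large positive values suggest signals.
w = [5.0, 4.0, 3.0, 2.5, -1.0, 2.0, -0.5, 0.3]
print(select(w, alpha=0.45))
```

If no threshold meets the target level, the rule returns infinity and nothing is selected, which is the conservative choice.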

Methodology, Statistics Theory

High order chaotic limits of wavelet scalograms under long-range dependence

Marianne Clausel, Murad S. Taqqu (2012)

Let $G$ be a non-linear function of a Gaussian process $\{X_t\}_{t\in\mathbb{Z}}$ with long-range dependence. The resulting process $\{G(X_t)\}_{t\in\mathbb{Z}}$ is not Gaussian when $G$ is not linear. We consider random wavelet coefficients associated with $\{G(X_t)\}_{t\in\mathbb{Z}}$ and the corresponding wavelet scalogram, which is the average of squares of wavelet coefficients over locations. We obtain the asymptotic behavior of the scalogram as the number of observations and scales tend to infinity. It is known that when $G$ is a Hermite polynomial of any order, then the limit is either the Gaussian or the Rosenblatt distribution, that is, the limit can be represented by a multiple Wiener-Itô integral of order one or two. We show, however, that there are large classes of functions $G$ which yield a higher order Hermite distribution, that is, the limit can be represented by a multiple Wiener-Itô integral of order greater than two.

Probability, Statistics Theory

Distributional Consistency of Lasso by Perturbation Bootstrap

Debraj Das, S. N. Lahiri (2017)

The Least Absolute Shrinkage and Selection Operator, or the Lasso, introduced by Tibshirani (1996), is a popular estimation procedure in multiple linear regression when the underlying design has a sparse structure, because it sets some regression coefficients exactly equal to 0. In this article, we develop a perturbation bootstrap method and establish its validity in approximating the distribution of the Lasso in heteroscedastic linear regression. We allow the underlying covariates to be either random or non-random. We show that the proposed bootstrap method works irrespective of the nature of the covariates, unlike the resample-based bootstrap of Freedman (1981), which must be tailored to the nature (random vs non-random) of the covariates. A simulation study also supports our method in finite samples.

Methodology, Statistics Theory

Risk-consistency of cross-validation with lasso-type procedures

Darren Homrighausen, Daniel J. McDonald (2013)

The lasso and related sparsity inducing algorithms have been the target of substantial theoretical and applied research. Correspondingly, many results are known about their behavior for a fixed or optimally chosen tuning parameter specified up to unknown constants. In practice, however, this oracle tuning parameter is inaccessible so one must use the data to select one. Common statistical practice is to use a variant of cross-validation for this task. However, little is known about the theoretical properties of the resulting predictions with such data-dependent methods. We consider the high-dimensional setting with random design wherein the number of predictors $p$ grows with the number of observations $n$. Under typical assumptions on the data generating process, similar to those in the literature, we recover oracle rates up to a log factor when choosing the tuning parameter with cross-validation. Under weaker conditions, when the true model is not necessarily linear, we show that the lasso remains risk consistent relative to its linear oracle. We also generalize these results to the group lasso and square-root lasso and investigate the predictive and model selection performance of cross-validation via simulation.

Statistics Theory, Machine Learning
