High-dimensional predictive models, i.e., those with more measurements than observations, require regularization to be well defined, to perform well empirically, and to possess theoretical guarantees. The amount of regularization, often determined by tuning parameters, is integral to achieving good performance. One can choose the tuning parameter in a variety of ways, such as through resampling methods or generalized information criteria. However, the theory supporting many regularized procedures relies on an estimate of the variance parameter, which is complicated in high dimensions. We develop a suite of information criteria for choosing the tuning parameter in lasso regression by leveraging the literature on high-dimensional variance estimation. We derive intuition showing why existing information-theoretic approaches work poorly in this setting. We compare our risk estimators to existing methods in an extensive simulation study and derive some theoretical justification. We find that our new estimators perform well across a wide range of simulation conditions and evaluation criteria.
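To make the idea concrete, here is a minimal sketch (not the authors' implementation) of choosing the lasso tuning parameter with a BIC-style criterion that plugs in a high-dimensional variance estimate. The variance estimator used below, the degrees-of-freedom-corrected residual variance of a pilot cross-validated lasso, is one simple choice from that literature and is assumed here purely for illustration.

```python
# Sketch: BIC-style tuning-parameter selection for the lasso with a
# plug-in high-dimensional variance estimate (illustrative choice).
import numpy as np
from sklearn.linear_model import lasso_path, LassoCV

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                      # n << p: high-dimensional setting
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s] = 2.0
y = X @ beta + rng.standard_normal(n)

# Plug-in variance estimate from a pilot cross-validated lasso fit.
pilot = LassoCV(cv=5).fit(X, y)
df_pilot = np.count_nonzero(pilot.coef_)
sigma2_hat = np.sum((y - pilot.predict(X)) ** 2) / max(n - df_pilot, 1)

# BIC-style criterion along the lasso path; df(lambda) is the number of
# nonzero coefficients, the standard unbiased df estimate for the lasso.
alphas, coefs, _ = lasso_path(X, y)
rss = np.sum((y[:, None] - X @ coefs) ** 2, axis=0)
df = np.count_nonzero(coefs, axis=0)
bic = rss / sigma2_hat + np.log(n) * df

best = np.argmin(bic)
print(f"selected lambda = {alphas[best]:.4f}, model size = {df[best]}")
```

The key design point is that the criterion needs sigma2_hat explicitly; in high dimensions the naive residual variance of the selected model is badly biased, which is exactly the gap the variance-estimation literature addresses.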
Penalized (or regularized) regression, as represented by the Lasso and its variants, has become a standard technique for analyzing high-dimensional data when the number of variables substantially exceeds the sample size. The performance of penalized regression …
In a Gaussian graphical model, conditional independence between two variables is characterized by a zero in the corresponding entry of the inverse covariance matrix. Maximum likelihood estimation using the smoothly clipped absolute deviation (SCAD) penalty …
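The zero-pattern correspondence can be seen directly in a small simulation. SCAD-penalized likelihood is not available in scikit-learn, so the sketch below uses the convex L1 graphical lasso as a stand-in for the penalized MLE; the point is only that zeros in the estimated precision matrix encode conditional independence.

```python
# Sketch: conditional independence <-> zeros in the precision matrix,
# recovered here with the L1 graphical lasso (a stand-in for SCAD).
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p = 5
# Sparse tridiagonal precision matrix: only adjacent variables are
# conditionally dependent; all other pairs are conditionally independent.
theta = (np.eye(p)
         + np.diag(np.full(p - 1, 0.4), 1)
         + np.diag(np.full(p - 1, 0.4), -1))
sigma = np.linalg.inv(theta)
X = rng.multivariate_normal(np.zeros(p), sigma, size=2000)

fit = GraphicalLasso(alpha=0.05).fit(X)
est = fit.precision_
print(np.round(est, 2))
# Near-zero off-diagonal entries mark conditionally independent pairs.
print("recovered zero pattern:\n", np.abs(est) < 1e-2)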
Among the most popular variable selection procedures in high-dimensional regression, the Lasso provides a solution path that ranks the variables and determines a cut-off position on the path at which to select variables and estimate coefficients. In this paper, we …
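A brief sketch of the ranking-plus-cut-off idea: in scikit-learn, lars_path reports the variables in their order of entry along the lasso path, and a cut-off position on that ordering gives the selected model. The cut-off rule below (keep the variables active at the cross-validated lambda) is assumed for illustration, not the cut-off studied in the paper.

```python
# Sketch: rank variables by lasso-path entry order, then cut off the
# ranking at an illustrative position (the cross-validated lambda).
import numpy as np
from sklearn.linear_model import lars_path, LassoCV

rng = np.random.default_rng(1)
n, p = 120, 300
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[[3, 7, 42]] = [3.0, -2.0, 1.5]
y = X @ beta + rng.standard_normal(n)

# Order of entry along the lasso path ranks the variables.
alphas, entry_order, coefs = lars_path(X, y, method="lasso")
print("entry order:", entry_order[:10])

# One possible cut-off: variables still active at the CV-chosen lambda.
lam = LassoCV(cv=5).fit(X, y).alpha_
k = np.searchsorted(-alphas, -lam)          # path position at lambda
selected = np.flatnonzero(coefs[:, min(k, coefs.shape[1] - 1)])
print("selected variables:", selected)
```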
Applying standard statistical methods after model selection may yield inefficient estimators and hypothesis tests that fail to achieve nominal type-I error rates. The main issue is that the post-selection distribution of the data differs from …
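The failure mode is easy to reproduce. The small Monte Carlo below (illustrative, not from the paper) works under a global null: selecting the predictor most correlated with the response and then running a naive OLS test on it rejects far more often than 5%, while splitting the sample (select on one half, test on the other) restores the nominal level.

```python
# Sketch: inflated type-I error from naive inference after selection,
# and sample splitting as one simple remedy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, reps = 100, 50, 2000

def ols_pvalue(x, y):
    """Two-sided p-value for the slope in a no-intercept univariate fit."""
    b = x @ y / (x @ x)
    resid = y - b * x
    se = np.sqrt(resid @ resid / (len(y) - 1) / (x @ x))
    return 2 * stats.t.sf(abs(b / se), df=len(y) - 1)

naive = split = 0
for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)            # global null: no signal at all
    j = np.argmax(np.abs(X.T @ y))        # data-driven selection
    naive += ols_pvalue(X[:, j], y) < 0.05
    # Sample splitting: select on the first half, test on the second.
    j2 = np.argmax(np.abs(X[: n // 2].T @ y[: n // 2]))
    split += ols_pvalue(X[n // 2 :, j2], y[n // 2 :]) < 0.05

print(f"naive rejection rate: {naive / reps:.3f}")   # well above 0.05
print(f"split rejection rate: {split / reps:.3f}")   # near nominal 0.05
```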
Inferring causal relationships, or related associations, from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting in which the measured covariates are affected by hidden confounders …
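The bias from this kind of confounding is simple to demonstrate. In the sketch below (a toy version of the stated setting, not the paper's method), a single hidden factor H drives both the covariates and the response; a naive cross-validated lasso then spreads spurious signal across many coefficients even with a large sample.

```python
# Sketch: hidden confounding biases a naive lasso fit in the
# high-dimensional linear model X = H @ Gamma + E, y = X @ beta + H delta + eps.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p = 500, 200
H = rng.standard_normal((n, 1))              # hidden confounder
Gamma = rng.standard_normal((1, p))          # confounder loadings on X
X = H @ Gamma + rng.standard_normal((n, p))
beta = np.zeros(p); beta[0] = 1.0
y = X @ beta + 3.0 * H[:, 0] + rng.standard_normal(n)

fit = LassoCV(cv=5).fit(X, y)
# Confounding leaks signal into many irrelevant coefficients.
print("nonzero coefficients:", np.count_nonzero(fit.coef_))
print("estimate of beta[0]:", round(fit.coef_[0], 2), "(true value 1.0)")
```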