
Measurement Error in Lasso: Impact and Correction

Published by Øystein Sørensen
Publication date: 2012
Research field: Mathematical statistics
Language: English





Regression with the lasso penalty is a popular tool for performing dimension reduction when the number of covariates is large. In many applications of the lasso, such as genomics, covariates are subject to measurement error. We study the impact of measurement error on linear regression with the lasso penalty, both analytically and in simulation experiments. A simple method of correction for measurement error in the lasso is then considered. In the large sample limit, the corrected lasso yields sign consistent covariate selection under conditions very similar to those for the lasso with perfect measurements, whereas the uncorrected lasso requires much more stringent conditions on the covariance structure of the data. Finally, we suggest methods to correct for measurement error in generalized linear models with the lasso penalty, which we study empirically in simulation experiments with logistic regression, and also apply to a classification problem with microarray data. We see that the corrected lasso selects fewer false positives than the standard lasso, at a similar level of true positives. The corrected lasso can therefore be used to obtain more conservative covariate selection in genomic analysis.
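The abstract does not spell out the correction, so the following is only a minimal sketch of one standard idea from the measurement-error literature: subtract an assumed-known measurement error covariance from the Gram matrix of the observed covariates before solving the lasso. The additive error model W = X + U, the known error variance `sigma_u2`, the penalty level `lam`, and the helper names `soft_threshold` and `lasso_ista` are all assumptions made for illustration, not details taken from the paper. The example is deliberately low-dimensional (p much smaller than n) so that the corrected Gram matrix stays positive definite; when p exceeds n that matrix is generally indefinite and this simple solver would not apply without further constraints.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(gram, rho, lam, n_iter=500):
    """Minimise 0.5 * b' gram b - rho' b + lam * ||b||_1 by proximal gradient.

    Assumes `gram` is positive definite so the objective is convex.
    """
    step = 1.0 / np.linalg.eigvalsh(gram).max()
    beta = np.zeros(len(rho))
    for _ in range(n_iter):
        grad = gram @ beta - rho
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Simulated data (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, sigma_u2, lam = 2000, 5, 0.5, 0.05
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0])
X = rng.standard_normal((n, p))                           # true covariates
y = X @ beta_true + 0.5 * rng.standard_normal(n)          # response
W = X + np.sqrt(sigma_u2) * rng.standard_normal((n, p))   # observed, with error

rho = W.T @ y / n
gram_naive = W.T @ W / n                          # inflated by the error variance
gram_corr = gram_naive - sigma_u2 * np.eye(p)     # corrected Gram matrix
assert np.linalg.eigvalsh(gram_corr).min() > 0    # needed for convexity here

beta_naive = lasso_ista(gram_naive, rho, lam)     # lasso ignoring the error
beta_corr = lasso_ista(gram_corr, rho, lam)       # lasso with corrected Gram

print("naive    :", np.round(beta_naive, 2))
print("corrected:", np.round(beta_corr, 2))
```

In this setup the naive fit shrinks the nonzero coefficients toward zero because the error variance inflates the diagonal of the Gram matrix, while the corrected fit removes that inflation. The sketch says nothing about the selection consistency results in the abstract, which concern the large sample behaviour of the estimators.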




Read also

126 - Michael R. Geller 2020
We review an experimental technique used to correct state preparation and measurement errors on gate-based quantum computers, and discuss its rigorous justification. Within a specific biased quantum measurement model, we prove that nonideal measurement of an arbitrary $n$-qubit state is equivalent to ideal projective measurement followed by a classical Markov process $\Gamma$ acting on the output probability distribution. Measurement errors can be removed, with rigorous justification, if $\Gamma$ can be learned and inverted. We show how to obtain $\Gamma$ from gate set tomography (R. Blume-Kohout et al., arXiv:1310.4492) and apply the error correction technique to single IBM Q superconducting qubits.
We show that space- and time-correlated single-qubit rotation errors can lead to high-weight errors in a quantum circuit when the rotation angles are drawn from heavy-tailed distributions. This leads to a breakdown of quantum error correction, yielding reduced or in some cases no protection of the encoded logical qubits. While heavy-tailed phenomena are prevalent in the natural world, there is very little research as to whether noise with these statistics exists in current quantum processing devices. Furthermore, it is an open problem to develop tomographic or noise spectroscopy protocols that could test for the existence of noise with such statistics. These results suggest the need for quantum characterization methods that can reliably detect or reject the presence of such errors, together with continued first-principles studies of the origins of space- and time-correlated noise in quantum processors. If such noise does exist, physical or control-based protocols must be developed to mitigate it, as it would severely hinder the performance of fault-tolerant quantum computers.
195 - Mengyan Li , Runze Li , Yanyuan Ma 2020
For a high-dimensional linear model with a finite number of covariates measured with error, we study statistical inference on the parameters associated with the error-prone covariates, and propose a new corrected decorrelated score test and the corresponding one-step estimator. We further establish asymptotic properties of the newly proposed test statistic and the one-step estimator. Under local alternatives, we show that the limiting distribution of our corrected decorrelated score test statistic is non-central normal. The finite-sample performance of the proposed inference procedure is examined through simulation studies. We further illustrate the proposed procedure via an empirical analysis of a real data example.
High-dimensional data sets have become ubiquitous in the past few decades, often with many more covariates than observations. In the frequentist setting, penalized likelihood methods are the most popular approach for variable selection and estimation in high-dimensional data. In the Bayesian framework, spike-and-slab methods are commonly used as probabilistic constructs for high-dimensional modeling. Within the context of linear regression, Rockova and George (2018) introduced the spike-and-slab LASSO (SSL), an approach based on a prior which provides a continuum between the penalized likelihood LASSO and the Bayesian point-mass spike-and-slab formulations. Since its inception, the spike-and-slab LASSO has been extended to a variety of contexts, including generalized linear models, factor analysis, graphical models, and nonparametric regression. The goal of this paper is to survey the landscape surrounding spike-and-slab LASSO methodology. First we elucidate the attractive properties and the computational tractability of SSL priors in high dimensions. We then review methodological developments of the SSL and outline several theoretical developments. We illustrate the methodology on both simulated and real datasets.
Statistical agencies are often asked to produce small area estimates (SAEs) for positively skewed variables. When domain sample sizes are too small to support direct estimators, effects of skewness of the response variable can be large. As such, it is important to appropriately account for the distribution of the response variable given available auxiliary information. Motivated by this issue and in order to stabilize the skewness and achieve normality in the response variable, we propose an area-level log-measurement error model on the response variable. Then, under our proposed modeling framework, we derive an empirical Bayes (EB) predictor of positive small area quantities subject to the covariates containing measurement error. We propose a corresponding mean squared prediction error (MSPE) of EB predictor using both a jackknife and a bootstrap method. We show that the order of the bias is $O(m^{-1})$, where $m$ is the number of small areas. Finally, we investigate the performance of our methodology using both design-based and model-based simulation studies.