Choosing a penalty for model selection in heteroscedastic regression

Published by Sylvain Arlot

Publication date: 2010

Research field: Mathematical Statistics

Research language: English

Author: Sylvain Arlot

Statistics Theory

We consider the problem of choosing between several models in least-squares regression with heteroscedastic data. We prove that any penalization procedure is suboptimal when the penalty is a function of the dimension of the model, at least for some typical heteroscedastic model selection problems. In particular, Mallows' $C_p$ is suboptimal in this framework. In contrast, optimal model selection is possible with data-driven penalties such as resampling or $V$-fold penalties. Therefore, it is worth estimating the shape of the penalty from data, even at the price of a higher computational cost. Simulation experiments illustrate the existence of a trade-off between statistical accuracy and computational complexity. In conclusion, we sketch some rules for choosing a penalty in least-squares regression, depending on what is known about possible variations of the noise level.
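The contrast between a dimension-based penalty and a data-driven one can be illustrated with a minimal sketch. It is not the paper's experiment: the histogram models, the noise profile, the regression function, the variance estimate used for $C_p$, and the helper names (`make_edges`, `fit_means`, `vfold_penalty`, and so on) are all assumptions chosen for illustration. The $V$-fold penalty is written in one common form, $\frac{V-1}{V}\sum_j [P_n\gamma(\hat s^{(-j)}) - P_n^{(-j)}\gamma(\hat s^{(-j)})]$, where the constant $V-1$ is also an assumption.

```python
import numpy as np

def make_edges(d_left, d_right):
    """Bin edges with d_left bins on [0, 0.5] and d_right bins on [0.5, 1]."""
    left = np.linspace(0.0, 0.5, d_left + 1)
    right = np.linspace(0.5, 1.0, d_right + 1)[1:]
    return np.concatenate([left, right])

def bin_index(x, edges):
    """Index of the bin containing each point of x."""
    idx = np.searchsorted(edges, x, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)

def fit_means(x, y, edges):
    """Least-squares histogram fit: the mean of y within each bin."""
    idx = bin_index(x, edges)
    k = len(edges) - 1
    counts = np.bincount(idx, minlength=k)
    sums = np.bincount(idx, weights=y, minlength=k)
    # Arbitrary convention (assumption): an empty bin predicts 0.
    return np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)

def risk(x, y, edges, means):
    """Empirical least-squares risk of a fitted histogram on (x, y)."""
    return float(np.mean((y - means[bin_index(x, edges)]) ** 2))

def mallows_cp_penalty(dim, n, sigma2):
    """Dimension-based penalty 2 * sigma^2 * D / n."""
    return 2.0 * sigma2 * dim / n

def vfold_penalty(x, y, edges, folds, V):
    """V-fold penalty with constant C = V - 1 (assumed form, see lead-in)."""
    total = 0.0
    for j in range(V):
        train = folds != j
        means = fit_means(x[train], y[train], edges)
        # Risk on the full sample minus risk on the training sample.
        total += risk(x, y, edges, means) - risk(x[train], y[train], edges, means)
    return (V - 1) / V * total

# Hypothetical heteroscedastic data: noisy on [0, 0.5), almost noiseless after.
rng = np.random.default_rng(0)
n, V = 1000, 10
x = rng.uniform(0.0, 1.0, n)
sigma = np.where(x < 0.5, 1.0, 0.05)
y = x + sigma * rng.normal(size=n)
folds = rng.permutation(n) % V  # equal-size folds

# Two models of the same dimension (14) that split their bins differently.
model_a = make_edges(12, 2)  # many bins where the noise is large
model_b = make_edges(2, 12)  # many bins where the noise is small

# Crude global variance estimate from a larger model (assumption).
big = make_edges(20, 20)
sigma2_hat = n * risk(x, y, big, fit_means(x, y, big)) / (n - 40)

pen_cp_a = mallows_cp_penalty(14, n, sigma2_hat)
pen_cp_b = mallows_cp_penalty(14, n, sigma2_hat)
pen_vf_a = vfold_penalty(x, y, model_a, folds, V)
pen_vf_b = vfold_penalty(x, y, model_b, folds, V)
print("Cp penalties:    ", pen_cp_a, pen_cp_b)
print("V-fold penalties:", pen_vf_a, pen_vf_b)
```

Because both models have the same dimension, the $C_p$-type penalty cannot tell them apart, whereas the $V$-fold penalty is larger for the model that places its bins in the noisy region. The $V$-fold penalty requires $V$ extra fits per model, which is the computational cost the abstract refers to.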

Emilie Devijver, 2015

We study a dimensionality reduction technique for finite mixtures of high-dimensional multivariate response regression models. Both the dimension of the response and the number of predictors are allowed to exceed the sample size. We consider predictor selection and rank reduction to obtain lower-dimensional approximations. A class of estimators with a fast rate of convergence is introduced. We apply this result to a specific procedure, introduced in [11], where the relevant predictors are selected by the Group-Lasso.

Statistics Theory

Asymptotic inference in some heteroscedastic regression models with long memory design and errors

Hongwen Guo, Hira L. Koul, 2008

This paper discusses asymptotic distributions of various estimators of the underlying parameters in some regression models with long memory (LM) Gaussian design and nonparametric heteroscedastic LM moving average errors. In the simple linear regression model, the first-order asymptotic distribution of the least squares estimator of the slope parameter is observed to be degenerate. However, in the second order, this estimator is $n^{1/2}$-consistent and asymptotically normal for $h+H<3/2$, and non-normal otherwise, where $h$ and $H$ are the LM parameters of the design and error processes, respectively. The finite-dimensional asymptotic distributions of a class of kernel-type estimators of the conditional variance function $\sigma^2(x)$ in a more general heteroscedastic regression model are found to be normal whenever $H<(1+h)/2$, and non-normal otherwise. In addition, in this general model, $\log(n)$-consistency of the local Whittle estimator of $H$ based on pseudo residuals and consistency of a cross-validation-type estimator of $\sigma^2(x)$ are established. All of these findings are then used to propose a lack-of-fit test of a parametric regression model, with an application to some currency exchange rate data which exhibit LM.

Statistics Theory

Consistent Variable Selection for Functional Regression Models

Julian A. A. Collazos, 2015

The dual problem of testing the predictive significance of a particular covariate and identifying the set of relevant covariates is common in applied research and methodological investigations. To study this problem in the context of functional linear regression models with predictor variables observed over a grid and a scalar response, we consider basis expansions of the functional covariates and apply the likelihood ratio test. Based on p-values from testing each predictor, we propose a new variable selection method, which is consistent in selecting the relevant predictors from a set of available predictors that is allowed to grow with the sample size $n$. Numerical simulations suggest that the proposed variable selection procedure outperforms existing methods found in the literature. A real dataset from weather stations in Japan is analyzed.

Statistics Theory

Some Two-Step Procedures for Variable Selection in High-Dimensional Linear Regression

Jian Zhang, Xinge Jessie Jeng, Han Liu, 2008

We study the problem of high-dimensional variable selection via some two-step procedures. First we show that, given some good initial estimator which is $\ell_{\infty}$-consistent but not necessarily variable selection consistent, we can apply the nonnegative garrote, adaptive Lasso or hard-thresholding procedure to obtain a final estimator that is both estimation and variable selection consistent. Unlike the Lasso, our results do not require the irrepresentable condition, which could fail easily even for moderate $p_n$ (Zhao and Yu, 2007), and they also allow $p_n$ to grow almost as fast as $\exp(n)$ (for hard-thresholding there is no restriction on $p_n$). We also study the conditions under which Ridge regression can be used as an initial estimator. We show that under a relaxed identifiable condition, the Ridge estimator is $\ell_{\infty}$-consistent. Such a condition is usually satisfied when $p_n \le n$ and does not require the partial orthogonality between relevant and irrelevant covariates which is needed for the univariate regression in (Huang et al., 2008). Our numerical studies show that when using the Lasso or Ridge as initial estimator, the two-step procedures have a higher sparsity recovery rate than the Lasso or adaptive Lasso with univariate regression used in (Huang et al., 2008).
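One of the two-step procedures described above, a Ridge initial estimator followed by hard-thresholding, can be sketched as follows. This is only an illustration under assumed settings: the synthetic data, the ridge parameter `lam`, the threshold `tau`, and the helper names are hypothetical and do not come from the paper, which states conditions on these quantities that are not reproduced here.

```python
import numpy as np

def ridge_estimator(X, y, lam):
    """Step 1: Ridge initial estimator, solving (X'X + lam * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def hard_threshold(beta, tau):
    """Step 2: keep coefficients whose magnitude exceeds tau, zero the rest."""
    return np.where(np.abs(beta) > tau, beta, 0.0)

# Hypothetical sparse linear model: 3 relevant covariates out of 20.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_init = ridge_estimator(X, y, lam=1.0)       # illustrative ridge parameter
beta_final = hard_threshold(beta_init, tau=0.5)  # illustrative threshold
selected = np.flatnonzero(beta_final)
print("selected covariates:", selected)
```

The threshold only needs to separate the initial estimates of relevant coefficients from those of irrelevant ones, which is why a uniformly accurate ($\ell_{\infty}$-consistent) initial estimator is the natural requirement for the first step.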

Statistics Theory

Selection of variables and dimension reduction in high-dimensional non-parametric regression

Karine Bertin, Guillaume Lecue, 2008

We consider an $l_1$-penalization procedure in the non-parametric Gaussian regression model. In many concrete examples, the dimension $d$ of the input variable $X$ is very large (sometimes depending on the number of observations). Estimation of a $\beta$-regular regression function $f$ cannot be faster than the slow rate $n^{-2\beta/(2\beta+d)}$. Fortunately, in some situations, $f$ depends on only a small number of the coordinates of $X$. In this paper, we construct two procedures. The first one selects these coordinates with high probability. Then, using this subset selection method, we run a local polynomial estimator (on the set of interesting coordinates) to estimate the regression function at the rate $n^{-2\beta/(2\beta+d^*)}$, where $d^*$, the real dimension of the problem (the exact number of variables on which $f$ depends), has replaced the dimension $d$ of the design. To achieve this result, we use an $l_1$-penalization method in this non-parametric setup.
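To give a sense of the gain from replacing $d$ by $d^*$ in the rate, here is a short worked example. The values $\beta = 2$, $d = 20$, $d^* = 2$ and $n = 10^6$ are hypothetical and chosen only to make the arithmetic simple; constants in front of the rates are ignored.

```latex
\[
n^{-2\beta/(2\beta+d)} = n^{-4/24} = n^{-1/6},
\qquad
n^{-2\beta/(2\beta+d^*)} = n^{-4/6} = n^{-2/3}.
\]
\[
\text{For } n = 10^{6}: \quad
n^{-1/6} = 10^{-1},
\qquad
n^{-2/3} = 10^{-4}.
\]
```

Under these assumed values, selecting the two relevant coordinates first improves the rate by three orders of magnitude at this sample size.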

Statistics Theory
