
The large sample coverage probability of confidence intervals in general regression models after a preliminary hypothesis test

Published by Paul Kabaila
Publication date: 2017
Research field: Mathematical statistics
Language: English





We derive a computationally convenient formula for the large sample coverage probability of a confidence interval for a scalar parameter of interest following a preliminary hypothesis test that a specified vector parameter takes a given value in a general regression model. Previously, this large sample coverage probability could only be estimated by simulation. Our formula only requires the evaluation, by numerical integration, of either a double or triple integral, irrespective of the dimension of this specified vector parameter. We illustrate the application of this formula to a confidence interval for the log odds ratio of myocardial infarction when the exposure is recent oral contraceptive use, following a preliminary test that two specified interactions in a logistic regression model are zero. For this real-life data, we compare this large sample coverage probability with the actual coverage probability of this confidence interval, obtained by simulation.
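The simulation approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's formula and not its logistic regression example: it assumes a much simpler setting, a two-regressor linear model with known error variance 1, where a preliminary z-test of `beta2 = 0` decides whether the nominal 95% interval for `beta1` comes from the reduced or the full model. The function name `pretest_ci_coverage`, the design, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def pretest_ci_coverage(beta2, rho=0.8, n=50, n_sim=20000, seed=0):
    """Monte Carlo coverage of a nominal 95% CI for beta1 after a pretest of beta2 = 0.

    Assumed toy model: y = beta1*x1 + beta2*x2 + eps, eps ~ N(0, 1), variance known.
    """
    rng = np.random.default_rng(seed)
    z = 1.959963984540054  # 0.975 quantile of the standard normal
    beta1 = 1.0            # true value of the parameter of interest

    # Fixed design with two correlated regressors (population correlation rho).
    x1 = rng.standard_normal(n)
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    X = np.column_stack([x1, x2])
    XtX_inv = np.linalg.inv(X.T @ X)

    se_full = np.sqrt(np.diag(XtX_inv))  # standard errors in the full model
    se_red = 1.0 / np.sqrt(x1 @ x1)      # standard error in the model without x2

    # Simulate n_sim response vectors at once; each row is one data set.
    eps = rng.standard_normal((n_sim, n))
    Y = X @ np.array([beta1, beta2]) + eps

    b_full = Y @ X @ XtX_inv     # least squares estimates, full model
    b_red = Y @ x1 / (x1 @ x1)   # least squares estimate, reduced model

    # Preliminary two-sided z-test of beta2 = 0 at level 0.05.
    accept = np.abs(b_full[:, 1]) / se_full[1] <= z

    # Interval centre and half-width depend on the outcome of the pretest.
    centre = np.where(accept, b_red, b_full[:, 0])
    half = np.where(accept, z * se_red, z * se_full[0])

    return float(np.mean(np.abs(centre - beta1) <= half))

if __name__ == "__main__":
    for b2 in (0.0, 0.2, 0.4, 0.8, 5.0):
        print(b2, pretest_ci_coverage(b2))
```

Under these assumptions the estimated coverage stays near the nominal level when `beta2` is so large that the pretest almost always rejects, and it drops well below the nominal level for moderate values of `beta2`, where the pretest often wrongly accepts the reduced model.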


Read also

We consider a linear regression model, with the parameter of interest a specified linear combination of the regression parameter vector. We suppose that, as a first step, a data-based model selection (e.g. by preliminary hypothesis tests or minimizing AIC) is used to select a model. It is common statistical practice to then construct a confidence interval for the parameter of interest based on the assumption that the selected model had been given to us a priori. This assumption is false and it can lead to a confidence interval with poor coverage properties. We provide an easily-computed finite sample upper bound (calculated by repeated numerical evaluation of a double integral) to the minimum coverage probability of this confidence interval. This bound applies for model selection by any of the following methods: minimum AIC, minimum BIC, maximum adjusted R-squared, minimum Mallows Cp and t-tests. The importance of this upper bound is that it delineates general categories of design matrices and model selection procedures for which this confidence interval has poor coverage properties. This upper bound is shown to be a finite sample analogue of an earlier large sample upper bound due to Kabaila and Leeb.
We consider a general regression model, without a scale parameter. Our aim is to construct a confidence interval for a scalar parameter of interest $\theta$ that utilizes the uncertain prior information that a distinct scalar parameter $\tau$ takes the specified value $t$. This confidence interval should have good coverage properties. It should also have scaled expected length, where the scaling is with respect to the usual confidence interval, that (a) is substantially less than 1 when the prior information is correct, (b) has a maximum value that is not too large and (c) is close to 1 when the data and prior information are highly discordant. The asymptotic joint distribution of the maximum likelihood estimators of $\theta$ and $\tau$ is similar to the joint distributions of these estimators in the particular case of a linear regression with normally distributed errors having known variance. This similarity is used to construct a confidence interval with the desired properties by using the confidence interval, computed using the R package ciuupi, that utilizes the uncertain prior information in this particular linear regression case. An important practical application of this confidence interval is to a quantal bioassay carried out to compare two similar compounds. In this context, the uncertain prior information is that the hypothesis of parallelism holds. We provide extensive numerical results that illustrate the properties of this confidence interval in this context.
We consider a linear regression model with regression parameter $\beta=(\beta_1,\dots,\beta_p)$ and independent and identically $N(0,\sigma^2)$ distributed errors. Suppose that the parameter of interest is $\theta = a^T \beta$ where $a$ is a specified vector. Define the parameter $\tau=c^T \beta-t$ where the vector $c$ and the number $t$ are specified and $a$ and $c$ are linearly independent. Also suppose that we have uncertain prior information that $\tau = 0$. We present a new frequentist $1-\alpha$ confidence interval for $\theta$ that utilizes this prior information. We require this confidence interval to (a) have endpoints that are continuous functions of the data and (b) coincide with the standard $1-\alpha$ confidence interval when the data strongly contradict this prior information. This interval is optimal in the sense that it has minimum weighted average expected length where the largest weight is given to this expected length when $\tau=0$. This minimization leads to an interval that has the following desirable properties. This interval has expected length that (a) is relatively small when the prior information about $\tau$ is correct and (b) has a maximum value that is not too large. The following problem will be used to illustrate the application of this new confidence interval. Consider a 2-by-2 factorial experiment with 20 replicates. Suppose that the parameter of interest $\theta$ is a specified simple effect and that we have uncertain prior information that the two-factor interaction is zero. Our aim is to find a frequentist 0.95 confidence interval for $\theta$ that utilizes this prior information.
Consider a linear regression model and suppose that our aim is to find a confidence interval for a specified linear combination of the regression parameters. In practice, it is common to perform a Durbin-Watson pretest of the null hypothesis of zero first-order autocorrelation of the random errors against the alternative hypothesis of positive first-order autocorrelation. If this null hypothesis is accepted then the confidence interval centred on the Ordinary Least Squares estimator is used; otherwise the confidence interval centred on the Feasible Generalized Least Squares estimator is used. We provide new tools for the computation, for any given design matrix and parameter of interest, of graphs of the coverage probability functions of the confidence interval resulting from this two-stage procedure and the confidence interval that is always centred on the Feasible Generalized Least Squares estimator. These graphs are used to choose the better confidence interval, prior to any examination of the observed response vector.
We compare the following two sources of poor coverage of post-model-selection confidence intervals: the preliminary data-based model selection sometimes chooses the wrong model and the data used to choose the model is re-used for the construction of the confidence interval.