Two sources of poor coverage of confidence intervals after model selection

Added by Paul Kabaila

Publication date 2017

Field: Mathematical Statistics

Language: English

Authors Paul Kabaila - Rheanna Mainzer

We compare the following two sources of poor coverage of post-model-selection confidence intervals: the preliminary data-based model selection sometimes chooses the wrong model and the data used to choose the model is re-used for the construction of the confidence interval.
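The two sources named in the abstract can be separated in a small Monte Carlo experiment. The sketch below is illustrative and is not taken from the paper. It assumes a known-variance testbed in which the estimators `theta_hat` and `tau_hat` are bivariate normal with unit variances and correlation `rho`, the simpler model sets `tau = 0`, and the model is chosen by a two-sided test on `tau`. Setting `reuse_data=False` runs the test on an independent copy of the data, so any remaining undercoverage comes only from choosing the wrong model. The function name and all parameter names are hypothetical.

```python
import numpy as np

def coverage(tau, rho, reuse_data, n_sim=200_000, z=1.96, seed=0):
    """Monte Carlo coverage of the naive post-model-selection interval.

    Assumed testbed (not from the paper): (theta_hat, tau_hat) is bivariate
    normal with means (theta, tau), unit variances and correlation rho.
    """
    rng = np.random.default_rng(seed)
    theta = 0.0  # true value of the parameter of interest
    cov = np.array([[1.0, rho], [rho, 1.0]])
    est = rng.multivariate_normal([theta, tau], cov, size=n_sim)
    theta_hat, tau_hat = est[:, 0], est[:, 1]

    if reuse_data:
        # Same data used for selection and for the interval.
        tau_sel = tau_hat
    else:
        # Independent data used for selection only.
        tau_sel = rng.normal(tau, 1.0, size=n_sim)

    # Preliminary test: keep the simpler model (tau = 0) unless rejected.
    choose_simple = np.abs(tau_sel) < z

    # Under the simpler model the estimator of theta is theta_hat - rho * tau_hat,
    # with standard deviation sqrt(1 - rho^2); under the full model it is theta_hat.
    centre = np.where(choose_simple, theta_hat - rho * tau_hat, theta_hat)
    half_width = np.where(choose_simple, z * np.sqrt(1.0 - rho**2), z)

    return np.mean(np.abs(centre - theta) <= half_width)

if __name__ == "__main__":
    for tau in (0.0, 1.0, 2.0, 4.0):
        print(tau, coverage(tau, 0.9, True), coverage(tau, 0.9, False))
```

Comparing the two columns at each `tau` gives a rough sense of how much of the coverage loss is due to data re-use and how much is due to selecting the wrong model, under these assumptions only.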

Upper bounds on the minimum coverage probability of confidence intervals in regression after variable selection

Paul Kabaila, Khageswor Giri (2007)

We consider a linear regression model, with the parameter of interest a specified linear combination of the regression parameter vector. We suppose that, as a first step, a data-based model selection (e.g. by preliminary hypothesis tests or minimizing AIC) is used to select a model. It is common statistical practice to then construct a confidence interval for the parameter of interest based on the assumption that the selected model had been given to us a priori. This assumption is false and it can lead to a confidence interval with poor coverage properties. We provide an easily-computed finite sample upper bound (calculated by repeated numerical evaluation of a double integral) to the minimum coverage probability of this confidence interval. This bound applies for model selection by any of the following methods: minimum AIC, minimum BIC, maximum adjusted R-squared, minimum Mallows Cp and t-tests. The importance of this upper bound is that it delineates general categories of design matrices and model selection procedures for which this confidence interval has poor coverage properties. This upper bound is shown to be a finite sample analogue of an earlier large sample upper bound due to Kabaila and Leeb.

Statistics Theory Applications Statistics Theory

Confidence intervals centred on bootstrap smoothed estimators: an impossibility result

Paul Kabaila, Christeen Wijethunga (2019)

Recently, Kabaila and Wijethunga assessed the performance of a confidence interval centred on a bootstrap smoothed estimator, with width proportional to an estimator of Efron's delta method approximation to the standard deviation of this estimator. They used a testbed situation consisting of two nested linear regression models, with error variance assumed known, and model selection using a preliminary hypothesis test. This assessment was in terms of coverage and scaled expected length, where the scaling is with respect to the expected length of the usual confidence interval with the same minimum coverage probability. They found that this confidence interval has scaled expected length that (a) has a maximum value that may be much greater than 1 and (b) is greater than a number slightly less than 1 when the simpler model is correct. We therefore ask the following question. For a confidence interval, centred on the bootstrap smoothed estimator, does there exist a formula for its data-based width such that, in this testbed situation, it has the desired minimum coverage and scaled expected length that (a) has a maximum value that is not too much larger than 1 and (b) is substantially less than 1 when the simpler model is correct? Using a recent decision-theoretic performance bound due to Kabaila and Kong, it is shown that the answer to this question is "no" for a wide range of scenarios.

Statistics Theory Methodology Statistics Theory

Asymptotic coverage probabilities of bootstrap percentile confidence intervals for constrained parameters

Chunlin Wang, Paul Marriott, Pengfei Li (2017)

The asymptotic behaviour of the commonly used bootstrap percentile confidence interval is investigated when the parameters are subject to linear inequality constraints. We concentrate on the important one- and two-sample problems with data generated from general parametric distributions in the natural exponential family. The focus of this paper is on quantifying the coverage probabilities of the parametric bootstrap percentile confidence intervals, in particular their limiting behaviour near boundaries. We propose a local asymptotic framework to study this subtle coverage behaviour. Under this framework, we discover that when the true parameters are on, or close to, the restriction boundary, the asymptotic coverage probabilities can always exceed the nominal level in the one-sample case; however, they can be, remarkably, both under and over the nominal level in the two-sample case. Using illustrative examples, we show that the results provide theoretical justification and guidance on applying the bootstrap percentile method to constrained inference problems.
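The boundary behaviour described in this abstract can be explored numerically. The following is a minimal sketch, not the authors' framework: it assumes a one-sample normal model with known unit variance and the single constraint `mu >= 0`, so the constrained maximum likelihood estimate is `max(xbar, 0)`. The function name and its parameters are hypothetical.

```python
import numpy as np

def percentile_ci_coverage(mu, n=50, n_sim=4000, n_boot=500, level=0.95, seed=0):
    """Estimate coverage of the parametric bootstrap percentile interval
    for a normal mean constrained to be non-negative (assumed setting)."""
    rng = np.random.default_rng(seed)
    se = 1.0 / np.sqrt(n)  # standard error of the sample mean (variance 1)

    # Sampling distribution of the sample mean, then the constrained MLE.
    xbar = rng.normal(mu, se, size=n_sim)
    mu_hat = np.maximum(xbar, 0.0)

    # Parametric bootstrap: resample the mean from N(mu_hat, 1/n),
    # then apply the same constraint to each bootstrap replicate.
    boot_means = rng.normal(mu_hat[:, None], se, size=(n_sim, n_boot))
    boot_mle = np.maximum(boot_means, 0.0)

    alpha = 1.0 - level
    lower, upper = np.percentile(
        boot_mle, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=1
    )
    return np.mean((lower <= mu) & (mu <= upper))

if __name__ == "__main__":
    for mu in (0.0, 0.1, 0.3, 2.0):
        print(mu, percentile_ci_coverage(mu))
```

In this assumed setting, coverage at `mu = 0` should come out above the nominal level, which is in line with the one-sample statement in the abstract, while values of `mu` far from the boundary should give coverage close to nominal.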

Statistics Theory Computation Methodology

Adaptive Confidence Sets for the Optimal Approximating Model

Angelika Rohde, Lutz Duembgen (2009)

In the setting of high-dimensional linear models with Gaussian noise, we investigate the possibility of confidence statements connected to model selection. Although there exist numerous procedures for adaptive point estimation, the construction of adaptive confidence regions is severely limited (cf. Li, 1989). The present paper sheds new light on this gap. We develop exact and adaptive confidence sets for the best approximating model in terms of risk. One of our constructions is based on a multiscale procedure and a particular coupling argument. Utilizing exponential inequalities for noncentral chi-squared distributions, we show that the risk and quadratic loss of all models within our confidence region are uniformly bounded by the minimal risk times a factor close to one.

Statistics Theory Methodology Statistics Theory

Confidence bands for a log-concave density

Guenther Walther, Alnur Ali, Xinyue Shen (2020)

We present a new approach for inference about a log-concave distribution: Instead of using the method of maximum likelihood, we propose to incorporate the log-concavity constraint in an appropriate nonparametric confidence set for the cdf $F$. This approach has the advantage that it automatically provides a measure of statistical uncertainty and it thus overcomes a marked limitation of the maximum likelihood estimate. In particular, we show how to construct confidence bands for the density that have a finite sample guaranteed confidence level. The nonparametric confidence set for $F$ which we introduce here has attractive computational and statistical properties: It allows modern tools from optimization to be brought to bear on this problem via difference of convex programming, and it results in optimal statistical inference. We show that the width of the resulting confidence bands converges at nearly the parametric $n^{-\frac{1}{2}}$ rate when the log density is $k$-affine.

Statistics Theory Methodology Statistics Theory
