In fields such as clinical trials, biomedical surveys, marketing, and banking, where the response variable is dichotomous, logistic regression is a convenient alternative to linear regression. In this paper, we develop a novel bootstrap technique based on the perturbation resampling method for approximating the distribution of the maximum likelihood estimator (MLE) of the regression parameter vector. We establish second order correctness of the proposed bootstrap method after proper studentization and smoothing, and show that inferences drawn from it are more accurate than those based on asymptotic normality. The main challenge in establishing second order correctness lies in the fact that, the response variable being binary, the MLE has a lattice structure; we show that the direct bootstrap approach fails even after studentization. We therefore adopt the smoothing technique developed in Lahiri (1993) to ensure that the smoothed studentized version of the MLE has a density, and apply the same smoothing strategy to the bootstrap version to achieve a second order correct approximation.
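A minimal sketch of the perturbation bootstrap for the logistic regression MLE may help fix ideas: each observation's log-likelihood contribution is multiplied by an i.i.d. positive random weight with mean 1 and variance 1 (Exp(1) here is an illustrative choice), and the weighted MLE is recomputed. The smoothing step of Lahiri (1993) required by the theory is omitted for brevity.

```python
# Sketch: perturbation bootstrap for the logistic-regression MLE.
# Exp(1) weights are one illustrative mean-1, variance-1 choice;
# the smoothing step needed for second order correctness is omitted.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def neg_loglik(beta, X, y, w):
    """Weighted negative log-likelihood of the logistic model."""
    eta = X @ beta
    return np.sum(w * (np.logaddexp(0.0, eta) - y * eta))  # stable log(1+e^eta)

n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, 1.0, -1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

# Original MLE (all weights equal to 1)
beta_hat = minimize(neg_loglik, np.zeros(p), args=(X, y, np.ones(n))).x

# Perturbation bootstrap: refit with random weights; (X, y) is never resampled
B = 500
boot = np.empty((B, p))
for b in range(B):
    w = rng.exponential(1.0, size=n)
    boot[b] = minimize(neg_loglik, beta_hat, args=(X, y, w)).x

# Bootstrap percentile interval for each coefficient
ci = np.percentile(boot, [2.5, 97.5], axis=0)
print(beta_hat, ci, sep="\n")
```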
Selecting the important covariates and dropping the unimportant ones from a high-dimensional regression model is a long-standing problem and has therefore received much attention over the last two decades. Having selected the correct model, it is also important to estimate the parameters corresponding to the important covariates well. In this spirit, Fan and Li (2001) proposed the oracle property as a desired feature of a variable selection method. The oracle property has two parts: variable selection consistency (VSC) and asymptotic normality. Keeping VSC fixed and strengthening the second part, Fan and Lv (2008) introduced the strong oracle property. In this paper, we consider a range of penalized regression techniques that are VSC and classify them according to the oracle and strong oracle properties. We show that both the residual and the perturbation bootstrap methods are second order correct for any penalized estimator irrespective of its class. Most interesting of all is the Lasso, introduced by Tibshirani (1996): although the Lasso is VSC, it is not asymptotically normal and hence fails to satisfy the oracle property.
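For concreteness, here is a minimal sketch of the residual bootstrap for the Lasso: residuals from the Lasso fit are centred, resampled, and used to regenerate responses from the fitted model. The penalty level and the plain (unmodified) Lasso refit are illustrative simplifications; the second order theory involves a properly studentized statistic not shown here.

```python
# Sketch: residual bootstrap for the Lasso (penalty level illustrative).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                      # sparse truth
y = X @ beta + rng.normal(size=n)

lam = 0.1
fit = Lasso(alpha=lam).fit(X, y)
resid = y - fit.predict(X)
resid -= resid.mean()                            # centre the residuals

B = 500
boot = np.empty((B, p))
for b in range(B):
    e_star = rng.choice(resid, size=n, replace=True)
    y_star = fit.predict(X) + e_star             # regenerate from the fitted model
    boot[b] = Lasso(alpha=lam).fit(X, y_star).coef_

ci = np.percentile(boot, [2.5, 97.5], axis=0)    # percentile intervals
```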
In this paper, we develop uniform inference methods for the conditional mode based on quantile regression. Specifically, we propose to estimate the conditional mode by minimizing the derivative of the estimated conditional quantile function defined by smoothing the linear quantile regression estimator, and develop two bootstrap methods, a novel pivotal bootstrap and the nonparametric bootstrap, for our conditional mode estimator. Building on high-dimensional Gaussian approximation techniques, we establish the validity of simultaneous confidence rectangles constructed from the two bootstrap methods for the conditional mode. We also extend the preceding analysis to the case where the dimension of the covariate vector is increasing with the sample size. Finally, we conduct simulation experiments and a real data analysis using U.S. wage data to demonstrate the finite sample performance of our inference method.
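The estimator's logic admits a compact sketch: fit linear quantile regressions on a tau grid, kernel-smooth the estimated conditional quantile curve at a fixed covariate value, and take the mode as Q(tau*|x), where tau* minimises the derivative dQ/dtau (the sparsity function, whose reciprocal is the conditional density). The grid, bandwidth, and Gaussian kernel below are illustrative choices, not the paper's.

```python
# Sketch: conditional mode via smoothed linear quantile regression.
# Grid width, bandwidth and kernel are illustrative, not the paper's choices.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 1, size=n)
y = 1.0 + 2.0 * x + rng.gamma(2.0, 1.0, size=n)  # skewed errors: mode != mean
X = sm.add_constant(x)

taus = np.linspace(0.05, 0.95, 91)
x0 = np.array([1.0, 0.5])                        # evaluation point x = 0.5
q_hat = np.array([x0 @ sm.QuantReg(y, X).fit(q=t).params for t in taus])

# Kernel smoothing of tau -> Q(tau | x0) with a Gaussian kernel
h = 0.05
W = np.exp(-0.5 * ((taus[:, None] - taus[None, :]) / h) ** 2)
q_smooth = (W @ q_hat) / W.sum(axis=1)

dq = np.gradient(q_smooth, taus)                 # sparsity function dQ/dtau
tau_star = taus[np.argmin(dq)]                   # where conditional density peaks
mode_hat = q_smooth[np.argmin(dq)]
print(tau_star, mode_hat)
```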
In this paper we consider exact tests for multiple logistic regression, where the levels of the covariates are equally spaced, via Markov bases. In typical applications of multiple logistic regression, the sample size is positive for each combination of levels of the covariates. In this case we do not need the whole Markov basis, which guarantees connectivity of all fibers. We first give an explicit Markov basis for multiple Poisson regression. Via the Lawrence lifting of this basis, we then exhibit, in the case of bivariate logistic regression, a simple subset of the Markov basis that connects all fibers with a positive sample size for each combination of levels of the covariates.
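To illustrate the fiber-walk idea in the simplest setting, consider univariate Poisson regression on levels 0, ..., m (design rows (1, i)): degree-2 moves of the form e_a + e_b - e_c - e_d with a + b = c + d preserve the sufficient statistics. The move set and the uniform Metropolis target below are illustrative simplifications, not the paper's basis or the exact conditional distribution.

```python
# Sketch: random walk over a fiber using illustrative degree-2 moves.
# A real exact test would target the conditional (generalized hypergeometric)
# distribution; the uniform target here is only for demonstration.
import numpy as np

rng = np.random.default_rng(3)
m = 6
levels = np.arange(m + 1)
counts = np.array([5, 8, 6, 4, 3, 2, 1])         # observed cell counts, all > 0

moves = [(a, b, c, d)
         for a in levels for b in levels for c in levels for d in levels
         if a + b == c + d and {a, b} != {c, d}]

def step(z):
    """One proposal: apply a random move, accept if the table stays nonnegative."""
    a, b, c, d = moves[rng.integers(len(moves))]
    z_new = z.copy()
    np.add.at(z_new, [a, b], 1)                  # handles a == b correctly
    np.add.at(z_new, [c, d], -1)
    return z_new if (z_new >= 0).all() else z

z = counts.copy()
for _ in range(10_000):
    z = step(z)

# Sufficient statistics (total count and sum of i * count) are preserved
assert z.sum() == counts.sum()
assert (levels * z).sum() == (levels * counts).sum()
```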
Recently, Kabaila and Wijethunga assessed the performance of a confidence interval centred on a bootstrap smoothed estimator, with width proportional to an estimator of Efron's delta method approximation to the standard deviation of this estimator. They used a testbed situation consisting of two nested linear regression models, with the error variance assumed known, and model selection using a preliminary hypothesis test. The assessment was in terms of coverage and scaled expected length, where the scaling is with respect to the expected length of the usual confidence interval with the same minimum coverage probability. They found that this confidence interval has scaled expected length that (a) has a maximum value that may be much greater than 1 and (b) is greater than a number slightly less than 1 when the simpler model is correct. We therefore ask the following question. For a confidence interval centred on the bootstrap smoothed estimator, does there exist a formula for its data-based width such that, in this testbed situation, it has the desired minimum coverage and scaled expected length that (a) has a maximum value that is not too much larger than 1 and (b) is substantially less than 1 when the simpler model is correct? Using a recent decision-theoretic performance bound due to Kabaila and Kong, we show that the answer to this question is `no' for a wide range of scenarios.
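A minimal sketch of the bootstrap smoothed estimator in a two nested model testbed may clarify the object under study: after a preliminary z-test for excluding one covariate, the post-selection estimator is averaged over parametric bootstrap replications, in the spirit of Efron (2014). The regression design, the 1.96 cutoff, and the parameter of interest are illustrative assumptions, not the cited papers' exact setup.

```python
# Sketch: bootstrap smoothing of a post-model-selection estimator in a
# two nested linear model testbed with known error variance (illustrative).
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.0                                      # error s.d. assumed known
n = 50
x = rng.normal(size=n)                           # covariate carrying theta
z = rng.normal(size=n)                           # covariate tested for exclusion
X = np.column_stack([x, z])
y = X @ np.array([1.0, 0.3]) + rng.normal(0, sigma, size=n)

def post_test_estimate(y):
    """Estimate theta after a preliminary z-test of gamma = 0."""
    bh, gh = np.linalg.lstsq(X, y, rcond=None)[0]
    se_g = sigma * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])
    if abs(gh / se_g) > 1.96:                    # keep the full model
        return bh
    return np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]  # simpler model

theta_sel = post_test_estimate(y)

# Bootstrap smoothing: average the post-test estimator over parametric
# bootstrap samples drawn from the full-model fit
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
B = 1000
theta_smooth = np.mean([
    post_test_estimate(X @ beta_full + rng.normal(0, sigma, size=n))
    for _ in range(B)
])
print(theta_sel, theta_smooth)
```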
High-dimensional linear regression has been studied intensively in the statistics community over the last two decades. For the convenience of theoretical analysis, classical methods usually assume independent observations and sub-Gaussian-tailed errors. However, neither assumption holds in many real high-dimensional time-series data. Recently, [Sun, Zhou, Fan, 2019, J. Amer. Stat. Assoc., in press] proposed Adaptive Huber Regression (AHR) to address the issue of heavy-tailed errors; they discovered that the robustification parameter of the Huber loss should adapt to the sample size, the dimensionality, and the moments of the heavy-tailed errors. We make progress in a complementary direction and justify AHR for dependent observations. Specifically, we consider an important dependence structure, Markov dependence. Our results show that Markov dependence affects both the adaptation of the robustification parameter and the estimation of the regression coefficients: the sample size should be discounted by a factor depending on the spectral gap of the underlying Markov chain.
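A minimal sketch, assuming AR(1) errors as the Markov chain: the robustification parameter follows the tau ~ sigma * sqrt(n_eff / (d + log n_eff)) recipe of Sun, Zhou and Fan (2019), with an effective sample size n_eff = (1 - rho) * n standing in for the spectral-gap discount described above. The discount factor and the noise model are illustrative choices, not the paper's exact quantities.

```python
# Sketch: adaptive Huber regression with a Markov-dependence discount on the
# sample size. The gap = 1 - rho discount for AR(1) errors is illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, d, rho = 500, 5, 0.5
X = rng.normal(size=(n, d))

eps = np.empty(n)                                # heavy-tailed AR(1) noise
eps[0] = rng.standard_t(3)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + rng.standard_t(3)
y = X @ np.ones(d) + eps

gap = 1.0 - rho                                  # illustrative spectral gap
n_eff = gap * n                                  # discounted sample size
sigma_hat = np.std(y - X @ np.linalg.lstsq(X, y, rcond=None)[0])
tau = sigma_hat * np.sqrt(n_eff / (d + np.log(n_eff)))

def huber_loss(beta):
    """Huber loss with robustification parameter tau."""
    r = y - X @ beta
    small = np.abs(r) <= tau
    return np.sum(np.where(small, 0.5 * r**2, tau * np.abs(r) - 0.5 * tau**2))

beta_hat = minimize(huber_loss, np.zeros(d)).x
print(tau, beta_hat)
```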