In this study, we propose shrinkage methods based on {\it generalized ridge regression} (GRR) estimation, which is suitable for both multicollinearity and high-dimensional problems with a small number of samples (large $p$, small $n$). We also derive theoretical properties of the proposed estimators in both the low- and high-dimensional cases. Furthermore, we demonstrate the performance of the proposed estimators through simulation studies and real-data analysis, and compare it with that of existing penalty methods. We show that the proposed methods compare well to competing regularization techniques.
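As a point of reference, the classical generalized ridge form can be sketched as follows. This is not the estimator proposed above, whose details the abstract does not give; it is a minimal sketch assuming the textbook version that assigns a separate penalty $k_j$ to each singular direction of the design matrix. The function name `generalized_ridge` and the toy data are hypothetical.

```python
import numpy as np

def generalized_ridge(X, y, k):
    """Generalized ridge estimate with one penalty k[j] per singular direction of X.

    Works for p > n as well, provided k[j] > 0 wherever the singular value is zero.
    """
    U, d, Vt = np.linalg.svd(X, full_matrices=False)  # thin SVD: X = U diag(d) Vt
    k = np.asarray(k, dtype=float)
    shrink = d / (d**2 + k)                           # per-direction shrinkage factor
    return Vt.T @ (shrink * (U.T @ y))

# Toy large-p, small-n data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
y = rng.normal(size=5)

# With equal penalties the estimate reduces to ordinary ridge regression.
lam = 0.7
beta_grr = generalized_ridge(X, y, np.full(5, lam))
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(8), X.T @ y)
```

Working in the singular-value basis keeps the computation well defined when $p > n$, since only the $\min(n, p)$ directions spanned by the data are ever shrunk.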
We study asymptotic minimax problems for estimating a $d$-dimensional regression parameter over spheres of growing dimension ($d\to\infty$). Assuming that the data follows a linear model with Gaussian predictors and errors, we show that ridge regression is asymptotically minimax and derive new closed-form expressions for its asymptotic risk under squared-error loss. The asymptotic risk of ridge regression is closely related to the Stieltjes transform of the Mar\v{c}enko-Pastur distribution and the spectral distribution of the predictors from the linear model. Adaptive ridge estimators are also proposed (which adapt to the unknown radius of the sphere) and connections with equivariant estimation are highlighted. Our results are mostly relevant for asymptotic settings where the number of observations, $n$, is proportional to the number of predictors, that is, $d/n\to\rho\in(0,\infty)$.
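The Stieltjes transform mentioned above can be checked numerically. The sketch below does not reproduce the risk formula of this work; it only compares the standard closed form of the transform, $m(-\lambda)=\int (s+\lambda)^{-1}\,dF(s)$, with its finite-sample counterpart, assuming predictors with identity covariance. The function names and the values of $n$, $d$ and $\lambda$ are illustrative assumptions.

```python
import numpy as np

def mp_stieltjes(lam, rho):
    """Stieltjes transform m(-lam) of the Marchenko-Pastur law (unit variance, ratio rho = d/n)."""
    b = 1.0 - rho + lam
    return (-b + np.sqrt(b * b + 4.0 * rho * lam)) / (2.0 * rho * lam)

def empirical_stieltjes(X, lam):
    """(1/d) * trace((X'X/n + lam * I)^{-1}) for an n x d design matrix X."""
    n, d = X.shape
    eigvals = np.linalg.eigvalsh(X.T @ X / n)   # spectrum of the sample covariance
    return float(np.mean(1.0 / (eigvals + lam)))

rng = np.random.default_rng(1)
n, d, lam = 400, 200, 0.5                       # so that d/n = 0.5
X = rng.normal(size=(n, d))

theory = mp_stieltjes(lam, d / n)
empirical = empirical_stieltjes(X, lam)
```

The two quantities should agree closely already at moderate dimensions, which is what makes closed-form limits of this type usable in the proportional regime.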
Multivariate linear regressions are widely used statistical tools in many applications to model the associations between multiple related responses and a set of predictors. To infer such associations, it is often of interest to test the structure of the regression coefficient matrix, and the likelihood ratio test (LRT) is one of the most popular approaches in practice. Despite its popularity, it is known that the classical $\chi^2$ approximations for LRTs often fail in high-dimensional settings, where the dimensions of responses and predictors $(m,p)$ are allowed to grow with the sample size $n$. Though various corrected LRTs and other test statistics have been proposed in the literature, the fundamental question of when the classical LRT starts to fail is less studied; an answer to this question would provide insights for practitioners, especially when analyzing data with $m/n$ and $p/n$ small but not negligible. Moreover, the power performance of the LRT in high-dimensional data analysis remains underexplored. To address these issues, the first part of this work gives the asymptotic boundary where the classical LRT fails and develops the corrected limiting distribution of the LRT for a general asymptotic regime. The second part of this work further studies the test power of the LRT in the high-dimensional setting. The result not only advances the current understanding of the asymptotic behavior of the LRT under the alternative hypothesis, but also motivates the development of a power-enhanced LRT. The third part of this work considers the setting with $p>n$, where the LRT is not well-defined. We propose a two-step testing procedure by first performing dimension reduction and then applying the proposed LRT. Theoretical properties are developed to ensure the validity of the proposed method. Numerical studies are also presented to demonstrate its good performance.
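The failure of the classical $\chi^2$ approximation can be seen in a small simulation. The sketch below is not the corrected test of this work; it only assumes the simplest null hypothesis, that the whole coefficient matrix is zero in a model without intercept, and compares the Monte Carlo mean of the LRT statistic with $mp$, the mean of the classical $\chi^2_{mp}$ reference distribution. The function names and the choice $n=100$, $m=p=20$ are illustrative assumptions.

```python
import numpy as np

def lrt_statistic(X, Y):
    """LRT statistic n * log(|Y'Y| / |R'R|) for H0: B = 0 in Y = X B + E (no intercept)."""
    n = X.shape[0]
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B_hat
    _, logdet_full = np.linalg.slogdet(resid.T @ resid)  # residual cross-products, full model
    _, logdet_null = np.linalg.slogdet(Y.T @ Y)          # residual cross-products under H0
    return n * (logdet_null - logdet_full)

def mean_lrt_under_null(n, m, p, reps=50, seed=0):
    """Monte Carlo mean of the LRT statistic when H0 is true."""
    rng = np.random.default_rng(seed)
    stats = [
        lrt_statistic(rng.normal(size=(n, p)), rng.normal(size=(n, m)))
        for _ in range(reps)
    ]
    return float(np.mean(stats))

n, m, p = 100, 20, 20
simulated_mean = mean_lrt_under_null(n, m, p)
classical_mean = m * p   # mean of the chi-square reference with m * p degrees of freedom
```

In this configuration the simulated mean lies well above `classical_mean`, so a test calibrated by the classical approximation would reject far too often even though $m/n$ and $p/n$ are only 0.2.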
We propose statistical inferential procedures for panel data models with interactive fixed effects in a kernel ridge regression framework. Compared with traditional sieve methods, our method is automatic in the sense that it does not require the choice of basis functions and truncation parameters. Model complexity is controlled by a continuous regularization parameter which can be automatically selected by generalized cross validation. Based on empirical processes theory and functional analysis tools, we derive joint asymptotic distributions for the estimators in the heterogeneous setting. These joint asymptotic results are then used to construct confidence intervals for the regression means and prediction intervals for the future observations, both being the first provably valid intervals in the literature. Marginal asymptotic normality of the functional estimators in the homogeneous setting is also obtained. Simulation and real data analysis demonstrate the advantages of our method.
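The selection of the regularization parameter by generalized cross validation can be sketched on a plain one-dimensional kernel ridge regression. This is not the panel data estimator described above and omits the interactive fixed effects entirely; the Gaussian kernel, its bandwidth, the grid of candidate values and all function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(x, z, bandwidth):
    """Gaussian kernel matrix between two one-dimensional samples."""
    diff = x[:, None] - z[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)

def hat_matrix(K, lam):
    """Smoother matrix A = (K + n * lam * I)^{-1} K of kernel ridge regression."""
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), K)

def gcv_score(K, y, lam):
    """GCV(lam) = (||(I - A) y||^2 / n) / (trace(I - A) / n)^2."""
    n = len(y)
    A = hat_matrix(K, lam)
    resid = y - A @ y
    return (resid @ resid / n) / (1.0 - np.trace(A) / n) ** 2

# Toy data: a smooth signal observed with noise.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=x.size)

K = gaussian_kernel(x, x, bandwidth=0.2)
grid = 10.0 ** np.arange(-6, 1)                 # candidate regularization parameters
best_lam = min(grid, key=lambda lam: gcv_score(K, y, lam))
fitted = hat_matrix(K, best_lam) @ y
```

Because the criterion depends on the data only through the smoother matrix, no basis or truncation level has to be chosen by hand, which is the sense in which such a procedure is automatic.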
We consider a re-sampling scheme for estimation of the population parameters in mixed effects nonlinear regression models of the type used, for example, in clinical pharmacokinetics. We provide an estimation procedure which {\it recycles}, via random weighting, the relevant two-stage parameter estimates to construct consistent estimates of the sampling distribution of the various estimates. We establish the asymptotic consistency and asymptotic normality of the resampled estimates and demonstrate the applicability of the {\it recycling} approach in a small simulation study and via an example.
In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423-1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a procedure based on groupwise penalization to distinguish significant covariates in the large $p$, small $n$ setting. The procedure is shown to be consistent for model structure identification. Further, we construct simultaneous confidence bands for the coefficient functions in the selected model based on a refined two-step spline estimator. We also discuss how to choose the tuning parameters. To estimate the standard deviation of the functional estimator, we adopt the smoothed bootstrap method. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and analyze an obesity data set from a genome-wide association study as an illustration.
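The idea of groupwise penalization can be illustrated with a single shrinkage step. The abstract does not name the penalty it uses, so the sketch below assumes a group-lasso-type penalty and assumes that each covariate corresponds to one group of basis coefficients; `group_soft_threshold` and the toy numbers are hypothetical.

```python
import numpy as np

def group_soft_threshold(beta, groups, threshold):
    """Shrink each coefficient group toward zero; zero out groups whose norm is at most threshold."""
    out = np.zeros_like(beta, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(beta[idx])
        if norm > threshold:
            out[idx] = (1.0 - threshold / norm) * beta[idx]
    return out

# Two groups: a strong one (norm 5) and a weak one (norm about 0.22).
beta = np.array([3.0, 4.0, 0.1, -0.2])
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(beta, groups, threshold=1.0)
```

Here the first group is kept and scaled by $1 - 1/5 = 0.8$, while the second is removed as a whole. Selecting or dropping all coefficients of a covariate together is what allows a penalty of this kind to identify model structure rather than individual basis terms.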