
Likelihood Ratio Test in Multivariate Linear Regression: from Low to High Dimension

Published by: Yinqiu He
Publication date: 2018
Research field: Mathematical Statistics
Language: English





Multivariate linear regressions are widely used statistical tools in many applications to model the associations between multiple related responses and a set of predictors. To infer such associations, it is often of interest to test the structure of the regression coefficient matrix, and the likelihood ratio test (LRT) is one of the most popular approaches in practice. Despite its popularity, it is known that the classical $\chi^2$ approximations for LRTs often fail in high-dimensional settings, where the dimensions of responses and predictors $(m,p)$ are allowed to grow with the sample size $n$. Though various corrected LRTs and other test statistics have been proposed in the literature, the fundamental question of when the classical LRT starts to fail is less studied. An answer to this question would provide insights for practitioners, especially when analyzing data with $m/n$ and $p/n$ small but not negligible. Moreover, the power performance of the LRT in high-dimensional data analysis remains underexplored. To address these issues, the first part of this work gives the asymptotic boundary where the classical LRT fails and develops the corrected limiting distribution of the LRT for a general asymptotic regime. The second part of this work further studies the test power of the LRT in the high-dimensional setting. The result not only advances the current understanding of the asymptotic behavior of the LRT under the alternative hypothesis, but also motivates the development of a power-enhanced LRT. The third part of this work considers the setting with $p>n$, where the LRT is not well-defined. We propose a two-step testing procedure by first performing dimension reduction and then applying the proposed LRT. Theoretical properties are developed to ensure the validity of the proposed method. Numerical studies are also presented to demonstrate its good performance.
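To make the failure of the classical $\chi^2$ approximation concrete, the following is a minimal simulation sketch and not the paper's procedure. It assumes the simplest hypothesis $H_0: B = 0$ in the model $Y = XB + E$ with no intercept and Gaussian data, for which the classical reference distribution of $-n \log(|S_{full}| / |S_{null}|)$ is $\chi^2$ with $mp$ degrees of freedom. The function names `lrt_statistic` and `mean_null_statistic` are illustrative only.

```python
import numpy as np


def lrt_statistic(X, Y):
    """LRT statistic for H0: B = 0 in Y = X B + E (no intercept, assumed setup).

    Returns (statistic, degrees of freedom of the classical chi-square limit).
    """
    n, p = X.shape
    m = Y.shape[1]
    # Least-squares fit of the full model and its residual cross-product matrix.
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B_hat
    S_full = resid.T @ resid
    # Under H0 the fitted mean is zero, so the residuals are Y itself.
    S_null = Y.T @ Y
    # slogdet avoids overflow/underflow in the determinants.
    _, logdet_full = np.linalg.slogdet(S_full)
    _, logdet_null = np.linalg.slogdet(S_null)
    stat = n * (logdet_null - logdet_full)
    return stat, m * p


def mean_null_statistic(n, p, m, reps=200, seed=0):
    """Average the LRT statistic over data simulated under H0."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        Y = rng.standard_normal((n, m))  # independent of X, so H0 holds
        stat, _ = lrt_statistic(X, Y)
        stats.append(stat)
    return float(np.mean(stats)), m * p
```

If the $\chi^2$ approximation were accurate, the simulated mean would sit close to the degrees of freedom $mp$. Comparing `mean_null_statistic(200, 2, 2)` with `mean_null_statistic(200, 40, 40)` is one way to see the gap open up as $m/n$ and $p/n$ stop being negligible.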




Read also

Yinqiu He, Zi Wang, 2020
The likelihood ratio test is widely used in exploratory factor analysis to assess the model fit and determine the number of latent factors. Despite its popularity and clear statistical rationale, researchers have found that when the dimension of the response data is large compared to the sample size, the classical chi-square approximation of the likelihood ratio test statistic often fails. Theoretically, it has been an open problem when such a phenomenon happens as the dimension of data increases; practically, the effect of high dimensionality is less examined in exploratory factor analysis, and there is no clear statistical guideline on the validity of the conventional chi-square approximation. To address this problem, we investigate the failure of the chi-square approximation of the likelihood ratio test in high-dimensional exploratory factor analysis, and derive the necessary and sufficient condition to ensure the validity of the chi-square approximation. The results yield simple quantitative guidelines to check in practice and would also provide useful statistical insights into the practice of exploratory factor analysis.
In this study, we propose shrinkage methods based on generalized ridge regression (GRR) estimation, which is suitable for both multicollinearity and high-dimensional problems with a small number of samples (large $p$, small $n$). We also obtain theoretical properties of the proposed estimators for the low- and high-dimensional cases. Furthermore, the performance of the listed estimators is demonstrated by both simulation studies and real-data analysis and compared with that of existing penalty methods. We show that the proposed methods compare well to competing regularization techniques.
Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test. Indeed, Wilks' theorem asserts that whenever we have a fixed number $p$ of variables, twice the log-likelihood ratio (LLR) $2\Lambda$ is distributed as a $\chi^2_k$ variable in the limit of large sample sizes $n$; here, $k$ is the number of variables being tested. In this paper, we prove that when $p$ is not negligible compared to $n$, Wilks' theorem does not hold and that the chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that $n$ and $p$ grow large in such a way that $p/n \rightarrow \kappa$ for some constant $\kappa < 1/2$. We prove that for a class of logistic models, the LLR converges to a rescaled chi-square, namely, $2\Lambda \stackrel{\mathrm{d}}{\rightarrow} \alpha(\kappa)\chi_k^2$, where the scaling factor $\alpha(\kappa)$ is greater than one as soon as the dimensionality ratio $\kappa$ is positive. Hence, the LLR is larger than classically assumed. For instance, when $\kappa=0.3$, $\alpha(\kappa) \approx 1.5$. In general, we show how to compute the scaling factor by solving a nonlinear system of two equations with two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, non-asymptotic random matrix theory and convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results from this paper extend to some other regression models such as the probit regression model.
Jeremie Kellner, 2014
A new goodness-of-fit test for normality in high dimension (and in Reproducing Kernel Hilbert Spaces) is proposed. It shares common ideas with the Maximum Mean Discrepancy (MMD), which it outperforms both in terms of computation time and applicability to a wide range of data. Theoretical results are derived for the Type-I and Type-II errors. They guarantee the control of the Type-I error at the prescribed level and an exponentially fast decrease of the Type-II error. Synthetic and real data also illustrate the practical improvement allowed by our test compared with other leading approaches in high-dimensional settings.
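As a small worked illustration of the rescaled limit $2\Lambda \to \alpha(\kappa)\chi_k^2$ quoted in the logistic regression abstract above, the sketch below compares a classical p-value with a rescaled one. It is an assumption-laden example and not code from that paper: it takes the scaling factor as given (the abstract reports $\alpha(\kappa) \approx 1.5$ at $\kappa = 0.3$) and does not solve the nonlinear system that defines it. To stay dependency-free it uses the closed-form chi-square survival function that holds for even $k$. The function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, k):
    """Survival function P(chi2_k > x), using the closed form valid for even k."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("this closed form needs a positive even k")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # P(chi2_{2r} > x) = exp(-x/2) * sum_{j=0}^{r-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def rescaled_pvalue(llr2, k, alpha):
    """p-value when 2*LLR is referred to alpha * chi2_k (alpha assumed known)."""
    return chi2_sf_even_df(llr2 / alpha, k)


# Observed 2*LLR = 6 with k = 2 tested variables.
classical = chi2_sf_even_df(6.0, 2)      # exp(-3), about 0.0498
rescaled = rescaled_pvalue(6.0, 2, 1.5)  # exp(-2), about 0.1353
```

With these inputs the classical approximation would reject at the 5% level while the rescaled reference distribution would not, which matches the abstract's remark that the classical p-values are too small.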