ﻻ يوجد ملخص باللغة العربية
Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test. Indeed, Wilks theorem asserts that whenever we have a fixed number $p$ of variables, twice the log-likelihood ratio (LLR) $2Lambda$ is distributed as a $chi^2_k$ variable in the limit of large sample sizes $n$; here, $k$ is the number of variables being tested. In this paper, we prove that when $p$ is not negligible compared to $n$, Wilks theorem does not hold and that the chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that $n$ and $p$ grow large in such a way that $p/nrightarrowkappa$ for some constant $kappa < 1/2$. We prove that for a class of logistic models, the LLR converges to a rescaled chi-square, namely, $2Lambda~stackrel{mathrm{d}}{rightarrow}~alpha(kappa)chi_k^2$, where the scaling factor $alpha(kappa)$ is greater than one as soon as the dimensionality ratio $kappa$ is positive. Hence, the LLR is larger than classically assumed. For instance, when $kappa=0.3$, $alpha(kappa)approx1.5$. In general, we show how to compute the scaling factor by solving a nonlinear system of two equations with two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, non-asymptotic random matrix theory and convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results from this paper extend to some other regression models such as the probit regression model.
Multivariate linear regressions are widely used statistical tools in many applications to model the associations between multiple related responses and a set of predictors. To infer such associations, it is often of interest to test the structure of
The likelihood ratio test is widely used in exploratory factor analysis to assess the model fit and determine the number of latent factors. Despite its popularity and clear statistical rationale, researchers have found that when the dimension of the
We introduce a new test procedure of independence in the framework of parametric copulas with unknown marginals. The method is based essentially on the dual representation of $chi^2$-divergence on signed finite measures. The asymptotic properties of
The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints, while the geometric mixture path is common in MCMC methods such as annealed importance samp
In this paper, we develop modifi