Are Discoveries Spurious? Distributions of Maximum Spurious Correlations and Their Applications

Published by Wen-Xin Zhou

Publication date: 2015

Research field: Mathematical Statistics

Language: English

Authors: Jianqing Fan, Qi-Man Shao, Wen-Xin Zhou

Abstract

Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries from these data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions about the exogeneity of the covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable $Y$ with the best $s$ linear combinations of $p$ covariates $\mathbf{X}$, even when $\mathbf{X}$ and $Y$ are independent. When the covariance matrix of $\mathbf{X}$ possesses the restricted eigenvalue property, we derive such distributions for both a finite $s$ and a diverging $s$, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of $\mathbf{X}$. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of such a simple bootstrap approach. The results are further extended to the situation where the residuals are from regularized fits. Our approach is then used to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of the covariates. The former provides a baseline for guarding against false discoveries and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated with both numerical examples and real data analysis.
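To make the idea of a maximum spurious correlation concrete, the following is a minimal Python sketch for the simplest case $s = 1$ only, where the statistic reduces to the largest absolute sample correlation between the response and any single covariate. It is an illustration under stated assumptions, not the authors' procedure: the function names, the Gaussian design, and the sample sizes are all hypothetical choices, no column of the design is assumed constant, and the resampling step is a simplified multiplier-style scheme (Gaussian multipliers standing in for the response while the design is held fixed) rather than the exact bootstrap statistic analyzed in the paper.

```python
import numpy as np


def max_spurious_correlation(X, y):
    """Largest absolute sample correlation between y and a single column of X (s = 1)."""
    Xc = X - X.mean(axis=0)                  # center each covariate
    yc = y - y.mean()                        # center the response
    Xs = Xc / np.linalg.norm(Xc, axis=0)     # unit-norm columns (assumes no constant column)
    ys = yc / np.linalg.norm(yc)
    return float(np.max(np.abs(Xs.T @ ys)))  # max over the p marginal correlations


def simulate_null(n, p, n_rep, rng):
    """Monte Carlo draws of the statistic when X and y are independent standard Gaussians."""
    return np.array([
        max_spurious_correlation(rng.standard_normal((n, p)), rng.standard_normal(n))
        for _ in range(n_rep)
    ])


def multiplier_upper_quantile(X, level, n_boot, rng):
    """Upper quantile of the statistic with X held fixed and y replaced by N(0, 1) multipliers."""
    n = X.shape[0]
    draws = np.array([
        max_spurious_correlation(X, rng.standard_normal(n))
        for _ in range(n_boot)
    ])
    return float(np.quantile(draws, level))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 50, 1000                          # hypothetical sample size and dimension
    null_draws = simulate_null(n, p, n_rep=200, rng=rng)
    X = rng.standard_normal((n, p))
    q95 = multiplier_upper_quantile(X, level=0.95, n_boot=200, rng=rng)
    print("mean maximum spurious correlation:", null_draws.mean())
    print("resampled 95% upper limit:", q95)
```

Even though the covariates carry no information about the response in this simulation, the maximum correlation is far from zero when $p$ is much larger than $n$. An observed correlation that falls below the resampled upper limit would therefore be hard to distinguish from a spurious one, which is the sense in which such a limit serves as a baseline.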

100 - Weijie J. Su 2017

Applied statisticians use sequential regression procedures to produce a ranking of explanatory variables and, in settings of low correlations between variables and strong true effect sizes, expect that variables at the very top of this ranking are truly relevant to the response. In a regime of certain sparsity levels, however, three examples of sequential procedures (forward stepwise, the lasso, and least angle regression) are shown to include the first spurious variable unexpectedly early. We derive a rigorous, sharp prediction of the rank of the first spurious variable for these three procedures, demonstrating that the first spurious variable occurs earlier and earlier as the regression coefficients become denser. This counterintuitive phenomenon persists for statistically independent Gaussian random designs and an arbitrarily large magnitude of the true effects. We gain a better understanding of the phenomenon by identifying the underlying cause and then leverage the insights to introduce a simple visualization tool termed the double-ranking diagram to improve on sequential methods. As a byproduct of these findings, we obtain the first provable result certifying the exact equivalence between the lasso and least angle regression in the early stages of solution paths beyond orthogonal designs. This equivalence can seamlessly carry over many important model selection results concerning the lasso to least angle regression.

Statistics Theory, Machine Learning

Unions of Orthogonal Arrays and their aberrations via Hilbert bases

184 - Roberto Fontana, Fabio Rapallo 2018

We generate all the Orthogonal Arrays (OAs) of a given size n and strength t as the union of a collection of OAs which belong to an inclusion-minimal set of OAs. We derive a formula for computing the (Generalized) Word Length Pattern of a union of OAs that makes use of their polynomial counting functions. In this way the best OAs according to the Generalized Minimum Aberration criterion can be found by simply exploring a relatively small set of counting functions. The classes of OAs with 5 binary factors, strength 2, and sizes 16 and 20 are fully described.

Statistics Theory, Methodology

Limiting distributions of graph-based test statistics

153 - Yejiong Zhu, Hao Chen 2021

Two-sample tests utilizing a similarity graph on observations are useful for high-dimensional data and non-Euclidean data due to their flexibility and good performance under a wide range of alternatives. Existing works mainly focused on sparse graphs, such as graphs with the number of edges in the order of the number of observations. However, the tests have better performance with denser graphs under many settings. In this work, we establish the theoretical ground for graph-based tests with graphs that are much denser than those in existing works.

Statistics Theory, Methodology

Maximum likelihood estimation of a log-concave density and its distribution function: Basic properties and uniform consistency

433 - Lutz Duembgen, Kaspar Rufibach 2009

We study nonparametric maximum likelihood estimation of a log-concave probability density and its distribution and hazard function. Some general properties of these estimators are derived from two characterizations. It is shown that the rate of convergence with respect to supremum norm on a compact interval for the density and hazard rate estimator is at least $(\log(n)/n)^{1/3}$ and typically $(\log(n)/n)^{2/5}$, whereas the difference between the empirical and estimated distribution function vanishes with rate $o_{\mathrm{p}}(n^{-1/2})$ under certain regularity assumptions.

Statistics Theory, Methodology

Tests of exponentiality based on Arnold-Villasenor characterization, and their efficiencies

347 - M. Jovanovic, B. Milosevic, Ya. Yu. Nikitin 2014

We propose two families of scale-free exponentiality tests based on the recent characterization of exponentiality by Arnold and Villasenor. The test statistics are based on suitable functionals of U-empirical distribution functions. The family of integral statistics can be reduced to V- or U-statistics with relatively simple non-degenerate kernels. They are asymptotically normal and have reasonably high local Bahadur efficiency under common alternatives. This efficiency is compared with simulated powers of new tests. On the other hand, the Kolmogorov type tests demonstrate very low local Bahadur efficiency and rather moderate power for common alternatives, and can hardly be recommended to practitioners. We also explore the conditions of local asymptotic optimality of new tests and describe for both families special most favorable alternatives for which the tests are fully efficient.

Statistics Theory, Methodology
