ترغب بنشر مسار تعليمي؟ اضغط هنا

High-dimensional genome-wide association study and misspecified mixed model analysis

176   0   0.0 ( 0 )
 نشر من قبل Debashis Paul
 تاريخ النشر 2014
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

We study behavior of the restricted maximum likelihood (REML) estimator under a misspecified linear mixed model (LMM) that has received much attention in recent gnome-wide association studies. The asymptotic analysis establishes consistency of the REML estimator of the variance of the errors in the LMM, and convergence in probability of the REML estimator of the variance of the random effects in the LMM to a certain limit, which is equal to the true variance of the random effects multiplied by the limiting proportion of the nonzero random effects present in the LMM. The aymptotic results also establish convergence rate (in probability) of the REML estimators as well as a result regarding convergence of the asymptotic conditional variance of the REML estimator. The asymptotic results are fully supported by the results of empirical studies, which include extensive simulation studies that compare the performance of the REML estimator (under the misspecified LMM) with other existing methods.



قيم البحث

اقرأ أيضاً

We study variance estimation and associated confidence intervals for parameters characterizing genetic effects from genome-wide association studies (GWAS) misspecified mixed model analysis. Previous studies have shown that, in spite of the model miss pecification, certain quantities of genetic interests are estimable, and consistent estimators of these quantities can be obtained using the restricted maximum likelihood (REML) method under a misspecified linear mixed model. However, the asymptotic variance of such a REML estimator is complicated and not ready to be implemented for practical use. In this paper, we develop practical and computationally convenient methods for estimating such asymptotic variances and constructing the associated confidence intervals. Performance of the proposed methods is evaluated empirically based on Monte-Carlo simulations and real-data application.
We consider the sparse principal component analysis for high-dimensional stationary processes. The standard principal component analysis performs poorly when the dimension of the process is large. We establish the oracle inequalities for penalized pr incipal component estimators for the processes including heavy-tailed time series. The rate of convergence of the estimators is established. We also elucidate the theoretical rate for choosing the tuning parameter in penalized estimators. The performance of the sparse principal component analysis is demonstrated by numerical simulations. The utility of the sparse principal component analysis for time series data is exemplified by the application to average temperature data.
125 - Xin Gao , Grace Y. Yi 2012
This paper investigates the property of the penalized estimating equations when both the mean and association structures are modelled. To select variables for the mean and association structures sequentially, we propose a hierarchical penalized gener alized estimating equations (HPGEE2) approach. The first set of penalized estimating equations is solved for the selection of significant mean parameters. Conditional on the selected mean model, the second set of penalized estimating equations is solved for the selection of significant association parameters. The hierarchical approach is designed to accommodate possible model constraints relating the inclusion of covariates into the mean and the association models. This two-step penalization strategy enjoys a compelling advantage of easing computational burdens compared to solving the two sets of penalized equations simultaneously. HPGEE2 with a smoothly clipped absolute deviation (SCAD) penalty is shown to have the oracle property for the mean and association models. The asymptotic behavior of the penalized estimator under this hierarchical approach is established. An efficient two-stage penalized weighted least square algorithm is developed to implement the proposed method. The empirical performance of the proposed HPGEE2 is demonstrated through Monte-Carlo studies and the analysis of a clinical data set.
Covariance matrix testing for high dimensional data is a fundamental problem. A large class of covariance test statistics based on certain averaged spectral statistics of the sample covariance matrix are known to obey central limit theorems under the null. However, precise understanding for the power behavior of the corresponding tests under general alternatives remains largely unknown. This paper develops a general method for analyzing the power behavior of covariance test statistics via accurate non-asymptotic power expansions. We specialize our general method to two prototypical settings of testing identity and sphericity, and derive sharp power expansion for a number of widely used tests, including the likelihood ratio tests, Ledoit-Nagao-Wolfs test, Cai-Mas test and Johns test. The power expansion for each of those tests holds uniformly over all possible alternatives under mild growth conditions on the dimension-to-sample ratio. Interestingly, although some of those tests are previously known to share the same limiting power behavior under spiked covariance alternatives with a fixed number of spikes, our new power characterizations indicate that such equivalence fails when many spikes exist. The proofs of our results combine techniques from Poincare-type inequalities, random matrices and zonal polynomials.
131 - Yinqiu He , Zi Wang , 2020
The likelihood ratio test is widely used in exploratory factor analysis to assess the model fit and determine the number of latent factors. Despite its popularity and clear statistical rationale, researchers have found that when the dimension of the response data is large compared to the sample size, the classical chi-square approximation of the likelihood ratio test statistic often fails. Theoretically, it has been an open problem when such a phenomenon happens as the dimension of data increases; practically, the effect of high dimensionality is less examined in exploratory factor analysis, and there lacks a clear statistical guideline on the validity of the conventional chi-square approximation. To address this problem, we investigate the failure of the chi-square approximation of the likelihood ratio test in high-dimensional exploratory factor analysis, and derive the necessary and sufficient condition to ensure the validity of the chi-square approximation. The results yield simple quantitative guidelines to check in practice and would also provide useful statistical insights into the practice of exploratory factor analysis.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا