ترغب بنشر مسار تعليمي؟ اضغط هنا

Interpretable High-Dimensional Inference Via Score Projection with an Application in Neuroimaging

62   0   0.0 ( 0 )
 نشر من قبل Simon Vandekar
 تاريخ النشر 2017
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

In the fields of neuroimaging and genetics, a key goal is testing the association of a single outcome with a very high-dimensional imaging or genetic variable. Often, summary measures of the high-dimensional variable are created to sequentially test and localize the association with the outcome. In some cases, the results for summary measures are significant, but subsequent tests used to localize differences are underpowered and do not identify regions associated with the outcome. Here, we propose a generalization of Raos score test based on projecting the score statistic onto a linear subspace of a high-dimensional parameter space. In addition, we provide methods to localize signal in the high-dimensional space by projecting the scores to the subspace where the score test was performed. This allows for inference in the high-dimensional space to be performed on the same degrees of freedom as the score test, effectively reducing the number of comparisons. Simulation results demonstrate the test has competitive power relative to others commonly used. We illustrate the method by analyzing a subset of the Alzheimers Disease Neuroimaging Initiative dataset. Results suggest cortical thinning of the frontal and temporal lobes may be a useful biological marker of Alzheimers risk.

قيم البحث

اقرأ أيضاً

In this paper we develop an online statistical inference approach for high-dimensional generalized linear models with streaming data for real-time estimation and inference. We propose an online debiased lasso (ODL) method to accommodate the special s tructure of streaming data. ODL differs from offline debiased lasso in two important aspects. First, in computing the estimate at the current stage, it only uses summary statistics of the historical data. Second, in addition to debiasing an online lasso estimator, ODL corrects an approximation error term arising from nonlinear online updating with streaming data. We show that the proposed online debiased estimators for the GLMs are consistent and asymptotically normal. This result provides a theoretical basis for carrying out real-time interim statistical inference with streaming data. Extensive numerical experiments are conducted to evaluate the performance of the proposed ODL method. These experiments demonstrate the effectiveness of our algorithm and support the theoretical results. A streaming dataset from the National Automotive Sampling System-Crashworthiness Data System is analyzed to illustrate the application of the proposed method.
In this paper we discuss the estimation of a nonparametric component $f_1$ of a nonparametric additive model $Y=f_1(X_1) + ...+ f_q(X_q) + epsilon$. We allow the number $q$ of additive components to grow to infinity and we make sparsity assumptions a bout the number of nonzero additive components. We compare this estimation problem with that of estimating $f_1$ in the oracle model $Z= f_1(X_1) + epsilon$, for which the additive components $f_2,dots,f_q$ are known. We construct a two-step presmoothing-and-resmoothing estimator of $f_1$ and state finite-sample bounds for the difference between our estimator and some smoothing estimators $hat f_1^{text{(oracle)}}$ in the oracle model. In an asymptotic setting these bounds can be used to show asymptotic equivalence of our estimator and the oracle estimators; the paper thus shows that, asymptotically, under strong enough sparsity conditions, knowledge of $f_2,dots,f_q$ has no effect on estimation accuracy. Our first step is to estimate $f_1$ with an undersmoothed estimator based on near-orthogonal projections with a group Lasso bias correction. We then construct pseudo responses $hat Y$ by evaluating a debiased modification of our undersmoothed estimator of $f_1$ at the design points. In the second step the smoothing method of the oracle estimator $hat f_1^{text{(oracle)}}$ is applied to a nonparametric regression problem with responses $hat Y$ and covariates $X_1$. Our mathematical exposition centers primarily on establishing properties of the presmoothing estimator. We present simulation results demonstrating close-to-oracle performance of our estimator in practical applications.
We introduce a new random matrix model called distance covariance matrix in this paper, whose normalized trace is equivalent to the distance covariance. We first derive a deterministic limit for the eigenvalue distribution of the distance covariance matrix when the dimensions of the vectors and the sample size tend to infinity simultaneously. This limit is valid when the vectors are independent or weakly dependent through a finite-rank perturbation. It is also universal and independent of the details of the distributions of the vectors. Furthermore, the top eigenvalues of this distance covariance matrix are shown to obey an exact phase transition when the dependence of the vectors is of finite rank. This finding enables the construction of a new detector for such weak dependence where classical methods based on large sample covariance matrices or sample canonical correlations may fail in the considered high-dimensional framework.
In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423-1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a groupwise penalization based procedure to distinguish significant covariates for the large $p$ small $n$ setting. The procedure is shown to be consistent for model structure identification. Further, we construct simultaneous confidence bands for the coefficient functions in the selected model based on a refined two-step spline estimator. We also discuss how to choose the tuning parameters. To estimate the standard deviation of the functional estimator, we adopt the smoothed bootstrap method. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and analyze an obesity data set from a genome-wide association study as an illustration.
We consider a sparse linear regression model with unknown symmetric error under the high-dimensional setting. The true error distribution is assumed to belong to the locally $beta$-H{o}lder class with an exponentially decreasing tail, which does not need to be sub-Gaussian. We obtain posterior convergence rates of the regression coefficient and the error density, which are nearly optimal and adaptive to the unknown sparsity level. Furthermore, we derive the semi-parametric Bernstein-von Mises (BvM) theorem to characterize asymptotic shape of the marginal posterior for regression coefficients. Under the sub-Gaussianity assumption on the true score function, strong model selection consistency for regression coefficients are also obtained, which eventually asserts the frequentists validity of credible sets.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا