
Testing and estimation of clustered signals

Published by: Prof. Hongyuan Cao
Publication date: 2021
Research field: Mathematical statistics
Language: English





We propose a change-point detection method for large-scale multiple testing problems with data having clustered signals. Unlike the classic change-point setup, the signals can vary in size within a cluster. The clustering structure of the signals enables us to effectively delineate the boundaries between signal and non-signal segments. New test statistics are proposed for observations from one and/or multiple realizations, and their asymptotic distributions are derived. We also study the associated variance estimation problem. We allow the variances to be heteroscedastic in the multiple-realization case, which substantially expands the applicability of the proposed method. Simulation studies demonstrate that the proposed approach has favorable performance. Our procedure is applied to an array-based Comparative Genomic Hybridization (aCGH) dataset.




Read also

The practice of pooling several individual test statistics to form aggregate tests is common in many statistical applications where individual tests may be underpowered. While selection by aggregate tests can serve to increase power, the selection process invalidates the individual test statistics, making it difficult to identify the ones that drive the signal in follow-up inference. Here, we develop a general approach for valid inference following selection by aggregate testing. We present novel powerful post-selection tests for the individual null hypotheses which are exact for the normal model and asymptotically justified otherwise. Our approach relies on the ability to characterize the distribution of the individual test statistics after conditioning on the event of selection. We provide efficient algorithms for estimation of the post-selection maximum-likelihood estimates and suggest confidence intervals which rely on a novel switching regime for good coverage guarantees. We validate our methods via comprehensive simulation studies and apply them to data from the Dallas Heart Study, demonstrating that single variant association discovery following selection by an aggregated test is indeed possible in practice.
Spatial regression or geographically weighted regression models have been widely adopted to capture the effects of auxiliary information on a response variable of interest over a region. In contrast, relationships between response and auxiliary variables are expected to exhibit complex spatial patterns in many applications. This paper proposes a new approach for spatial regression, called spatially clustered regression, to estimate possibly clustered spatial patterns of the relationships. We combine a K-means-based clustering formulation with a penalty function, motivated by a spatial process known as the Potts model, that encourages similar clustering in neighboring locations. We provide a simple iterative algorithm to fit the proposed method, scalable for large spatial datasets. Through simulation studies, the proposed method demonstrates superior performance to existing methods even when the true structure does not admit spatial clustering. Finally, the proposed method is applied to crime event data in Tokyo and produces interpretable results for spatial patterns. The R code is available at https://github.com/sshonosuke/SCR.
Junhui Cai, Xu Han, Yaacov Ritov (2021)
Large-scale modern data often involves estimation and testing for high-dimensional unknown parameters. It is desirable to identify the sparse signals, "the needles in the haystack," with accuracy and false discovery control. However, the unprecedented complexity and heterogeneity in modern data structure require new machine learning tools to effectively exploit commonalities and to robustly adjust for both sparsity and heterogeneity. In addition, estimates for high-dimensional parameters often lack uncertainty quantification. In this paper, we propose a novel Spike-and-Nonparametric mixture prior (SNP): a spike to promote sparsity and a nonparametric structure to capture signals. In contrast to the state-of-the-art methods, the proposed methods solve the estimation and testing problem at once with several merits: 1) an accurate sparsity estimation; 2) point estimates with shrinkage/soft-thresholding property; 3) credible intervals for uncertainty quantification; 4) an optimal multiple testing procedure that controls the false discovery rate. Our method exhibits promising empirical performance on both simulated data and a gene expression case study.
In multiple testing, the family-wise error rate can be bounded under some conditions by the copula of the test statistics. Assuming that this copula is Archimedean, we consider two non-parametric Archimedean generator estimators. More specifically, we use the non-parametric estimator from Genest et al. (2011) and a slight modification thereof. In simulations, we compare the resulting multiple tests with the Bonferroni test and the multiple test derived from the true generator as baselines.
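For orientation, the Bonferroni baseline mentioned in this abstract can be sketched in a few lines. This is only an illustration of the standard Bonferroni rule (reject a hypothesis when its p-value is at most alpha divided by the number of tests), not the copula-based procedure of the paper; the function name `bonferroni_reject` and the example p-values are assumptions made for this sketch.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Standard Bonferroni rule (hypothetical helper, not from the paper).

    Rejects hypothesis i when p_i <= alpha / m, where m is the number of
    tests. This bounds the family-wise error rate by alpha under any
    dependence between the test statistics.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-test significance level
    return [p <= threshold for p in p_values]


# Illustrative p-values (made up for this example).
decisions = bonferroni_reject([0.001, 0.012, 0.03, 0.2], alpha=0.05)
print(decisions)
```

With four tests and alpha = 0.05, the per-test threshold is 0.0125, so only the first two hypotheses are rejected. Because the rule ignores dependence between the statistics, it is typically conservative, which is why it serves as a baseline for methods that use the copula.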
Xiao Fang, David Siegmund (2020)
We study the maximum score statistic to detect and estimate local signals in the form of change-points in the level, slope, or other property of a sequence of observations, and to segment the sequence when there appear to be multiple changes. We find that when observations are serially dependent, the change-points can lead to upwardly biased estimates of autocorrelations, resulting in a sometimes serious loss of power. Examples involving temperature variations, the level of atmospheric greenhouse gases, suicide rates and daily incidence of COVID-19 illustrate the general theory.