Local Two-Sample Testing over Graphs and Point-Clouds by Random-Walk Distributions

Posted by Boris Landa
Publication date: 2020
Research field: Mathematical statistics
Paper language: English

Rejecting the null hypothesis in two-sample testing is a fundamental tool for scientific discovery. Yet, beyond concluding that two samples do not come from the same probability distribution, it is often of interest to characterize how the two distributions differ. Given samples from two densities $f_1$ and $f_0$, we consider the task of localizing occurrences of the inequality $f_1 > f_0$. To avoid the challenges associated with high-dimensional spaces, we propose a general hypothesis testing framework in which hypotheses are formulated adaptively to the data by conditioning on the combined sample from the two densities. We then investigate a special case of this framework where the notion of locality is captured by a random walk on a weighted graph constructed over this combined sample. We derive a tractable testing procedure for this case employing a type of scan statistic, and provide non-asymptotic lower bounds on the power and accuracy of our test to detect whether $f_1 > f_0$ in a local sense. Furthermore, we characterize the test's consistency according to a certain problem-hardness parameter, and show that our test achieves the minimax detection rate for this parameter. We conduct numerical experiments to validate our method, and demonstrate our approach on two real-world applications: detecting and localizing arsenic well contamination across the United States, and analyzing two-sample single-cell RNA sequencing data from melanoma patients.
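The abstract describes the procedure only at a high level. As a rough illustration, here is a minimal Python sketch built on our own assumptions: a Gaussian kernel with bandwidth `eps` defines the weighted graph on the pooled sample, each point's local statistic is the mass that a `t`-step random walk started there places on sample-1 points, that statistic is standardized by its permutation variance, and the scan statistic is the maximum over all starting points, calibrated by permuting labels (which is valid because, conditional on the pooled sample, labels are exchangeable under the null). The name `local_scan_test`, the kernel, the standardization and the permutation calibration are hypothetical choices, not the authors' procedure, and the sketch does not reproduce the paper's non-asymptotic guarantees.

```python
import numpy as np


def local_scan_test(x0, x1, eps=1.0, t=3, n_perm=200, seed=0):
    """Hypothetical local two-sample scan test over a random walk (a sketch).

    x0, x1 : arrays of shape (n0, d) and (n1, d), samples from f_0 and f_1.
    eps    : Gaussian kernel bandwidth (assumed, not taken from the paper).
    t      : number of random-walk steps (assumed, not taken from the paper).
    Returns (scan statistic, permutation p-value, per-point z-scores).
    """
    rng = np.random.default_rng(seed)
    z = np.vstack([x0, x1])  # pooled sample; the test conditions on it
    n, n1 = len(z), len(x1)
    labels = np.concatenate([np.zeros(len(x0)), np.ones(n1)])

    # Weighted graph: Gaussian affinities between all pairs of pooled points.
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    w = np.exp(-sq_dist / eps)
    p = w / w.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    walk = np.linalg.matrix_power(p, t)   # row i: walk distribution after t steps from z[i]

    # Under random label placement, walk[i] @ labels has mean q and this
    # exact permutation variance (rows of `walk` sum to 1).
    q = n1 / n
    var = q * (1 - q) * n / (n - 1) * (np.sum(walk**2, axis=1) - 1.0 / n)
    sd = np.sqrt(np.maximum(var, 1e-12))

    def zscores(lab):
        # Standardized excess of sample-1 mass reached by each point's walk.
        return (walk @ lab - q) / sd

    scores = zscores(labels)
    stat = scores.max()  # scan over all starting points
    null_max = np.array([zscores(rng.permutation(labels)).max() for _ in range(n_perm)])
    p_value = (1 + np.sum(null_max >= stat)) / (1 + n_perm)
    return stat, p_value, scores


# Usage: f_1 has an extra bump near (3, 3) that f_0 lacks.
rng = np.random.default_rng(1)
x0 = rng.normal(size=(100, 2))
x1 = np.vstack([rng.normal(size=(50, 2)),
                rng.normal(loc=3.0, scale=0.3, size=(50, 2))])
stat, p_value, scores = local_scan_test(x0, x1)
print(stat, p_value, int(np.argmax(scores)))
```

The standardization is a deliberate choice here: without it, an isolated point whose walk mostly stays on itself attains the largest possible excess under almost any labeling, so the scan would be dominated by outliers rather than by regions where $f_1 > f_0$.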

Read also

This paper considers a family of distributions constructed by a stochastic mixture of the order statistics of a sample of size two. Various properties of the proposed model are studied. We apply the model to extend the exponential and symmetric Laplace distributions. An extension to the bivariate case is considered.
Xueying Tang, Ke Li, Malay Ghosh (2015)
This paper considers Bayesian multiple testing under sparsity for polynomial-tailed distributions satisfying a monotone likelihood ratio property. Included in this class of distributions are the Student's t, the Pareto, and many other distributions. We prove some general asymptotic optimality results under fixed and random thresholding. As examples of these general results, we establish the Bayesian asymptotic optimality of several multiple testing procedures in the literature for appropriately chosen false discovery rate levels. We also show by simulation that the Benjamini-Hochberg procedure with a false discovery rate level different from the asymptotically optimal one can lead to high Bayes risk.
A scoring rule is a loss function measuring the quality of a quoted probability distribution $Q$ for a random variable $X$, in the light of the realized outcome $x$ of $X$; it is proper if the expected score, under any distribution $P$ for $X$, is minimized by quoting $Q=P$. Using the fact that any differentiable proper scoring rule on a finite sample space $\mathcal{X}$ is the gradient of a concave homogeneous function, we consider when such a rule can be local in the sense of depending only on the probabilities quoted for points in a nominated neighborhood of $x$. Under mild conditions, we characterize such a proper local scoring rule in terms of a collection of homogeneous functions on the cliques of an undirected graph on the space $\mathcal{X}$. A useful property of such rules is that the quoted distribution $Q$ need only be known up to a scale factor. Examples of the use of such scoring rules include Besag's pseudo-likelihood and Hyvärinen's method of ratio matching.
Distance correlation is a new measure of dependence between random vectors. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but unlike the classical definition of correlation, distance correlation is zero only if the random vectors are independent. The empirical distance dependence measures are based on certain Euclidean distances between sample elements rather than sample moments, yet have a compact representation analogous to the classical covariance and correlation. Asymptotic properties and applications in testing independence are discussed. Implementation of the test and Monte Carlo results are also presented.
We discuss a general approach to handling multiple hypotheses testing in the case when a particular hypothesis states that the vector of parameters identifying the distribution of observations belongs to a convex compact set associated with the hypothesis. With our approach, this problem reduces to testing the hypotheses pairwise. Our central result is a test for a pair of hypotheses of the outlined type which, under appropriate assumptions, is provably nearly optimal. The test is yielded by a solution to a convex programming problem, so that our construction admits computationally efficient implementation. We further demonstrate that our assumptions are satisfied in several important and interesting applications. Finally, we show how our approach can be applied to a rather general detection problem encompassing several classical statistical settings such as detection of abrupt signal changes, cusp detection and multi-sensor detection.
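The distance correlation entry above notes that the empirical measures are built from Euclidean distances between sample elements. A minimal Python sketch of the standard V-statistic construction: double-center each pairwise distance matrix (subtract row and column means, add back the grand mean), then average entrywise products. The function names are our own and this is an illustration, not code from that paper.

```python
import numpy as np


def double_centered_distances(x):
    """Pairwise Euclidean distances of a sample, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # n scalar observations -> shape (n, 1)
    d = np.sqrt(np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1))
    # Subtract row and column means, add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Empirical distance correlation of paired samples x and y (V-statistic form)."""
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    dcov2 = np.mean(a * b)                        # squared distance covariance
    dvar2_prod = np.mean(a * a) * np.mean(b * b)  # product of squared distance variances
    if dvar2_prod <= 0:
        return 0.0  # a constant sample has zero distance variance
    # Clip tiny negative rounding error before taking square roots.
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2_prod)))


# Usage: y_dep depends on x nonlinearly, although its population
# Pearson correlation with x is zero; y_ind is independent of x.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y_dep = x**2
y_ind = rng.normal(size=500)
print(distance_correlation(x, y_dep), distance_correlation(x, y_ind))
```

The dependent pair receives a clearly positive value while the independent pair stays near zero, which is the property the entry highlights: unlike Pearson correlation, distance correlation vanishes only under independence.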