ﻻ يوجد ملخص باللغة العربية
Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature, there have been recent methodological developments such as classification accuracy tests. The goal of this work is to present a regression approach to comparing multivariate distributions of complex data. Depending on the chosen regression model, our framework can efficiently handle different types of variables and various structures in the data, with competitive power under many practical scenarios. Whereas previous work has been largely limited to global tests which conceal much of the local information, our approach naturally leads to a local two-sample testing framework in which we identify local differences between multivariate distributions with statistical confidence. We demonstrate the efficacy of our approach both theoretically and empirically, under some well-known parametric and nonparametric regression methods. Our proposed methods are applied to simulated data as well as a challenging astronomy data set to assess their practical usefulness.
Two-sample and independence tests with the kernel-based MMD and HSIC have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to non-stationary random processes, a prevalent f
We consider testing for two-sample means of high dimensional populations by thresholding. Two tests are investigated, which are designed for better power performance when the two population mean vectors differ only in sparsely populated coordinates.
We propose a novel approach to the analysis of covariance operators making use of concentration inequalities. First, non-asymptotic confidence sets are constructed for such operators. Then, subsequent applications including a k sample test for equali
A new family of nonparametric statistics, the r-statistics, is introduced. It consists of counting the number of records of the cumulative sum of the sample. The single-sample r-statistic is almost as powerful as Students t-statistic for Gaussian and
Global null testing is a classical problem going back about a century to Fishers and Stouffers combination tests. In this work, we present simple martingale analogs of these classical tests, which are applicable in two distinct settings: (a) the onli