A hypothesis-testing perspective on the G-normal distribution theory

Added by Quan Zhou

Publication date 2019

Research language English

Authors Shige Peng - Quan Zhou

Abstract

The G-normal distribution was introduced by Peng [2007] as the limiting distribution in the central limit theorem for sublinear expectation spaces. Equivalently, it can be interpreted as the solution to a stochastic control problem where we have a sequence of random variables, whose variances can be chosen based on all past information. In this note we study the tail behavior of the G-normal distribution through analyzing a nonlinear heat equation. Asymptotic results are provided so that the tail probabilities can be easily evaluated with high accuracy. This study also has a significant impact on the hypothesis testing theory for heteroscedastic data; we show that even if the data are generated under the null hypothesis, it is possible to cheat and attain statistical significance by sequentially manipulating the error variances of the observations.
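The variance-manipulation effect described in the abstract can be illustrated with a small Monte Carlo sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function name `simulate_rejection_rates`, the variance bounds `sigma_lo` and `sigma_hi`, the one-sided critical value `c`, and the simple threshold rule (use the largest standard deviation while the running sum is below the rejection barrier, and the smallest once it is above). The rule is a heuristic, not necessarily the optimal control. All observations have mean zero, so the null hypothesis holds throughout.

```python
import math
import random


def simulate_rejection_rates(n_steps=100, n_paths=20_000,
                             sigma_lo=0.5, sigma_hi=1.0,
                             c=1.645, seed=0):
    """Estimate one-sided rejection rates under the null (mean zero).

    Returns (adaptive_rate, constant_rate), where the adaptive scheme picks
    each observation's standard deviation from past data (hypothetical
    heuristic) and the constant scheme always uses sigma_hi.
    """
    rng = random.Random(seed)
    # Reject when the sum exceeds c standard deviations, calibrated for
    # the worst-case constant standard deviation sigma_hi.
    barrier = c * sigma_hi * math.sqrt(n_steps)
    adaptive_hits = 0
    constant_hits = 0
    for _ in range(n_paths):
        s_adaptive = 0.0
        s_constant = 0.0
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)  # same noise drives both schemes
            # Heuristic: large variance below the barrier (to reach it),
            # small variance above it (to stay there).
            sigma = sigma_hi if s_adaptive < barrier else sigma_lo
            s_adaptive += sigma * z
            s_constant += sigma_hi * z
        adaptive_hits += s_adaptive > barrier
        constant_hits += s_constant > barrier
    return adaptive_hits / n_paths, constant_hits / n_paths


if __name__ == "__main__":
    adaptive, constant = simulate_rejection_rates()
    print(f"adaptive variance choice: {adaptive:.4f}")
    print(f"constant sigma_hi:        {constant:.4f}")
```

Under these assumptions, the adaptive scheme should reject more often than the constant scheme even though every observation has mean zero, which is the kind of inflated significance the abstract warns about.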


Multistage Hypothesis Tests for the Mean of a Normal Distribution

411 - Xinjia Chen 2011

In this paper, we have developed new multistage tests which guarantee a prescribed level of power and are more efficient than previous tests in terms of average sampling number and the number of sampling operations. Without truncation, the maximum sampling numbers of our testing plans are absolutely bounded. Based on geometrical arguments, we have derived extremely tight bounds for the operating characteristic function. To reduce the computational complexity of the relevant integrals, we propose adaptive scanning algorithms which are useful not only for the present hypothesis testing problem but also for other problem areas.

Statistics Theory Probability Methodology

Testing statistical hypothesis on random trees and applications to the protein classification problem

182 - Jorge R. Busch , Pablo A. Ferrari , Ana Georgina Flesia 2006

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov--Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford--Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton--Watson related processes.

Statistics Theory Probability Statistics Theory

On the Asymptotic Distribution of the Scan Statistic for Empirical Distributions

362 - Andrew Ying , Wen-Xin Zhou 2019

We investigate the asymptotic behavior of several variants of the scan statistic applied to empirical distributions, which can be applied to detect the presence of an anomalous interval of any length. Of particular interest is the Studentized scan statistic, which is preferable in practice. The main ingredients in the proof are Kolmogorov's theorem, a Poisson approximation, and recent technical results by Kabluchko et al. (2014).

Statistics Theory Probability Statistics Theory

Testing for Independence of Large Dimensional Vectors

203 - Taras Bodnar , Holger Dette , Nestor Parolya 2017

In this paper new tests for the independence of two high-dimensional vectors are investigated. We consider the case where the dimension of the vectors increases with the sample size and propose multivariate analysis of variance-type statistics for the hypothesis of a block diagonal covariance matrix. The asymptotic properties of the new test statistics are investigated under the null hypothesis and the alternative hypothesis using random matrix theory. For this purpose we study the weak convergence of linear spectral statistics of central and (conditionally) non-central Fisher matrices. In particular, a central limit theorem for linear spectral statistics of large dimensional (conditionally) non-central Fisher matrices is derived, which is then used to analyse the power of the tests under the alternative. The theoretical results are illustrated by means of a simulation study where we also compare the new tests with several alternatives, in particular with the commonly used corrected likelihood ratio test. It is demonstrated that the latter test does not keep its nominal level if the dimension of one sub-vector is relatively small compared to the dimension of the other sub-vector. On the other hand, the tests proposed in this paper provide a reasonable approximation of the nominal level in such situations. Moreover, we observe that one of the proposed tests is most powerful under a variety of correlation scenarios.

Statistics Theory Probability Statistics Theory

Canonical correlation coefficients of high-dimensional normal vectors: finite rank case

396 - Zhigang Bao , Jiang Hu , Guangming Pan 2014

Consider a normal vector $\mathbf{z}=(\mathbf{x},\mathbf{y})$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$ respectively. With $n$ independent observations of $\mathbf{z}$ at hand, we study the correlation between $\mathbf{x}$ and $\mathbf{y}$, from the perspective of the Canonical Correlation Analysis, under the high-dimensional setting: both $p$ and $q$ are proportional to the sample size $n$. In this paper, we focus on the case that $\Sigma_{\mathbf{x}\mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1\geq\cdots\geq r_k>0$. Under the additional assumptions $(p+q)/n\to y\in(0,1)$ and $p/q\not\to 1$, we study the sample counterparts of $r_i$, $i=1,\ldots,k$, i.e. the $k$ largest eigenvalues of the sample canonical correlation matrix $S_{\mathbf{x}\mathbf{x}}^{-1}S_{\mathbf{x}\mathbf{y}}S_{\mathbf{y}\mathbf{y}}^{-1}S_{\mathbf{y}\mathbf{x}}$, namely $\lambda_1\geq\cdots\geq\lambda_k$. We show that there exists a threshold $r_c\in(0,1)$, such that for each $i\in\{1,\ldots,k\}$, when $r_i\leq r_c$, $\lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_r$. When $r_i>r_c$, $\lambda_i$ possesses an almost sure limit in $(d_r,1]$, from which we can recover $r_i$ in turn, thus providing an estimate of the latter in the high-dimensional scenario.

Statistics Theory Probability Statistics Theory
