Do you want to publish a course? Click here

Independence test for high dimensional data based on regularized canonical correlation coefficients

267   0   0.0 ( 0 )
 Added by Yanrong Yang
 Publication date 2015
and research's language is English




Ask ChatGPT about the research

This paper proposes a new statistic to test independence between two high dimensional random vectors ${mathbf{X}}:p_1times1$ and ${mathbf{Y}}:p_2times1$. The proposed statistic is based on the sum of regularized sample canonical correlation coefficients of ${mathbf{X}}$ and ${mathbf{Y}}$. The asymptotic distribution of the statistic under the null hypothesis is established as a corollary of general central limit theorems (CLT) for the linear statistics of classical and regularized sample canonical correlation coefficients when $p_1$ and $p_2$ are both comparable to the sample size $n$. As applications of the developed independence test, various types of dependent structures, such as factor models, ARCH models and a general uncorrelated but dependent case, etc., are investigated by simulations. As an empirical application, cross-sectional dependence of daily stock returns of companies between different sections in the New York Stock Exchange (NYSE) is detected by the proposed test.



rate research

Read More

Consider a Gaussian vector $mathbf{z}=(mathbf{x},mathbf{y})$, consisting of two sub-vectors $mathbf{x}$ and $mathbf{y}$ with dimensions $p$ and $q$ respectively, where both $p$ and $q$ are proportional to the sample size $n$. Denote by $Sigma_{mathbf{u}mathbf{v}}$ the population cross-covariance matrix of random vectors $mathbf{u}$ and $mathbf{v}$, and denote by $S_{mathbf{u}mathbf{v}}$ the sample counterpart. The canonical correlation coefficients between $mathbf{x}$ and $mathbf{y}$ are known as the square roots of the nonzero eigenvalues of the canonical correlation matrix $Sigma_{mathbf{x}mathbf{x}}^{-1}Sigma_{mathbf{x}mathbf{y}}Sigma_{mathbf{y}mathbf{y}}^{-1}Sigma_{mathbf{y}mathbf{x}}$. In this paper, we focus on the case that $Sigma_{mathbf{x}mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1geqcdotsgeq r_k>0$. We study the sample counterparts of $r_i,i=1,ldots,k$, i.e. the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{mathbf{x}mathbf{x}}^{-1}S_{mathbf{x}mathbf{y}}S_{mathbf{y}mathbf{y}}^{-1}S_{mathbf{y}mathbf{x}}$, denoted by $lambda_1geqcdotsgeq lambda_k$. We show that there exists a threshold $r_cin(0,1)$, such that for each $iin{1,ldots,k}$, when $r_ileq r_c$, $lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_{+}$. When $r_i>r_c$, $lambda_i$ possesses an almost sure limit in $(d_{+},1]$. We also obtain the limiting distribution of $lambda_i$s under appropriate normalization. Specifically, $lambda_i$ possesses Gaussian type fluctuation if $r_i>r_c$, and follows Tracy-Widom distribution if $r_i<r_c$. Some applications of our results are also discussed.
Consider a normal vector $mathbf{z}=(mathbf{x},mathbf{y})$, consisting of two sub-vectors $mathbf{x}$ and $mathbf{y}$ with dimensions $p$ and $q$ respectively. With $n$ independent observations of $mathbf{z}$ at hand, we study the correlation between $mathbf{x}$ and $mathbf{y}$, from the perspective of the Canonical Correlation Analysis, under the high-dimensional setting: both $p$ and $q$ are proportional to the sample size $n$. In this paper, we focus on the case that $Sigma_{mathbf{x}mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1geqcdotsgeq r_k>0$. Under the additional assumptions $(p+q)/nto yin (0,1)$ and $p/q otto 1$, we study the sample counterparts of $r_i,i=1,ldots,k$, i.e. the largest k eigenvalues of the sample canonical correlation matrix $S_{mathbf{x}mathbf{x}}^{-1}S_{mathbf{x}mathbf{y}}S_{mathbf{y}mathbf{y}}^{-1}S_{mathbf{y}mathbf{x}}$, namely $lambda_1geqcdotsgeq lambda_k$. We show that there exists a threshold $r_cin(0,1)$, such that for each $iin{1,ldots,k}$, when $r_ileq r_c$, $lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_r$. When $r_i>r_c$, $lambda_i$ possesses an almost sure limit in $(d_r,1]$, from which we can recover $r_i$ in turn, thus provide an estimate of the latter in the high-dimensional scenario.
179 - Dennis Leung , Qi-Man Shao 2017
Let ${bf R}$ be the Pearson correlation matrix of $m$ normal random variables. The Raos score test for the independence hypothesis $H_0 : {bf R} = {bf I}_m$, where ${bf I}_m$ is the identity matrix of dimension $m$, was first considered by Schott (2005) in the high dimensional setting. In this paper, we study the asymptotic minimax power function of this test, under an asymptotic regime in which both $m$ and the sample size $n$ tend to infinity with the ratio $m/n$ upper bounded by a constant. In particular, our result implies that the Raos score test is rate-optimal for detecting the dependency signal $|{bf R} - {bf I}_m|_F$ of order $sqrt{m/n}$, where $|cdot|_F$ is the matrix Frobenius norm.
Dependence measures based on reproducing kernel Hilbert spaces, also known as Hilbert-Schmidt Independence Criterion and denoted HSIC, are widely used to statistically decide whether or not two random vectors are dependent. Recently, non-parametric HSIC-based statistical tests of independence have been performed. However, these tests lead to the question of the choice of the kernels associated to the HSIC. In particular, there is as yet no method to objectively select specific kernels with theoretical guarantees in terms of first and second kind errors. One of the main contributions of this work is to develop a new HSIC-based aggregated procedure which avoids such a kernel choice, and to provide theoretical guarantees for this procedure. To achieve this, we first introduce non-asymptotic single tests based on Gaussian kernels with a given bandwidth, which are of prescribed level $alpha in (0,1)$. From a theoretical point of view, we upper-bound their uniform separation rate of testing over Sobolev and Nikolskii balls. Then, we aggregate several single tests, and obtain similar upper-bounds for the uniform separation rate of the aggregated procedure over the same regularity spaces. Another main contribution is that we provide a lower-bound for the non-asymptotic minimax separation rate of testing over Sobolev balls, and deduce that the aggregated procedure is adaptive in the minimax sense over such regularity spaces. Finally, from a practical point of view, we perform numerical studies in order to assess the efficiency of our aggregated procedure and compare it to existing independence tests in the literature.
134 - Yejiong Zhu , Hao Chen 2021
We provide sufficient conditions for the asymptotic normality of the generalized correlation coefficient $sum a_{ij}b_{ij}$ under the permutation null distribution when $a_{ij}$s are symmetric and $b_{ij}$s are symmetric.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا