Research papers and master's and doctoral theses published by Jianfeng Yao

Asymptotic normality for eigenvalue statistics of a general sample covariance matrix when $p/n \to \infty$ and applications

106 - Jiaxin Qiu, Zeng Li, Jianfeng Yao 2021

The asymptotic normality for a large family of eigenvalue statistics of a general sample covariance matrix is derived under the ultra-high dimensional setting, that is, when the dimension to sample size ratio $p/n \to \infty$. Based on this CLT result, we first adapt the covariance matrix test problem to the new ultra-high dimensional context. Then as a second application, we develop a new test for the separable covariance structure of a matrix-valued white noise. Simulation experiments are conducted for the investigation of finite-sample properties of the general asymptotic normality of eigenvalue statistics, as well as the second test for separable covariance structure of matrix-valued white noise.

Methodology Statistics Theory

On singular values of data matrices with general independent columns

120 - Tianxing Mei, Chen Wang, Jianfeng Yao 2021

In this paper, we analyse singular values of a large $p\times n$ data matrix $\mathbf{X}_n= (\mathbf{x}_{n1},\ldots,\mathbf{x}_{nn})$ where the columns $\mathbf{x}_{nj}$ are independent $p$-dimensional vectors, possibly with different distributions. Such data matrices are common in high-dimensional statistics. Under a key assumption that the covariance matrices $\mathbf{\Sigma}_{nj}=\text{Cov}(\mathbf{x}_{nj})$ are asymptotically simultaneously diagonalizable, and appropriate convergence of their spectra, we establish a limiting distribution for the singular values of $\mathbf{X}_n$ when both dimensions $p$ and $n$ grow to infinity with comparable magnitudes. The matrix model goes beyond and includes many existing works on different types of sample covariance matrices, including the weighted sample covariance matrix, the Gram matrix model and the sample covariance matrix of linear time series models. Furthermore, we develop two applications of our general approach. First, we obtain the existence and uniqueness of a new limiting spectral distribution of realized covariance matrices for a multi-dimensional diffusion process with anisotropic time-varying co-volatility processes. Secondly, we derive the limiting spectral distribution for singular values of the data matrix for a recent matrix-valued auto-regressive model. Finally, for a generalized finite mixture model, the limiting spectral distribution for singular values of the data matrix is obtained.

Statistics Theory

Linear regression under model uncertainty

89 - Shuzhen Yang, Jianfeng Yao 2021

We reexamine the classical linear regression model when the model is subject to two types of uncertainty: (i) some of the covariates are either missing or completely inaccessible, and (ii) the variance of the measurement error is undetermined and changes according to a mechanism unknown to the statistician. By following the recent theory of sublinear expectation, we propose to characterize such mean and variance uncertainty in the response variable by two specific nonlinear random variables, which encompass an infinite family of probability distributions for the response variable in the sense of (linear) classical probability theory. The approach enables a family of estimators under various loss functions for the regression parameter and the parameters related to model uncertainty. The consistency of the estimators is established under mild conditions on the data generation process. Three applications are introduced to assess the quality of the approach, including a forecasting model for the S&P Index.

Statistics Theory

Eigenvalue distribution of a high-dimensional distance covariance matrix with application

343 - Weiming Li, Qinwen Wang, Jianfeng Yao 2021

We introduce a new random matrix model called distance covariance matrix in this paper, whose normalized trace is equivalent to the distance covariance. We first derive a deterministic limit for the eigenvalue distribution of the distance covariance matrix when the dimensions of the vectors and the sample size tend to infinity simultaneously. This limit is valid when the vectors are independent or weakly dependent through a finite-rank perturbation. It is also universal and independent of the details of the distributions of the vectors. Furthermore, the top eigenvalues of this distance covariance matrix are shown to obey an exact phase transition when the dependence of the vectors is of finite rank. This finding enables the construction of a new detector for such weak dependence where classical methods based on large sample covariance matrices or sample canonical correlations may fail in the considered high-dimensional framework.
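As background for the abstract above, the classical sample distance covariance can be sketched in a few lines. This is only the standard V-statistic form based on double-centered distance matrices, not the distance covariance matrix model of the paper, whose exact definition is not given here. The function names and the toy data are illustrative assumptions.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length vectors."""
    return [[math.dist(a, b) for b in points] for a in points]


def double_center(d):
    """Subtract row and column means from a square matrix, add back the grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    col = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def distance_covariance_sq(xs, ys):
    """Squared sample distance covariance (V-statistic form) of paired samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of observations")
    n = len(xs)
    a = double_center(pairwise_distances(xs))
    b = double_center(pairwise_distances(ys))
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, purely for reproducibility of the toy run
    xs = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
    dependent = [[x[0] ** 2] for x in xs]                    # uncorrelated with xs, yet dependent
    independent = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
    print("dependent pair:  ", distance_covariance_sq(xs, dependent))
    print("independent pair:", distance_covariance_sq(xs, independent))
```

The quadratic example is chosen because the two variables are uncorrelated but not independent, which is the kind of dependence that correlation-based statistics miss.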

Statistics Theory

ERStruct: An Eigenvalue Ratio Approach to Inferring Population Structure from Sequencing Data

65 - Yuyang Xu, Zhonghua Liu, Jianfeng Yao 2021

Inference of population structure from genetic data plays an important role in population and medical genetics studies. The traditional EIGENSTRAT method has been widely used for computing and selecting top principal components that capture population structure information (Price et al., 2006). With the advancement and decreasing cost of sequencing technology, whole-genome sequencing data provide much richer information about the underlying population structures. However, the EIGENSTRAT method was originally developed for analyzing array-based genotype data and thus may not perform well on sequencing data for two reasons. First, the number of genetic variants $p$ is much larger than the sample size $n$ in sequencing data such that the sample-to-marker ratio $n/p$ is nearly zero, violating the assumption of the Tracy-Widom test used in the EIGENSTRAT method. Second, the EIGENSTRAT method might not be able to handle the linkage disequilibrium (LD) well in sequencing data. To resolve those two critical issues, we propose a new statistical method called ERStruct to estimate the number of latent sub-populations based on sequencing data. We propose to use the ratio of successive eigenvalues as a more robust test statistic, and then we approximate the null distribution of our proposed test statistic using modern random matrix theory. Simulation studies show that our proposed ERStruct method outperforms the traditional Tracy-Widom test on sequencing data. We further use two public data sets from the HapMap 3 and the 1000 Genomes Projects to demonstrate the performance of our ERStruct method. We also implement ERStruct in a MATLAB toolbox which is now publicly available on GitHub through https://github.com/bglvly/ERStruct.
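To make the idea of a statistic built from ratios of successive eigenvalues concrete, here is a minimal sketch. It is not the ERStruct procedure: the abstract states that ERStruct calibrates its statistic against a null distribution from random matrix theory, which is not reproduced here. The ratio convention, the function names, the largest-drop heuristic and the eigenvalues in the example are all assumptions made for illustration.

```python
def eigenvalue_ratios(eigenvalues):
    """Return r_k = lambda_(k+1) / lambda_k for eigenvalues sorted in decreasing order."""
    lam = sorted(eigenvalues, reverse=True)
    if len(lam) < 2:
        raise ValueError("need at least two eigenvalues")
    if any(v <= 0 for v in lam):
        raise ValueError("this sketch expects strictly positive eigenvalues")
    return [lam[k + 1] / lam[k] for k in range(len(lam) - 1)]


def largest_drop_count(eigenvalues):
    """Naive heuristic (not ERStruct): count eigenvalues ahead of the smallest ratio."""
    ratios = eigenvalue_ratios(eigenvalues)
    return min(range(len(ratios)), key=ratios.__getitem__) + 1


if __name__ == "__main__":
    # Hypothetical spectrum: two large "structure" eigenvalues followed by a bulk.
    spectrum = [50.0, 30.0, 2.0, 1.8, 1.7, 1.5]
    print(eigenvalue_ratios(spectrum))
    print(largest_drop_count(spectrum))
```

A ratio is scale-free, so multiplying all eigenvalues by a constant leaves the ratios unchanged. A proper test would replace the largest-drop rule with a threshold derived from the null distribution.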

Applications

Extension of the Lagrange multiplier test for error cross-section independence to large panels with non normal errors

85 - Zhaoyuan Li, Jianfeng Yao 2021

This paper reexamines the seminal Lagrange multiplier test for cross-section independence in a large panel model where both the number of cross-sectional units n and the number of time series observations T can be large. The first contribution of the paper is an enlargement of the test with two extensions: first, the new asymptotic normality is derived in a simultaneous limiting scheme where the two dimensions (n, T) tend to infinity with comparable magnitudes; second, the result is valid for general error distributions (not necessarily normal). The second contribution of the paper is a new test statistic based on the sum of the fourth powers of cross-section correlations from OLS residuals, instead of their squares used in the Lagrange multiplier statistic. This new test is generally more powerful, and the improvement is particularly visible against alternatives with weak or sparse cross-section dependence. Both a simulation study and a real data analysis are provided to demonstrate the advantages of the enlarged Lagrange multiplier test and the power enhanced test in comparison with the existing procedures.
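The two quantities contrasted in the abstract above, sums of squared and of fourth-power pairwise residual correlations, can be sketched as follows. The classical Breusch-Pagan form $T \sum_{i<j} \hat\rho_{ij}^2$ is used for the Lagrange multiplier statistic. The centering and scaling that the paper applies to obtain asymptotic normality are not given in the abstract and are deliberately left out, so the fourth-power sum below is a raw quantity, not the paper's test statistic. Function names and the toy residuals are assumptions.

```python
import math
import random


def correlation(u, v):
    """Pearson correlation of two equal-length residual series."""
    t = len(u)
    mu, mv = sum(u) / t, sum(v) / t
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def power_sums(residuals):
    """Return (sum of rho^2, sum of rho^4) over all pairs i < j of units."""
    n = len(residuals)
    s2 = s4 = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            rho = correlation(residuals[i], residuals[j])
            s2 += rho**2
            s4 += rho**4
    return s2, s4


def lm_statistic(residuals):
    """Classical Breusch-Pagan LM statistic: T times the sum of squared correlations."""
    t = len(residuals[0])
    return t * power_sums(residuals)[0]


if __name__ == "__main__":
    rng = random.Random(1)  # toy residuals: n = 5 units, T = 40 periods, independent noise
    resid = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(5)]
    print("LM statistic:", lm_statistic(resid))
    print("raw sum of fourth powers:", power_sums(resid)[1])
```

Here `residuals[i]` holds the residual series of unit `i` across the `T` periods.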

Econometrics Methodology

On eigenvalue distributions of large auto-covariance matrices

157 - Jianfeng Yao, Wangjun Yuan 2020

In this article, we establish a limiting distribution for eigenvalues of a class of auto-covariance matrices. The same distribution has been found in the literature for a regularized version of these auto-covariance matrices. The original non-regularized auto-covariance matrices are non-invertible, which introduces supplementary difficulties for the study of their eigenvalues through Girko's Hermitization scheme. The key result in this paper is a new polynomial lower bound for the least singular value of the resolvent matrices associated to a rank-defective quadratic function of a random matrix with independent and identically distributed entries. Another improvement in the paper is that the lag of the auto-covariance matrices can grow to infinity with the matrix dimension.

Probability

Limiting distributions for eigenvalues of sample correlation matrices from heavy-tailed populations

149 - Johannes Heiny, Jianfeng Yao 2020

Consider a $p$-dimensional population ${\mathbf x} \in \mathbb{R}^p$ with iid coordinates in the domain of attraction of a stable distribution with index $\alpha\in (0,2)$. Since the variance of ${\mathbf x}$ is infinite, the sample covariance matrix ${\mathbf S}_n=n^{-1}\sum_{i=1}^n {\mathbf x}_i {\mathbf x}_i^\top$ based on a sample ${\mathbf x}_1,\ldots,{\mathbf x}_n$ from the population is not well behaved and it is of interest to use instead the sample correlation matrix ${\mathbf R}_n= \{\operatorname{diag}({\mathbf S}_n)\}^{-1/2}\, {\mathbf S}_n \{\operatorname{diag}({\mathbf S}_n)\}^{-1/2}$. This paper finds the limiting distributions of the eigenvalues of ${\mathbf R}_n$ when both the dimension $p$ and the sample size $n$ grow to infinity such that $p/n\to \gamma \in (0,\infty)$. The family of limiting distributions $\{H_{\alpha,\gamma}\}$ is new and depends on the two parameters $\alpha$ and $\gamma$. The moments of $H_{\alpha,\gamma}$ are fully identified as the sum of two contributions: the first from the classical Marčenko-Pastur law and a second due to heavy tails. Moreover, the family $\{H_{\alpha,\gamma}\}$ has continuous extensions at the boundaries $\alpha=2$ and $\alpha=0$ leading to the Marčenko-Pastur law and a modified Poisson distribution, respectively. Our proofs use the method of moments, the path-shortening algorithm developed in [18] and some novel graph counting combinatorics. As a consequence, the moments of $H_{\alpha,\gamma}$ are expressed in terms of combinatorial objects such as Stirling numbers of the second kind. A simulation study on these limiting distributions $H_{\alpha,\gamma}$ is also provided for comparison with the Marčenko-Pastur law.
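The construction of ${\mathbf R}_n$ from ${\mathbf S}_n$ in the abstract above can be sketched directly from the two formulas. The heavy-tailed sample is generated from a symmetrized Pareto distribution with tail index $\alpha$, which is one convenient choice lying in the domain of attraction of an $\alpha$-stable law. That choice, the function names, and the sizes $n$, $p$ are assumptions for illustration, and no eigenvalue computation is attempted.

```python
import math
import random


def sample_covariance(sample):
    """Non-centered S_n = n^{-1} * sum_i x_i x_i^T for a list of n vectors in R^p."""
    n, p = len(sample), len(sample[0])
    return [[sum(x[a] * x[b] for x in sample) / n for b in range(p)] for a in range(p)]


def sample_correlation(sample):
    """R_n = diag(S_n)^{-1/2} S_n diag(S_n)^{-1/2}."""
    s = sample_covariance(sample)
    p = len(s)
    d = [1.0 / math.sqrt(s[a][a]) for a in range(p)]
    return [[d[a] * s[a][b] * d[b] for b in range(p)] for a in range(p)]


def heavy_tailed_sample(n, p, alpha, seed=0):
    """n vectors with iid symmetrized Pareto(alpha) coordinates (illustrative choice)."""
    rng = random.Random(seed)
    return [
        [rng.choice((-1.0, 1.0)) * rng.paretovariate(alpha) for _ in range(p)]
        for _ in range(n)
    ]


if __name__ == "__main__":
    sample = heavy_tailed_sample(n=200, p=4, alpha=1.5)
    for row in sample_correlation(sample):
        print(["%.3f" % v for v in row])
```

Rescaling by the diagonal of ${\mathbf S}_n$ forces every diagonal entry of ${\mathbf R}_n$ to equal 1 and, by the Cauchy-Schwarz inequality, keeps every off-diagonal entry in $[-1, 1]$, whatever the size of the extreme observations.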

Probability Statistics Theory

Eigenvalue distributions of high-dimensional matrix processes driven by fractional Brownian motion

67 - Jian Song, Jianfeng Yao, Wangjun Yuan 2020

In this article, we study the high-dimensional behavior of empirical spectral distributions $\{L_N(t), t\in[0,T]\}$ for a class of $N\times N$ symmetric/Hermitian random matrices, whose entries are generated from the solution of a stochastic differential equation driven by fractional Brownian motion with Hurst parameter $H \in (1/2,1)$. For Wigner-type matrices, we obtain almost sure relative compactness of $\{L_N(t), t\in[0,T]\}_{N\in\mathbb N}$ in $C([0,T], \mathbf P(\mathbb R))$ following the approach in \cite{Anderson2010}; for Wishart-type matrices, we obtain tightness of $\{L_N(t), t\in[0,T]\}_{N\in\mathbb N}$ on $C([0,T], \mathbf P(\mathbb R))$ by tightness criteria provided in Appendix \ref{subset:tightness argument}. The limit of $\{L_N(t), t\in[0,T]\}$ as $N\to \infty$ is also characterised.

Probability

High-dimensional central limit theorems for eigenvalue distributions of generalized Wishart processes

104 - Jian Song, Jianfeng Yao, Wangjun Yuan 2019

We consider eigenvalues of generalized Wishart processes as well as particle systems, of which the empirical measures converge to deterministic measures as the dimension goes to infinity. In this paper, we obtain central limit theorems to characterize the fluctuations of the empirical measures around the limit measures by using stochastic calculus. As applications, central limit theorems for Dyson's Brownian motion and the eigenvalues of the Wishart process are recovered under slightly more general initial conditions, and a central limit theorem for the eigenvalues of a symmetric Ornstein-Uhlenbeck matrix process is obtained.

Probability
