Regression Analysis of Correlations for Correlated Data


Abstract in English

Correlated data are ubiquitous in today's data-driven society. A fundamental task in analyzing these data is to understand, characterize and utilize the correlations in them in order to conduct valid inference. Yet explicit regression analysis of correlations has so far been limited to longitudinal data, a special form of correlated data, while implicit analysis via mixed-effects models lacks generality as a full inferential tool. This paper proposes a novel regression approach for modelling the correlation structure, leveraging a new generalized z-transformation. This transformation maps correlation matrices, which are constrained to be positive definite, to vectors with unrestricted support, and it is order-invariant. Building on these two properties, we develop a regression model that relates the transformed parameters to any covariates. We show that, coupled with a mean and a variance regression model, maximum likelihood yields asymptotically normal parameter estimates and, crucially, enables statistical inference for all the parameters. The performance of our framework is demonstrated in extensive simulations. More importantly, we illustrate the use of our model with an analysis of the classroom data, a highly unbalanced multilevel clustered dataset with within-class and within-school correlations, and an analysis of the malaria immune response data from Benin, a longitudinal dataset with time-dependent covariates in addition to time. Our analyses reveal new insights.
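The abstract does not spell out the generalized z-transformation. As an illustration only, the sketch below assumes one construction with the two stated properties: take the matrix logarithm of the correlation matrix and keep its strictly lower-triangular entries. The function names `matrix_log_spd` and `generalized_z` are hypothetical, and the paper's actual definition may differ.

```python
import numpy as np


def matrix_log_spd(c):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(c)
    # Scale each eigenvector column by log(eigenvalue), then rotate back.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def generalized_z(c):
    """Assumed transformation: strictly lower-triangular entries of log(C).

    Maps a d x d correlation matrix to an unconstrained vector of
    length d * (d - 1) / 2. This is an illustrative assumption, not
    necessarily the definition used in the paper.
    """
    c = np.asarray(c, dtype=float)
    log_c = matrix_log_spd(c)
    rows, cols = np.tril_indices(c.shape[0], k=-1)
    return log_c[rows, cols]


if __name__ == "__main__":
    corr = np.array([[1.0, 0.3, 0.2],
                     [0.3, 1.0, -0.4],
                     [0.2, -0.4, 1.0]])
    gamma = generalized_z(corr)
    print(gamma)

    # Reordering the variables only reorders the transformed entries.
    perm = [2, 0, 1]
    gamma_perm = generalized_z(corr[np.ix_(perm, perm)])
    print(np.allclose(np.sort(gamma), np.sort(gamma_perm)))
```

Under this assumption, the 2 x 2 case reduces to Fisher's classical z-transformation, since the off-diagonal entry of the matrix logarithm equals arctanh(r) for correlation r. Permuting the variables permutes the entries of the output vector without changing their values, which is one reading of the order-invariance mentioned above.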
