Canonical correlation analysis investigates linear relationships between two sets of variables, but often works poorly on modern data sets due to high dimensionality and mixed data types, such as continuous, binary and zero-inflated variables. To overcome these challenges, we propose a semiparametric approach for sparse canonical correlation analysis based on the Gaussian copula. Our main contribution is a truncated latent Gaussian copula model for data with excess zeros, which allows us to derive a rank-based estimator of the latent correlation matrix for mixed variable types without estimating the marginal transformation functions. The resulting canonical correlation analysis method works well in high-dimensional settings, as demonstrated via numerical studies and an application to the analysis of association between gene expression and microRNA data of breast cancer patients.
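To illustrate what "rank-based" means here, the sketch below covers only the simplest case, a pair of continuous variables. Under a Gaussian copula, the latent correlation r and Kendall's tau satisfy the standard identity r = sin(pi/2 * tau), so r can be estimated from ranks alone without knowing the marginal transformations. The function names `kendall_tau` and `latent_correlation_continuous` are hypothetical and chosen for this example; the bridge functions for binary and zero-inflated (truncated) variables proposed in the paper are different and are not reproduced here.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a from pairwise sign comparisons (O(n^2), adequate for a sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    # Each unordered pair is counted twice in the sum, hence n * (n - 1).
    return float((sx * sy).sum() / (n * (n - 1)))

def latent_correlation_continuous(x, y):
    """Continuous-continuous case only: r = sin(pi/2 * tau) under a Gaussian copula."""
    return float(np.sin(np.pi / 2 * kendall_tau(x, y)))

# Simulated data (assumption for illustration): latent Gaussian pair with
# correlation 0.6, observed only through unknown monotone transformations.
rng = np.random.default_rng(0)
rho = 0.6
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1000)
x = np.exp(z[:, 0])   # strictly increasing transform of the first latent variable
y = z[:, 1] ** 3      # strictly increasing transform of the second latent variable

r_hat = latent_correlation_continuous(x, y)
print(f"estimated latent correlation: {r_hat:.3f} (true value {rho})")
```

Because Kendall's tau depends only on the ordering of the observations, the estimate is unchanged by the exponential and cubic transformations, which is why the marginal transformation functions never need to be estimated.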
Canonical correlation analysis (CCA) is a classical and important multivariate technique for exploring the relationship between two sets of continuous variables. CCA has applications in many fields, such as genomics and neuroimaging. It can extract m
Classical canonical correlation analysis (CCA) requires matrices to be low dimensional, i.e. the number of features cannot exceed the sample size. Recent developments in CCA have mainly focused on the high-dimensional setting, where the number of fea
Studying the neurological, genetic and evolutionary basis of human vocal communication mechanisms using animal vocalization models is an important field of neuroscience. The data sets typically comprise structured sequences of syllables or `songs pro
High-dimensional variable selection is an important issue in many scientific fields, such as genomics. In this paper, we develop a sure independence feature screening procedure based on kernel canonical correlation analysis (KCCA-SIS, for short). K
This paper demonstrates the advantages of sharing information about unknown features of covariates across multiple model components in various nonparametric regression problems including multivariate, heteroscedastic, and semi-continuous responses. I