Modern biomedical studies often collect multiple types of high-dimensional data on a common set of objects. A popular model for the joint analysis of multi-type datasets decomposes each data matrix into a low-rank common-variation matrix generated by latent factors shared across all datasets, a low-rank distinctive-variation matrix corresponding to each dataset, and an additive noise matrix. We propose decomposition-based generalized canonical correlation analysis (D-GCCA), a novel decomposition method that appropriately defines those matrices on the L2 space of random variables, whereas most existing methods are developed on its approximation, the Euclidean dot product space. Moreover, to calibrate the common latent factors well, we impose a desirable orthogonality constraint on the distinctive latent factors. Existing methods do not adequately account for such orthogonality and can thus suffer a substantial loss of undetected common variation. D-GCCA goes one step further than generalized canonical correlation analysis (GCCA) by separating common and distinctive variations among canonical variables, and it enjoys an appealing interpretation from the perspective of principal component analysis. Consistent estimators of our common-variation and distinctive-variation matrices are established with good finite-sample numerical performance, and they have closed-form expressions that lead to efficient computation, especially for large-scale datasets. The superiority of D-GCCA over state-of-the-art methods is also corroborated in simulations and real-world data examples.
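To make the decomposition model described above concrete, the following minimal sketch simulates data of that form: each data matrix is the sum of a low-rank common-variation matrix driven by shared latent factors, a low-rank distinctive-variation matrix, and additive noise. It only generates data under the model and does not implement the D-GCCA estimators. The function name `simulate_multiview`, the symbols `Y`, `C`, `D`, `E`, the Gaussian factors and loadings, and all default sizes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def simulate_multiview(n=200, dims=(50, 60, 70), r_common=2, r_distinct=3,
                       noise_sd=0.1, seed=0):
    """Simulate K data matrices Y_k = C_k + D_k + E_k (hypothetical notation).

    Rows are variables and columns are the n common objects.
    """
    rng = np.random.default_rng(seed)
    # Latent factors shared by every dataset (assumed Gaussian here).
    shared = rng.standard_normal((r_common, n))
    views = []
    for p in dims:
        # Common-variation matrix: dataset-specific loadings times shared factors.
        C = rng.standard_normal((p, r_common)) @ shared
        # Distinctive-variation matrix: loadings times factors drawn
        # independently for this dataset only.
        D = rng.standard_normal((p, r_distinct)) @ rng.standard_normal((r_distinct, n))
        # Additive noise matrix.
        E = noise_sd * rng.standard_normal((p, n))
        views.append({"Y": C + D + E, "C": C, "D": D, "E": E})
    return views


views = simulate_multiview()
for k, v in enumerate(views):
    print(k, v["Y"].shape,
          np.linalg.matrix_rank(v["C"]),
          np.linalg.matrix_rank(v["D"]))
```

Drawing the distinctive factors independently for each dataset is a simplification: it makes them only approximately uncorrelated in a finite sample, whereas the orthogonality constraint in the abstract is stated for the latent factors themselves.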
For multiple multivariate data sets, we derive conditions under which Generalized Canonical Correlation Analysis (GCCA) improves classification performance of the projected datasets, compared to standard Canonical Correlation Analysis (CCA) using onl
A representative model in integrative analysis of two high-dimensional correlated datasets is to decompose each data matrix into a low-rank common matrix generated by latent factors shared across datasets, a low-rank distinctive matrix corresponding
This paper proposes a canonical-correlation-based filter method for feature selection. The sum of squared canonical correlation coefficients is adopted as the feature ranking criterion. The proposed method boosts the computational speed of the rankin
The objective of multimodal information fusion is to mathematically analyze information carried in different sources and create a new representation which will be more effectively utilized in pattern recognition and other multimedia information proce
Neural networks have seen limited use in prediction for high-dimensional data with small sample sizes, because they tend to overfit and require tuning many more hyperparameters than existing off-the-shelf machine learning methods. With small modifica