Flexible Principal Component Analysis for Exponential Family Distributions


Abstract in English

Traditional principal component analysis (PCA) is well known in high-dimensional data analysis, but it requires to express data by a matrix with observations to be continuous. To overcome the limitations, a new method called flexible PCA (FPCA) for exponential family distributions is proposed. The goal is to ensure that it can be implemented to arbitrary shaped region for either count or continuous observations. The methodology of FPCA is developed under the framework of generalized linear models. It provides statistical models for FPCA not limited to matrix expressions of the data. A maximum likelihood approach is proposed to derive the decomposition when the number of principal components (PCs) is known. This naturally induces a penalized likelihood approach to determine the number of PCs when it is unknown. By modifying it for missing data problems, the proposed method is compared with previous PCA methods for missing data. The simulation study shows that the performance of FPCA is always better than its competitors. The application uses the proposed method to reduce the dimensionality of arbitrary shaped sub-regions of images and the global spread patterns of COVID-19 under normal and Poisson distributions, respectively.

Download