
Supervised Principal Component Regression for Functional Response with High Dimensional Predictors

Added by Xinyi Zhang
Publication date 2021
Research language: English





We propose a supervised principal component regression method for relating functional responses to high dimensional covariates. Unlike conventional principal component analysis, the proposed method builds on a newly defined expected integrated residual sum of squares, which directly exploits the association between the functional response and the predictors. Minimizing this integrated residual sum of squares yields the supervised principal components; doing so is equivalent to solving a sequence of nonconvex generalized Rayleigh quotient optimization problems, which is computationally intractable. To overcome this computational challenge, we reformulate the nonconvex optimization problems as a simultaneous linear regression, adding a sparse penalty to handle high dimensional predictors. Theoretically, we show that the reformulated regression problem recovers the same supervised principal subspace under suitable conditions. Statistically, we establish non-asymptotic error bounds for the proposed estimators. Numerical studies and an application to Human Connectome Project data lend further support to the method.
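The abstract does not state the objective explicitly, so the sketch below assumes the simplest reduced-rank reading for a single component: with score $w^\top X$ and coefficient function $b(t)$, minimizing $E\int\{Y(t)-b(t)w^\top X\}^2\,dt$ over $b$ leaves the generalized Rayleigh quotient $w^\top A w / w^\top B w$, where $A=\int\Sigma_{XY}(t)\Sigma_{XY}(t)^\top dt$, $\Sigma_{XY}(t)=\operatorname{Cov}\{X,Y(t)\}$ and $B=\operatorname{Cov}(X)$. The helpers `quotient_matrices`, `leading_direction` and `rayleigh` are hypothetical names, responses are assumed to lie on an equally spaced grid, and the dense eigen-solve only works when $n > p$. It has no sparse penalty and is not the authors' reformulated regression.

```python
import numpy as np


def quotient_matrices(X, Y, dt):
    """Sample A = int Sigma_XY(t) Sigma_XY(t)' dt and B = Cov(X).

    X: (n, p) predictors; Y: (n, T) responses on an equally spaced grid with
    spacing dt (assumption). The integral over t is a Riemann sum.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S_xy = Xc.T @ Yc / n          # (p, T): column j estimates Cov(X, Y(t_j))
    A = S_xy @ S_xy.T * dt        # (p, p)
    B = Xc.T @ Xc / n             # (p, p), must be positive definite (n > p)
    return A, B


def leading_direction(A, B):
    """Maximize w'Aw / w'Bw by solving the generalized problem A w = lam B w."""
    L = np.linalg.cholesky(B)     # B = L L'
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T       # symmetric matrix with the same eigenvalues
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    w = L_inv.T @ vecs[:, -1]     # back-transform the top eigenvector
    return w, vals[-1]


def rayleigh(w, A, B):
    return float(w @ A @ w) / float(w @ B @ w)


# Usage: a functional response driven by one linear combination of predictors
rng = np.random.default_rng(0)
n, p, T = 200, 5, 50
t = np.linspace(0.0, 1.0, T)
X = rng.standard_normal((n, p))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
Y = np.outer(X @ beta, np.sin(2 * np.pi * t)) + 0.5 * rng.standard_normal((n, T))

A, B = quotient_matrices(X, Y, dt=t[1] - t[0])
w, lam = leading_direction(A, B)
cosine = abs(w @ beta) / (np.linalg.norm(w) * np.linalg.norm(beta))
```

Here $\Sigma_{XY}(t)=\Sigma_X\beta\sin(2\pi t)$, so the population quotient is maximized at $w\propto\beta$, and the estimated `w` should point along `beta` up to sign and scale.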




Jingru Zhang, Wei Lin 2021
Dimension reduction for high-dimensional compositional data plays an important role in many fields, where principal component analysis of the basis covariance matrix is of scientific interest. In practice, however, the basis variables are latent and rarely observed, and standard principal component analysis techniques are inadequate for compositional data because of the simplex constraint. To address this challenging problem, we relate the principal subspace of the centered log-ratio compositional covariance to that of the basis covariance, and prove that the latter is approximately identifiable as the dimensionality diverges, under a subspace sparsity assumption. This blessing-of-dimensionality phenomenon enables us to propose principal subspace estimation methods based on the sample centered log-ratio covariance. We also derive nonasymptotic error bounds for the subspace estimators, which exhibit a tradeoff between identification and estimation. Moreover, we develop efficient proximal alternating direction method of multipliers algorithms to solve the resulting nonconvex and nonsmooth optimization problems. Simulation results demonstrate that the proposed methods perform as well as oracle methods with a known basis. Their usefulness is illustrated through an analysis of the word usage patterns of statisticians.
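The centered log-ratio (clr) transform subtracts each sample's mean log-proportion, so it is unchanged by the unobserved row totals; that is what lets observed compositions carry information about the latent basis. A minimal sketch, assuming strictly positive proportions and a plain eigendecomposition of the sample clr covariance (not the sparse subspace estimators or the proximal ADMM solver described above); `clr` and `clr_principal_subspace` are hypothetical helper names.

```python
import numpy as np


def clr(W):
    """Centered log-ratio transform, applied row-wise to strictly positive data."""
    logW = np.log(W)
    return logW - logW.mean(axis=1, keepdims=True)


def clr_principal_subspace(W, k):
    """Top-k eigenpairs of the sample clr covariance (no sparsity, no ADMM)."""
    Z = clr(W)
    Zc = Z - Z.mean(axis=0)
    Gamma = Zc.T @ Zc / W.shape[0]        # sample centered log-ratio covariance
    vals, vecs = np.linalg.eigh(Gamma)
    top = np.argsort(vals)[::-1][:k]      # indices of the k largest eigenvalues
    return Gamma, vals[top], vecs[:, top]


# Usage: latent positive basis values are observed only as proportions
rng = np.random.default_rng(1)
basis = np.exp(rng.standard_normal((100, 20)))
W = basis / basis.sum(axis=1, keepdims=True)   # rows lie on the simplex
Gamma, vals, V = clr_principal_subspace(W, k=3)
```

Every clr vector sums to zero, so the clr covariance always has the all-ones vector in its null space (the simplex constraint reappearing in the covariance), and the leading eigenvectors are orthogonal to it.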
Functional principal component analysis (FPCA) can become invalid when the data have non-Gaussian features. We therefore aim to develop a general FPCA method that adapts to such non-Gaussian cases. We construct a Kendall's $\tau$ function that has the same eigenfunctions as the covariance function; its particular formulation makes it less sensitive to the data distribution. We then apply it to FPCA estimation and study the corresponding asymptotic consistency. The effectiveness of the proposed method is further demonstrated through a comprehensive simulation study and an application to physical activity data collected by a wearable accelerometer monitor.
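A minimal grid-based sketch of a Kendall's $\tau$ function, assuming the spatial (multivariate) form $K(s,t)=E[\{X(s)-\tilde X(s)\}\{X(t)-\tilde X(t)\}/\|X-\tilde X\|^2]$ with $\tilde X$ an independent copy; the paper's exact definition, smoothing and consistency conditions are not reproduced. The sample version averages over all pairs of curves, and its eigenvectors on the grid estimate the eigenfunctions. Function names are hypothetical, and the data are simulated with heavy-tailed multivariate $t$ scores (2 degrees of freedom) to mimic a non-Gaussian setting.

```python
import numpy as np


def kendall_tau_function(X, dt):
    """Grid estimate of K(s, t) = E[(X(s)-X'(s))(X(t)-X'(t)) / ||X - X'||^2].

    X: (n, T) curves on an equally spaced grid with spacing dt. Averages over
    all pairs i < j; assumes no two curves coincide (zero norm otherwise).
    """
    i, j = np.triu_indices(X.shape[0], k=1)
    D = X[i] - X[j]                          # one pairwise difference per row
    norm2 = (D ** 2).sum(axis=1) * dt        # squared L2 norm via a Riemann sum
    return (D / norm2[:, None]).T @ D / len(i)


def leading_eigenfunction(K, dt):
    vals, vecs = np.linalg.eigh(K * dt)      # discretized integral operator
    return vecs[:, -1] / np.sqrt(dt)         # rescale to unit L2 norm


# Usage: heavy-tailed scores on two known eigenfunctions
rng = np.random.default_rng(2)
n, T = 200, 50
dt = 1.0 / T
t = (np.arange(T) + 0.5) * dt
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
scale = np.sqrt(2.0 / rng.chisquare(2, size=n))   # shared per-curve t scale
xi = rng.standard_normal((n, 2)) * np.array([2.0, 1.0]) * scale[:, None]
X = xi[:, :1] * phi1 + xi[:, 1:] * phi2

K = kendall_tau_function(X, dt)
phi_hat = leading_eigenfunction(K, dt)
cosine = abs(phi_hat @ phi1) / (np.linalg.norm(phi_hat) * np.linalg.norm(phi1))
```

Because each pairwise term is divided by its own squared norm, $\int K(t,t)\,dt = 1$ exactly, and a single extreme curve cannot dominate the estimate the way it can dominate a sample covariance.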
Functional binary datasets occur frequently in practice, but the discrete nature of the data can make model estimation challenging. In this paper, we propose a sparse logistic functional principal component analysis (SLFPCA) method to handle functional binary data. SLFPCA seeks local sparsity in the eigenfunctions to make them easier to interpret. We formulate the problem through a penalized Bernoulli likelihood with both a roughness penalty and a sparseness penalty. An efficient algorithm for optimizing the penalized likelihood is developed using a majorization-minimization (MM) algorithm. The theoretical results establish both consistency and sparsistency of the proposed method. We conduct a thorough numerical experiment to demonstrate the advantages of the SLFPCA approach. Our method is further applied to a physical activity dataset.
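The MM idea can be illustrated on the unpenalized core of the problem. The sketch below assumes rank-$k$ logistic PCA on binary curves observed on a common grid and uses the standard quadratic majorizer that follows from the logistic loss having curvature at most $1/4$; the roughness and sparseness penalties of SLFPCA, and whatever majorizer the authors actually use, are omitted. All names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic function: 1 / (1 + exp(-x)) = (1 + tanh(x/2)) / 2
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def neg_loglik(Y, Theta):
    """Bernoulli negative log-likelihood with natural parameters Theta."""
    return float(np.sum(np.logaddexp(0.0, Theta) - Y * Theta))


def logistic_pca_mm(Y, k, n_iter=50):
    """Rank-k logistic PCA by majorization-minimization (no penalties).

    The loss is majorized by (1/8) * ||Theta - Z||_F^2 + const with working
    response Z = Theta + 4 * (Y - sigmoid(Theta)); each step minimizes that
    surrogate exactly with column means plus a truncated SVD.
    """
    Theta = np.zeros(Y.shape)
    history = [neg_loglik(Y, Theta)]
    for _ in range(n_iter):
        Z = Theta + 4.0 * (Y - sigmoid(Theta))
        mu = Z.mean(axis=0)
        U, s, Vt = np.linalg.svd(Z - mu, full_matrices=False)
        Theta = mu + (U[:, :k] * s[:k]) @ Vt[:k]
        history.append(neg_loglik(Y, Theta))
    return mu, Vt[:k], history


# Usage: binary curves whose log-odds vary along one smooth component
rng = np.random.default_rng(3)
n, T = 100, 40
t = np.linspace(0.0, 1.0, T)
log_odds = np.outer(2.0 * rng.standard_normal(n), np.sin(np.pi * t))
Y = (rng.random((n, T)) < sigmoid(log_odds)).astype(float)
mu, loadings, history = logistic_pca_mm(Y, k=1)
```

Since each iteration minimizes a surrogate that touches the objective at the current iterate, the recorded negative log-likelihood in `history` never increases, which is the defining property of an MM algorithm.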
Methods for global measurement of transcript abundance such as microarrays and RNA-seq generate datasets in which the number of measured features far exceeds the number of observations. Extracting biologically meaningful and experimentally tractable insights from such data therefore requires high-dimensional prediction. Existing sparse linear approaches to this challenge have been stunningly successful, but some important issues remain. These methods can fail to select the correct features, predict poorly relative to non-sparse alternatives, or ignore unknown grouping structure among the features. We propose a method called SuffPCR that yields improved predictions in high-dimensional tasks including regression and classification, especially in the typical omics setting with correlated features. SuffPCR first estimates sparse principal components and then estimates a linear model on the recovered subspace. Because the estimated subspace is sparse in the features, the resulting predictions depend on only a small subset of genes. SuffPCR works well on a variety of simulated and experimental transcriptomic data, performing nearly optimally when the model assumptions are satisfied. We also demonstrate near-optimal theoretical guarantees.
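The two-stage structure (sparse principal components first, then a linear model on their scores) can be sketched as follows. This is not SuffPCR's estimator: sparse PCA is swapped for truncated power iteration on the sample covariance, only one component is used, and the function names and simulated "genes" are hypothetical.

```python
import numpy as np


def truncated_power_pc(S, n_keep, n_iter=100):
    """Leading sparse eigenvector of S by truncated power iteration."""
    v = np.linalg.eigh(S)[1][:, -1]              # warm start: ordinary leading PC
    for _ in range(n_iter):
        u = S @ v
        u[np.argsort(np.abs(u))[:-n_keep]] = 0.0   # zero all but the n_keep largest
        v = u / np.linalg.norm(u)
    return v


def sparse_pcr(X, y, n_keep):
    """Regress y on the score of one sparse principal component."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    v = truncated_power_pc(Xc.T @ Xc / X.shape[0], n_keep)
    score = Xc @ v
    gamma = (score @ yc) / (score @ score)       # least squares on a single score
    beta = gamma * v                             # nonzero only where v is nonzero
    intercept = y_mean - x_mean @ beta
    return beta, intercept


# Usage: ten correlated features driven by a latent factor, the rest pure noise
rng = np.random.default_rng(4)
n, p = 200, 200
f = rng.standard_normal(n)
X = rng.standard_normal((n, p))
X[:, :10] = f[:, None] + 0.3 * rng.standard_normal((n, 10))
y = 2.0 * f + 0.5 * rng.standard_normal(n)
beta, intercept = sparse_pcr(X, y, n_keep=10)
```

Because `beta` is a multiple of the sparse loading vector, predictions use only the `n_keep` selected features, mirroring the point that the predictions depend on a small subset of genes.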
Functional principal component analysis (FPCA) has been widely used to capture major modes of variation and reduce dimensions in functional data analysis. However, standard FPCA based on the sample covariance estimator does not work well in the presence of outliers. To address this challenge, we introduce a new robust functional principal component analysis approach based on the functional pairwise spatial sign (PASS) operator, termed PASS FPCA, and propose estimation procedures for both eigenfunctions and eigenvalues, with and without measurement error. Compared to existing robust FPCA methods, the proposed one requires weaker distributional assumptions to preserve the eigenspace of the covariance function. In particular, we introduce a class of distributions called weakly functional coordinate symmetric (weakly FCS) distributions, which allows for severe asymmetry and is strictly larger than the class of functional elliptical distributions, the latter having been widely used in the robust statistics literature. The robustness of PASS FPCA is demonstrated via simulation studies and analyses of accelerometry data from a large-scale epidemiological study of physical activity in older women that partly motivates this work.
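A minimal sketch of the pairwise spatial sign idea on a common grid, assuming the operator is estimated by averaging outer products of normalized pairwise differences $(X_i-X_j)/\|X_i-X_j\|$; the paper's eigenvalue estimation and measurement-error corrections are not included, and all names are hypothetical. The simulation adds a few gross outlier curves to show how they pull the leading eigenfunction of the sample covariance but not that of the sign-based operator.

```python
import numpy as np


def pass_operator(X, dt):
    """Grid estimate of E[s(X - X') s(X - X')'], with s(f) = f / ||f||."""
    i, j = np.triu_indices(X.shape[0], k=1)
    D = X[i] - X[j]
    S = D / np.sqrt((D ** 2).sum(axis=1) * dt)[:, None]   # pairwise spatial signs
    return S.T @ S / len(i)


def top_eigenfunction(C, dt):
    vecs = np.linalg.eigh(C * dt)[1]
    return vecs[:, -1] / np.sqrt(dt)


def cosine(a, b):
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


# Usage: 100 clean curves plus 10 gross outliers along a third direction
rng = np.random.default_rng(5)
T = 50
dt = 1.0 / T
t = (np.arange(T) + 0.5) * dt
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
phi3 = np.sqrt(2) * np.sin(4 * np.pi * t)
clean = (2.0 * rng.standard_normal((100, 1)) * phi1
         + rng.standard_normal((100, 1)) * phi2)
signs = rng.choice([-1.0, 1.0], size=(10, 1))
outliers = signs * (20.0 + rng.standard_normal((10, 1))) * phi3
X = np.vstack([clean, outliers])

pass_fn = top_eigenfunction(pass_operator(X, dt), dt)
cov_fn = top_eigenfunction(np.cov(X, rowvar=False), dt)
```

In this example the covariance's leading eigenfunction tracks the outlier direction `phi3`, while the sign-based operator keeps `phi1`, because every pairwise difference contributes a unit-norm term regardless of its magnitude.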