No Arabic abstract
Functional data analysis (FDA) methods have computational and theoretical appeals for some high dimensional data, but lack the scalability to modern large sample datasets. To tackle the challenge, we develop randomized algorithms for two important FDA methods: functional principal component analysis (FPCA) and functional linear regression (FLR) with scalar response. The two methods are connected as they both rely on the accurate estimation of functional principal subspace. The proposed algorithms draw subsamples from the large dataset at hand and apply FPCA or FLR over the subsamples to reduce the computational cost. To effectively preserve subspace information in the subsamples, we propose a functional principal subspace sampling probability, which removes the eigenvalue scale effect inside the functional principal subspace and properly weights the residual. Based on the operator perturbation analysis, we show the proposed probability has precise control over the first order error of the subspace projection operator and can be interpreted as an importance sampling for functional subspace estimation. Moreover, concentration bounds for the proposed algorithms are established to reflect the low intrinsic dimension nature of functional data in an infinite dimensional space. The effectiveness of the proposed algorithms is demonstrated upon synthetic and real datasets.
Functional principal component analysis (FPCA) could become invalid when data involve non-Gaussian features. Therefore, we aim to develop a general FPCA method to adapt to such non-Gaussian cases. A Kenalls $tau$ function, which possesses identical eigenfunctions as covariance function, is constructed. The particular formulation of Kendalls $tau$ function makes it less insensitive to data distribution. We further apply it to the estimation of FPCA and study the corresponding asymptotic consistency. Moreover, the effectiveness of the proposed method is demonstrated through a comprehensive simulation study and an application to the physical activity data collected by a wearable accelerometer monitor.
Functional binary datasets occur frequently in real practice, whereas discrete characteristics of the data can bring challenges to model estimation. In this paper, we propose a sparse logistic functional principal component analysis (SLFPCA) method to handle the functional binary data. The SLFPCA looks for local sparsity of the eigenfunctions to obtain convenience in interpretation. We formulate the problem through a penalized Bernoulli likelihood with both roughness penalty and sparseness penalty terms. An efficient algorithm is developed for the optimization of the penalized likelihood using majorization-minimization (MM) algorithm. The theoretical results indicate both consistency and sparsistency of the proposed method. We conduct a thorough numerical experiment to demonstrate the advantages of the SLFPCA approach. Our method is further applied to a physical activity dataset.
We present and describe the GPFDA package for R. The package provides flexible functionalities for dealing with Gaussian process regression (GPR) models for functional data. Multivariate functional data, functional data with multidimensional inputs, and nonseparable and/or nonstationary covariance structures can be modeled. In addition, the package fits functional regression models where the mean function depends on scalar and/or functional covariates and the covariance structure is modeled by a GPR model. In this paper, we present the versatility of GPFDA with respect to mean function and covariance function specifications and illustrate the implementation of estimation and prediction of some models through reproducible numerical examples.
Functional principal component analysis is essential in functional data analysis, but the inferences will become unconvincing when some non-Gaussian characteristics occur, such as heavy tail and skewness. The focus of this paper is to develop a robust functional principal component analysis methodology in dealing with non-Gaussian longitudinal data, for which sparsity and irregularity along with non-negligible measurement errors must be considered. We introduce a Kendalls $tau$ function whose particular properties make it a nice proxy for the covariance function in the eigenequation when handling non-Gaussian cases. Moreover, the estimation procedure is presented and the asymptotic theory is also established. We further demonstrate the superiority and robustness of our method through simulation studies and apply the method to the longitudinal CD4 cell count data in an AIDS study.
The main purpose of this paper is to facilitate the communication between the Analytic, Probabilistic and Algorithmic communities. We present a proof of convergence of the Hamiltonian (Hybrid) Monte Carlo algorithm from the point of view of the Dynamical Systems, where the evolving objects are densities of probability distributions and the tool are derived from the Functional Analysis.