No Arabic abstract
We propose a nonparametric method to explicitly model and represent the derivatives of smooth underlying trajectories for longitudinal data. This representation is based on a direct Karhunen--Lo`eve expansion of the unobserved derivatives and leads to the notion of derivative principal component analysis, which complements functional principal component analysis, one of the most popular tools of functional data analysis. The proposed derivative principal component scores can be obtained for irregularly spaced and sparsely observed longitudinal data, as typically encountered in biomedical studies, as well as for functional data which are densely measured. Novel consistency results and asymptotic convergence rates for the proposed estimates of the derivative principal component scores and other components of the model are derived under a unified scheme for sparse or dense observations and mild conditions. We compare the proposed representations for derivatives with alternative approaches in simulation settings and also in a wallaby growth curve application. It emerges that representations using the proposed derivative principal component analysis recover the underlying derivatives more accurately compared to principal component analysis-based approaches especially in settings where the functional data are represented with only a very small number of components or are densely sampled. In a second wheat spectra classification example, derivative principal component scores were found to be more predictive for the protein content of wheat than the conventional functional principal component scores.
Functional principal component analysis is essential in functional data analysis, but the inferences will become unconvincing when some non-Gaussian characteristics occur, such as heavy tail and skewness. The focus of this paper is to develop a robust functional principal component analysis methodology in dealing with non-Gaussian longitudinal data, for which sparsity and irregularity along with non-negligible measurement errors must be considered. We introduce a Kendalls $tau$ function whose particular properties make it a nice proxy for the covariance function in the eigenequation when handling non-Gaussian cases. Moreover, the estimation procedure is presented and the asymptotic theory is also established. We further demonstrate the superiority and robustness of our method through simulation studies and apply the method to the longitudinal CD4 cell count data in an AIDS study.
Functional principal component analysis (FPCA) could become invalid when data involve non-Gaussian features. Therefore, we aim to develop a general FPCA method to adapt to such non-Gaussian cases. A Kenalls $tau$ function, which possesses identical eigenfunctions as covariance function, is constructed. The particular formulation of Kendalls $tau$ function makes it less insensitive to data distribution. We further apply it to the estimation of FPCA and study the corresponding asymptotic consistency. Moreover, the effectiveness of the proposed method is demonstrated through a comprehensive simulation study and an application to the physical activity data collected by a wearable accelerometer monitor.
Functional binary datasets occur frequently in real practice, whereas discrete characteristics of the data can bring challenges to model estimation. In this paper, we propose a sparse logistic functional principal component analysis (SLFPCA) method to handle the functional binary data. The SLFPCA looks for local sparsity of the eigenfunctions to obtain convenience in interpretation. We formulate the problem through a penalized Bernoulli likelihood with both roughness penalty and sparseness penalty terms. An efficient algorithm is developed for the optimization of the penalized likelihood using majorization-minimization (MM) algorithm. The theoretical results indicate both consistency and sparsistency of the proposed method. We conduct a thorough numerical experiment to demonstrate the advantages of the SLFPCA approach. Our method is further applied to a physical activity dataset.
We consider spatially dependent functional data collected under a geostatistics setting, where locations are sampled from a spatial point process. The functional response is the sum of a spatially dependent functional effect and a spatially independent functional nugget effect. Observations on each function are made on discrete time points and contaminated with measurement errors. Under the assumption of spatial stationarity and isotropy, we propose a tensor product spline estimator for the spatio-temporal covariance function. When a coregionalization covariance structure is further assumed, we propose a new functional principal component analysis method that borrows information from neighboring functions. The proposed method also generates nonparametric estimators for the spatial covariance functions, which can be used for functional kriging. Under a unified framework for sparse and dense functional data, infill and increasing domain asymptotic paradigms, we develop the asymptotic convergence rates for the proposed estimators. Advantages of the proposed approach are demonstrated through simulation studies and two real data applications representing sparse and dense functional data, respectively.
We consider nonparametric estimation of the mean and covariance functions for functional/longitudinal data. Strong uniform convergence rates are developed for estimators that are local-linear smoothers. Our results are obtained in a unified framework in which the number of observations within each curve/cluster can be of any rate relative to the sample size. We show that the convergence rates for the procedures depend on both the number of sample curves and the number of observations on each curve. For sparse functional data, these rates are equivalent to the optimal rates in nonparametric regression. For dense functional data, root-n rates of convergence can be achieved with proper choices of bandwidths. We further derive almost sure rates of convergence for principal component analysis using the estimated covariance function. The results are illustrated with simulation studies.