No Arabic abstract
When considering functional principal component analysis for sparsely observed longitudinal data that take values on a nonlinear manifold, a major challenge is how to handle the sparse and irregular observations that are commonly encountered in longitudinal studies. Addressing this challenge, we provide theory and implementations for a manifold version of the principal analysis by conditional expectation (PACE) procedure that produces representations intrinsic to the manifold, extending a well-established version of functional principal component analysis targeting sparsely sampled longitudinal data in linear spaces. Key steps are local linear smoothing methods for the estimation of a Frechet mean curve, mapping the observed manifold-valued longitudinal data to tangent spaces around the estimated mean curve, and applying smoothing methods to obtain the covariance structure of the mapped data. Dimension reduction is achieved via representations based on the first few leading principal components. A finitely truncated representation of the original manifold-valued data is then obtained by mapping these tangent space representations to the manifold. We show that the proposed estimates of mean curve and covariance structure achieve state-of-the-art convergence rates. For longitudinal emotional well-being data for unemployed workers as an example of time-dynamic compositional data that are located on a sphere, we demonstrate that our methods lead to interpretable eigenfunctions and principal component scores. In a second example, we analyze the body shapes of wallabies by mapping the relative size of their body parts onto a spherical pre-shape space. Compared to standard functional principal component analysis, which is based on Euclidean geometry, the proposed approach leads to improved trajectory recovery for sparsely sampled data on nonlinear manifolds.
In many applications, we encounter data on Riemannian manifolds such as torus and rotation groups. Standard statistical procedures for multivariate data are not applicable to such data. In this study, we develop goodness-of-fit testing and interpretable model criticism methods for general distributions on Riemannian manifolds, including those with an intractable normalization constant. The proposed methods are based on extensions of kernel Stein discrepancy, which are derived from Stein operators on Riemannian manifolds. We discuss the connections between the proposed tests with existing ones and provide a theoretical analysis of their asymptotic Bahadur efficiency. Simulation results and real data applications show the validity of the proposed methods.
In ordinary quantile regression, quantiles of different order are estimated one at a time. An alternative approach, which is referred to as quantile regression coefficients modeling (QRCM), is to model quantile regression coefficients as parametric functions of the order of the quantile. In this paper, we describe how the QRCM paradigm can be applied to longitudinal data. We introduce a two-level quantile function, in which two different quantile regression models are used to describe the (conditional) distribution of the within-subject response and that of the individual effects. We propose a novel type of penalized fixed-effects estimator, and discuss its advantages over standard methods based on $ell_1$ and $ell_2$ penalization. We provide model identifiability conditions, derive asymptotic properties, describe goodness-of-fit measures and model selection criteria, present simulation results, and discuss an application. The proposed method has been implemented in the R package qrcm.
In electronic health records (EHRs), latent subgroups of patients may exhibit distinctive patterning in their longitudinal health trajectories. For such data, growth mixture models (GMMs) enable classifying patients into different latent classes based on individual trajectories and hypothesized risk factors. However, the application of GMMs is hindered by the special missing data problem in EHRs, which manifests two patient-led missing data processes: the visit process and the response process for an EHR variable conditional on a patient visiting the clinic. If either process is associated with the process generating the longitudinal outcomes, then valid inferences require accounting for a nonignorable missing data mechanism. We propose a Bayesian shared parameter model that links GMMs of multiple longitudinal health outcomes, the visit process, and the response process of each outcome given a visit using a discrete latent class variable. Our focus is on multiple longitudinal health outcomes for which there can be a clinically prescribed visit schedule. We demonstrate our model in EHR measurements on early childhood weight and height z-scores. Using data simulations, we illustrate the statistical properties of our method with respect to subgroup-specific or marginal inferences. We built the R package EHRMiss for model fitting, selection, and checking.
In this paper we develop the theory of parametric polynomial regression in Riemannian manifolds and Lie groups. We show application of Riemannian polynomial regression to shape analysis in Kendall shape space. Results are presented, showing the power of polynomial regression on the classic rat skull growth data of Bookstein as well as the analysis of the shape changes associated with aging of the corpus callosum from the OASIS Alzheimers study.
In this paper, we give explicit descriptions