With the advent of continuous health monitoring via wearable devices, users now generate unique streams of continuous data, such as minute-level physical activity or heart rate. Aggregating these streams into scalar summaries ignores the distributional nature of the data and often leads to the loss of critical information. We propose to capture the distributional properties of wearable data via user-specific quantile functions, which are then used in functional regression and multi-modal distributional modelling. In addition, we propose to encode user-specific distributional information with user-specific L-moments, robust rank-based analogs of traditional moments. Importantly, this L-moment encoding yields mutually consistent functional and distributional interpretations of the results of scalar-on-function regression. We also demonstrate how L-moments can be flexibly employed for analyzing joint and individual sources of variation in multi-modal distributional data. The proposed methods are illustrated in a study of people with Alzheimer's disease (AD) and people with normal cognitive function, examining the association of accelerometry-derived digital gait biomarkers with AD. Our analysis shows that the proposed quantile-based representation yields much higher predictive performance than simple distributional summaries and attains much stronger associations with clinical cognitive scales.
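The link between a subject's quantile function and its L-moments can be made concrete: the r-th L-moment is the integral of the quantile function against the shifted Legendre polynomial of degree r - 1, that is, λ_r = ∫₀¹ Q(p) P*_{r-1}(p) dp with P*_{r-1}(p) = P_{r-1}(2p - 1). The minimal sketch below is an illustration, not the authors' implementation: it evaluates an empirical quantile function on a fixed probability grid and recovers the first four L-moments from it. The grid size, the trapezoidal quadrature, the function names, and the synthetic activity values are all our assumptions.

```python
import numpy as np


def empirical_quantile_function(x, p_grid):
    """Evaluate a subject's empirical quantile function Q(p) on a fixed probability grid."""
    return np.quantile(np.asarray(x, dtype=float), p_grid)


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a specific NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def l_moments_from_quantiles(q, p_grid, n_moments=4):
    """Approximate lambda_r = int_0^1 Q(p) P*_{r-1}(p) dp for r = 1..n_moments.

    P*_{r-1}(p) = P_{r-1}(2p - 1) is the shifted Legendre polynomial of degree r - 1.
    """
    p_grid = np.asarray(p_grid, dtype=float)
    q = np.asarray(q, dtype=float)
    lambdas = []
    for r in range(1, n_moments + 1):
        coefs = np.zeros(r)
        coefs[-1] = 1.0  # selects the Legendre polynomial of degree r - 1
        weight = np.polynomial.legendre.legval(2.0 * p_grid - 1.0, coefs)
        lambdas.append(trapezoid(q * weight, p_grid))
    return np.array(lambdas)


# Usage with synthetic minute-level activity for one day (illustrative only).
rng = np.random.default_rng(0)
activity = rng.gamma(shape=2.0, scale=50.0, size=1440)
p_grid = np.linspace(0.0, 1.0, 201)
q = empirical_quantile_function(activity, p_grid)  # functional representation
lam = l_moments_from_quantiles(q, p_grid)          # distributional representation
l_skewness = lam[2] / lam[1]                       # tau_3 = lambda_3 / lambda_2
```

Because each L-moment is a fixed linear functional of Q(p), a coefficient function estimated in a regression on Q(p) can be read against the same Legendre weights, which is one way to see why the two interpretations stay consistent.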
Multi-view data refers to a setting where features are divided into feature sets (views), for example because they correspond to different sources. Stacked penalized logistic regression (StaPLR) is a recently introduced method that can be used for classification and for automatically selecting the views that are most important for prediction. We show how this method can easily be extended to a setting where the data have a hierarchical multi-view structure. We apply StaPLR to Alzheimer's disease classification, where different MRI measures have been calculated from three scan types: structural MRI, diffusion-weighted MRI, and resting-state fMRI. StaPLR can identify which scan types and which MRI measures are most important for classification, and it outperforms elastic net regression in classification performance.
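The stacking idea behind StaPLR can be read as two levels of penalized logistic regression: one base learner per view, whose cross-validated predicted probabilities become the features of a meta-learner, and a meta-learner whose nonnegative, L1-penalized weights select views. The numpy sketch below assumes ridge-penalized base learners and a nonnegative lasso meta-learner, both fitted by proximal gradient descent. The solver, fold assignment, penalty values, and function names are our assumptions for illustration, not a reference implementation of StaPLR, and the sketch covers only the flat multi-view case, not the hierarchical extension.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))


def fit_logistic(X, y, l2=0.0, l1=0.0, nonneg=False, n_iter=2000):
    """Penalized logistic regression by proximal gradient descent (intercept unpenalized).

    Minimizes mean log-loss + (l2 / 2) * ||w||^2 + l1 * ||w||_1, optionally subject to w >= 0.
    """
    n, d = X.shape
    # Step size 1 / L, where L bounds the Lipschitz constant of the smooth part.
    Xa = np.column_stack([X, np.ones(n)])
    step = 1.0 / (np.linalg.norm(Xa, 2) ** 2 / (4.0 * n) + l2)
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        resid = sigmoid(X @ w + b) - y
        w = w - step * (X.T @ resid / n + l2 * w)
        b = b - step * np.mean(resid)
        if nonneg:
            w = np.maximum(w - step * l1, 0.0)  # prox of l1 * ||w||_1 restricted to w >= 0
        else:
            w = np.sign(w) * np.maximum(np.abs(w) - step * l1, 0.0)  # soft thresholding
    return w, b


def cross_validated_view_predictions(views, y, n_folds=5, l2=0.1, seed=0):
    """Out-of-fold predicted probabilities of a ridge logistic base learner for each view."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    Z = np.zeros((n, len(views)))
    for v, X in enumerate(views):
        for k in range(n_folds):
            train, held_out = folds != k, folds == k
            w, b = fit_logistic(X[train], y[train], l2=l2)
            Z[held_out, v] = sigmoid(X[held_out] @ w + b)
    return Z


def stacked_penalized_logistic(views, y, l1=0.01, l2=0.1, n_folds=5, seed=0):
    """Two-level stacking: ridge base learners per view, nonnegative lasso meta-learner."""
    Z = cross_validated_view_predictions(views, y, n_folds=n_folds, l2=l2, seed=seed)
    meta_w, meta_b = fit_logistic(Z, y, l1=l1, nonneg=True)
    base_models = [fit_logistic(X, y, l2=l2) for X in views]  # refit on all data
    return base_models, meta_w, meta_b


# Usage: three synthetic views, only the first of which carries signal (illustrative only).
rng = np.random.default_rng(1)
n = 300
views = [rng.standard_normal((n, 5)) for _ in range(3)]
signal = views[0] @ np.array([1.5, -1.0, 1.0, 0.0, 0.5])
y = (signal + 0.5 * rng.standard_normal(n) > 0).astype(float)
_, meta_w, _ = stacked_penalized_logistic(views, y)
selected = np.flatnonzero(meta_w > 0)  # views with zero meta-weight are deselected
```

The nonnegativity constraint is what makes the meta-weights interpretable as view importances: a view can only contribute by raising the predicted probability in line with its own base learner.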
Uncovering the heterogeneity in the progression of Alzheimer's disease is key to understanding the disease and developing treatments, so that interventions can be tailored to the subgroups that will benefit most, an important goal of precision medicine. However, in practice, a major methodological challenge hindering the investigation of this heterogeneity is that the true subgroup membership of each individual is often unknown. In this article, we aim to identify latent subgroups of individuals who share a common disease progression over time, to predict latent subgroup memberships, and to estimate and make inference on the heterogeneous trajectories of the subgroups. To achieve these goals, we apply the concave fusion learning method proposed in Ma and Huang (2017) and Ma et al. (2019) to conduct subgroup analysis of longitudinal trajectories in the Alzheimer's disease data. The heterogeneous trajectories are represented by subject-specific unknown functions, which are approximated by B-splines. The concave fusion method simultaneously estimates the spline coefficients and merges them for subjects belonging to the same subgroup, thereby automatically identifying subgroups and recovering the heterogeneous trajectories. The resulting estimator of each subgroup's disease trajectory has an asymptotic distribution, which provides a sound theoretical basis for further statistical inference in subgroup analysis.
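In pairwise concave fusion, subgroups emerge because a concave penalty on the differences δ_ij = β_i - β_j between subject-specific coefficient vectors shrinks small differences exactly to zero while leaving large ones unshrunk. Assuming the minimax concave penalty (MCP) with parameters λ and γ, and an ADMM solver with augmented-Lagrangian parameter θ (our notation), the δ-update has a closed-form thresholding rule, sketched below in numpy. This is one building block only, not the full estimation procedure of Ma and Huang (2017); the function names are hypothetical.

```python
import numpy as np


def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty p_gamma(t, lam) for t >= 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= gamma * lam, lam * t - t ** 2 / (2.0 * gamma), 0.5 * gamma * lam ** 2)


def group_soft_threshold(z, tau):
    """S(z, tau) = (1 - tau / ||z||)_+ z."""
    norm = np.linalg.norm(z)
    if norm == 0.0:
        return np.zeros_like(z)
    return max(0.0, 1.0 - tau / norm) * z


def mcp_fusion_update(zeta, lam, gamma, theta):
    """Minimizer over delta of (theta / 2) ||zeta - delta||^2 + p_gamma(||delta||, lam).

    Assumes gamma * theta > 1, which makes the problem strictly convex.
    """
    zeta = np.asarray(zeta, dtype=float)
    if gamma * theta <= 1.0:
        raise ValueError("need gamma * theta > 1")
    if np.linalg.norm(zeta) <= gamma * lam:
        return group_soft_threshold(zeta, lam / theta) / (1.0 - 1.0 / (gamma * theta))
    return zeta  # large differences are left unshrunk (no bias)


# Usage on hypothetical differences of two subjects' spline coefficients.
lam, gamma, theta = 1.0, 3.0, 1.0
small = mcp_fusion_update(np.array([0.5, 0.2]), lam, gamma, theta)   # set to zero: fused
medium = mcp_fusion_update(np.array([2.0, 0.0]), lam, gamma, theta)  # thresholded, rescaled
large = mcp_fusion_update(np.array([4.0, 3.0]), lam, gamma, theta)   # norm 5 > gamma * lam
```

A zero δ_ij means subjects i and j end up with identical coefficient vectors, so subgroups can be read off as the connected components of the pairs with zero estimated difference.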
In Functional Data Analysis, data are commonly assumed to be smooth functions on a fixed interval of the real line. In this work, we introduce a comprehensive framework for the analysis of functional data whose domain is a two-dimensional manifold that itself varies from sample to sample. We formulate a statistical model for such data, which we call Functions on Surfaces, that enables a joint representation of the geometric and functional aspects, and we propose an associated estimation framework. We assess the validity of the framework through a simulation study and finally apply it to the analysis of neuroimaging data on cortical thickness, acquired from the brains of different subjects and thus lying on domains with different geometries.
Three major biomarkers, beta-amyloid (A), pathologic tau (T), and neurodegeneration (N), are recognized as valid proxies for the neuropathologic changes of Alzheimer's disease. While there are extensive studies of cerebrospinal fluid biomarkers (amyloid, tau), their spatial propagation patterns across the brain have not been characterized, and their interactive mechanisms with neurodegeneration remain unclear. To this end, we analyze the spatiotemporal associations among ATN biomarkers using large-scale neuroimaging data. First, we investigate the temporal appearance of amyloid plaques, tau tangles, and neuronal loss by modeling their longitudinal transition trajectories. Second, we propose linear mixed-effects models to quantify the pathological interactions and propagation of ATN biomarkers in each brain region. Our analysis of the current data shows a temporal latency between the build-up of amyloid and the onset of tau pathology and neurodegeneration. The propagation pattern of amyloid can be characterized by its diffusion along the topological brain network. Our models provide sufficient evidence that the progression of pathological tau and that of neurodegeneration share a strong regional association, which differs from that of amyloid.
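Diffusion along a brain network is commonly formalized with the graph Laplacian L = D - A of a regional connectivity matrix A: the regional burden x(t) evolves as dx/dt = -βLx, so x(t) = exp(-βLt) x(0). The numpy sketch below assumes an undirected (symmetric) connectivity matrix, a scalar diffusivity β, and a toy four-region network; it only illustrates what diffusion along a network means and is not the model fitted in this study.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of a symmetric, nonnegative connectivity matrix."""
    A = np.asarray(adjacency, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("adjacency must be symmetric (undirected network)")
    return np.diag(A.sum(axis=1)) - A


def network_diffusion(adjacency, x0, beta, t):
    """Solve dx/dt = -beta * L x with x(0) = x0, i.e. x(t) = expm(-beta * L * t) @ x0."""
    L = graph_laplacian(adjacency)
    evals, evecs = np.linalg.eigh(L)  # L is symmetric, so an orthonormal eigenbasis exists
    coeffs = evecs.T @ np.asarray(x0, dtype=float)
    return evecs @ (np.exp(-beta * t * evals) * coeffs)


# Toy 4-region chain network with all burden seeded in region 0 (illustrative only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x0 = np.array([1.0, 0.0, 0.0, 0.0])
early = network_diffusion(A, x0, beta=1.0, t=0.5)  # burden still concentrated near the seed
late = network_diffusion(A, x0, beta=1.0, t=50.0)  # burden spread evenly across regions
```

Because each row of L sums to zero, total burden is conserved and the spread is governed entirely by the network topology, which is what makes such a model a natural reference point for a topology-driven propagation pattern.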
Wearable data are a rich source of information that can provide a deeper understanding of the links between human behaviours and human health. Existing modelling approaches summarize wearable data at the subject level via scalar summaries (using regression techniques), temporal (time-of-day) curves (using functional data analysis, FDA), or distributions (using distributional data analysis, DDA). We propose to capture temporally local distributional information in wearable data using subject-specific time-by-distribution (TD) data objects. Specifically, we propose scalar-on-time-by-distribution regression (SOTDR) to model associations between a scalar response of interest, such as a health outcome or disease status, and TD predictors. We show that TD data objects can be parsimoniously represented via a collection of time-varying L-moments that capture distributional changes over the time of day. The proposed method is applied to an accelerometry study of mild Alzheimer's disease (AD). Mild AD is found to be significantly associated with a reduced maximal level of physical activity, particularly during morning hours. We also demonstrate that TD predictors attain much stronger associations with clinical cognitive scales of attention, verbal memory, and executive function than predictors summarized via scalar total activity counts, temporal functional curves, or quantile functions. Taken together, these results suggest that SOTDR analysis provides novel insights into cognitive function and AD.
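A time-by-distribution object can be summarized by L-moments computed separately within time-of-day windows. The numpy sketch below assumes minute-level activity arranged as a days-by-minutes array, hourly windows that pool minutes across days, and the standard unbiased sample L-moment estimator based on probability-weighted moments b_r = n⁻¹ Σᵢ [(i-1)⋯(i-r)] / [(n-1)⋯(n-r)] x_(i), with λ1 = b0, λ2 = 2b1 - b0, λ3 = 6b2 - 6b1 + b0, and λ4 = 20b3 - 30b2 + 12b1 - b0. The window length, the pooling across days, and the function names are our illustrative choices, not the specification used in the study.

```python
import numpy as np


def sample_l_moments(x):
    """First four unbiased sample L-moments via probability-weighted moments b_0..b_3."""
    x = np.sort(np.asarray(x, dtype=float).ravel())
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 observations")
    i = np.arange(1, n + 1, dtype=float)
    b = []
    for r in range(4):
        weights = np.ones(n)
        for k in range(1, r + 1):
            weights *= (i - k) / (n - k)  # (i-1)...(i-r) / ((n-1)...(n-r))
        b.append(np.mean(weights * x))
    b0, b1, b2, b3 = b
    return np.array([
        b0,                                # lambda_1: location
        2 * b1 - b0,                       # lambda_2: scale
        6 * b2 - 6 * b1 + b0,              # lambda_3: skewness-type
        20 * b3 - 30 * b2 + 12 * b1 - b0,  # lambda_4: kurtosis-type
    ])


def time_by_distribution_l_moments(activity, window=60):
    """L-moments of minute-level activity within consecutive time-of-day windows.

    activity: array of shape (n_days, n_minutes); minutes from all days are pooled
    within each window. Returns an array of shape (n_minutes // window, 4).
    """
    activity = np.asarray(activity, dtype=float)
    _, n_minutes = activity.shape
    if n_minutes % window != 0:
        raise ValueError("window must divide the number of minutes per day")
    n_windows = n_minutes // window
    out = np.empty((n_windows, 4))
    for h in range(n_windows):
        out[h] = sample_l_moments(activity[:, h * window:(h + 1) * window])
    return out


# Usage with 7 synthetic days of minute-level counts (illustrative only).
rng = np.random.default_rng(0)
days = rng.poisson(lam=20.0, size=(7, 1440))
td = time_by_distribution_l_moments(days)  # shape (24, 4): one row per hour of the day
l_scale_by_hour = td[:, 1]
```

Each column of the result is a time-varying L-moment curve, so a subject's TD object reduces to a handful of functions of the time of day that can enter a functional regression model.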