Recovering the Underlying Trajectory from Sparse and Irregular Longitudinal Data

Added by Jiguo Cao

Publication date 2018

fields Mathematical Statistics

Language: English

Authors Yunlong Nie, Yuping Yang, Jiguo Cao

Methodology

In this article, we consider the problem of recovering the underlying trajectory when the longitudinal data are sparsely and irregularly observed and noise-contaminated. Such data are popularly analyzed with functional principal component analysis via the Principal Analysis by Conditional Estimation (PACE) method. The PACE method may sometimes be numerically unstable because it involves the inverse of the covariance matrix. We propose a sparse orthonormal approximation (SOAP) method as an alternative. It estimates the optimal empirical basis functions in the best approximation framework rather than eigen-decomposing the covariance function. The SOAP method avoids estimating the mean and covariance function, which is challenging when the assembled time points with observations for all subjects are not sufficiently dense. The SOAP method avoids the inverse of the covariance matrix, hence the computation is more stable. It does not require the functional principal component scores to follow the Gaussian distribution. We show that the SOAP estimate for the optimal empirical basis function is asymptotically consistent. The finite sample performance of the SOAP method is investigated in simulation studies in comparison with the PACE method. Our method is demonstrated by recovering the CD4 percentage curves from sparse and irregular data in the Multi-center AIDS Cohort Study.
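
To make the stability concern concrete, here is a minimal sketch; it is not the authors' code. In PACE, the conditional-expectation scores involve the inverse of each subject's observation covariance matrix, that is, the covariance function evaluated at the subject's time points plus the measurement-error variance on the diagonal. The squared-exponential covariance, the observation times, and the noise variances below are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical observation times for one subject: sparse, irregular, some nearly coincident.
t = np.array([0.10, 0.12, 0.15, 0.40, 0.42, 0.45, 0.80, 0.82])

def smooth_cov(s, u, length_scale=0.5):
    """Squared-exponential covariance: an assumed stand-in for an estimated covariance surface."""
    diff = s[:, None] - u[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

K = smooth_cov(t, t)

# Condition number of K + noise_var * I for a decreasing measurement-error variance.
noise_vars = [1e-1, 1e-4, 1e-8]
conds = [np.linalg.cond(K + v * np.eye(len(t))) for v in noise_vars]

for v, c in zip(noise_vars, conds):
    print(f"noise variance {v:.0e}: condition number {c:.3e}")
```

The condition number grows as the noise variance shrinks, so small errors in an estimated covariance can be strongly amplified when its inverse is applied.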

Recovering individual-level spatial inference from aggregated binary data

Nelson B. Walker, Trevor J. Hefley, Anne E. Ballmann, 2020

Binary regression models are commonly used in disciplines such as epidemiology and ecology to determine how spatial covariates influence individuals. In many studies, binary data are shared in a spatially aggregated form to protect privacy. For example, rather than reporting the location and result for each individual that was tested for a disease, researchers may report that a disease was detected or not detected within geopolitical units. Often, the spatial aggregation process obscures the values of response variables, spatial covariates, and locations of each individual, which makes recovering individual-level inference difficult. We show that applying a series of transformations, including a change of support, to a bivariate point process model allows researchers to recover individual-level inference for spatial covariates from spatially aggregated binary data. The series of transformations preserves the convenient interpretation of desirable binary regression models that are commonly applied to individual-level data. Using a simulation experiment, we compare the performance of our proposed method under varying types of spatial aggregation against the performance of standard approaches using the original individual-level data. We illustrate our method by modeling individual-level probability of infection using a data set that has been aggregated to protect an at-risk and endangered species of bats. Our simulation experiment and data illustration demonstrate the utility of the proposed method when access to original non-aggregated data is impractical or prohibited.

Methodology

Identification of Underlying Dynamic System from Noisy Data with Splines

Yujie Zhao, Xiaoming Huo, Yajun Mei, 2021

In this paper, we propose a two-stage method called Spline Assisted Partial Differential Equation involved Model Identification (SAPDEMI) to efficiently identify the underlying partial differential equation (PDE) models from noisy data. In the first stage, the functional estimation stage, we employ cubic splines to estimate the unobservable derivatives, which serve as candidate terms for the underlying PDE models. The contribution of this stage is computational efficiency: its computational complexity is linear in the sample size, which is the lowest possible order of complexity. In the second stage, the model identification stage, we apply the Least Absolute Shrinkage and Selection Operator (Lasso) to identify the underlying PDE models. The contribution of this stage is that we focus on model selection, while the existing literature mostly focuses on parameter estimation. Moreover, we develop statistical properties of our method for correct identification, where the main tool we use is the primal-dual witness (PDW) method. Finally, we validate our theory through various numerical examples.
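
The two stages can be sketched as follows. This is an illustration under stated assumptions, not the SAPDEMI implementation: central finite differences (`np.gradient`) are swapped in for the cubic-spline derivative estimates, the data are a noise-free synthetic solution of the heat equation, and the helper `lasso_coordinate_descent`, the candidate set, and the penalty level `lam=0.01` are hypothetical choices.

```python
import numpy as np

# Synthetic, noise-free solution of the heat equation u_t = u_xx (assumed test case).
x = np.linspace(0.0, 2.0 * np.pi, 101)
t = np.linspace(0.0, 1.0, 101)
X, T = np.meshgrid(x, t, indexing="ij")
U = np.exp(-T) * np.sin(X) + np.exp(-4.0 * T) * np.sin(2.0 * X)

# Stage 1 (stand-in): finite-difference derivative estimates instead of cubic splines.
u_t = np.gradient(U, t, axis=1)
u_x = np.gradient(U, x, axis=0)
u_xx = np.gradient(u_x, x, axis=0)

# Drop points near the boundary, where one-sided differences are less accurate.
inner = (slice(2, -2), slice(2, -2))
y = u_t[inner].ravel()
names = ["u", "u_x", "u_xx"]  # hypothetical candidate terms
features = np.column_stack(
    [U[inner].ravel(), u_x[inner].ravel(), u_xx[inner].ravel()]
)

def lasso_coordinate_descent(A, b, lam, n_sweeps=500):
    """Minimize (1/(2n)) * ||b - A w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = A.shape
    w = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual with feature j removed from the current fit.
            r_j = b - A @ w + A[:, j] * w[j]
            rho = A[:, j] @ r_j / n
            # Soft-thresholding update for coordinate j.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

# Stage 2: Lasso regression of u_t on the candidate terms.
coef = lasso_coordinate_descent(features, y, lam=0.01)
for name, c in zip(names, coef):
    print(f"{name}: {c:.3f}")
```

In this noise-free setting the penalty should shrink the coefficients on `u` and `u_x` to zero and leave a coefficient close to 1 on `u_xx`. With noisy data, finite differences amplify the noise, which is where a smoothing step such as the cubic splines described above becomes important.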

Methodology

Modeling sparse connectivity between underlying brain sources for EEG/MEG

Stefan Haufe, Ryota Tomioka, Guido Nolte, 2009

We propose a novel technique to assess functional brain connectivity in EEG/MEG signals. Our method, called Sparsely-Connected Sources Analysis (SCSA), can overcome the problem of volume conduction by modeling neural data innovatively with the following ingredients: (a) the EEG is assumed to be a linear mixture of correlated sources following a multivariate autoregressive (MVAR) model, (b) the demixing is estimated jointly with the source MVAR parameters, (c) overfitting is avoided by using the Group Lasso penalty. This approach allows us to extract the appropriate level of cross-talk between the extracted sources, and in this manner we obtain a sparse data-driven model of functional connectivity. We demonstrate the usefulness of SCSA with simulated data, and compare it to a number of existing algorithms with excellent results.

Methodology, Applications, Machine Learning

Simulating longitudinal data from marginal structural models using the additive hazard model

Ruth H. Keogh, Shaun R. Seaman, Jon Michael Gran, 2020

Observational longitudinal data on treatments and covariates are increasingly used to investigate treatment effects, but are often subject to time-dependent confounding. Marginal structural models (MSMs), estimated using inverse probability of treatment weighting or the g-formula, are popular for handling this problem. With increasing development of advanced causal inference methods, it is important to be able to assess their performance in different scenarios to guide their application. Simulation studies are a key tool for this, but their use to evaluate causal inference methods has been limited. This paper focuses on the use of simulations for evaluations involving MSMs in studies with a time-to-event outcome. In a simulation, it is important to be able to generate the data in such a way that the correct form of any models to be fitted to those data is known. However, this is not straightforward in the longitudinal setting because it is natural for data to be generated in a sequential conditional manner, whereas MSMs involve fitting marginal rather than conditional hazard models. We provide general results that enable the form of the correctly-specified MSM to be derived based on a conditional data generating procedure, and show how the results can be applied when the conditional hazard model is an Aalen additive hazard or Cox model. Using conditional additive hazard models is advantageous because they imply additive MSMs that can be fitted using standard software. We describe and illustrate a simulation algorithm. Our results will help researchers to effectively evaluate causal inference methods via simulation.

Methodology

Sparse group variable selection for gene-environment interactions in the longitudinal study

Fei Zhou, Xi Lu, Jie Ren, 2021

Penalized variable selection for high-dimensional longitudinal data has received much attention because it accounts for the correlation among repeated measurements and provides additional and essential information for improved identification and prediction performance. Despite this success, the potential of penalization methods for accommodating structured sparsity in longitudinal studies is far from fully understood. In this article, we develop a sparse group penalization method to conduct the bi-level gene-environment (G$\times$E) interaction study under the repeatedly measured phenotype. Within the quadratic inference function (QIF) framework, the proposed method can achieve simultaneous identification of main and interaction effects on both the group and individual level. Simulation studies have shown that the proposed method outperforms major competitors. In the case study of asthma data from the Childhood Asthma Management Program (CAMP), we conduct a G$\times$E study using high-dimensional SNP data as the genetic factor and the longitudinal trait, forced expiratory volume in one second (FEV1), as the phenotype. Our method leads to improved prediction and identification of main and interaction effects with important implications.

Methodology