
Joint integrative analysis of multiple data sources with correlated vector outcomes

Published by Emily C Hector
Publication date: 2020
Research field: Mathematical statistics
Language: English





We propose a distributed quadratic inference function framework to jointly estimate regression parameters from multiple potentially heterogeneous data sources with correlated vector outcomes. The primary goal of this joint integrative analysis is to estimate covariate effects on all outcomes through a marginal regression model in a statistically and computationally efficient way. We develop a data integration procedure for statistical estimation and inference of regression parameters that is implemented in a fully distributed and parallelized computational scheme. To overcome computational and modeling challenges arising from the high-dimensional likelihood of the correlated vector outcomes, we propose to analyze each data source using the quadratic inference functions of Qu, Lindsay and Li (2000), and then to jointly reestimate parameters from each data source by accounting for correlation between data sources using a combined meta-estimator, in a similar spirit to the generalised method of moments of Hansen (1982). We show both theoretically and numerically that the proposed method yields efficiency improvements and is computationally fast. We illustrate the proposed methodology with the joint integrative analysis of the association between smoking and metabolites in a large multi-cohort study and provide an R package for ease of implementation.
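The combination step described in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator or its R package: it is a generic minimum-distance (generalized least squares) combination, assuming each data source has already produced an estimate of the same p-dimensional parameter and that a joint covariance matrix `V` of the stacked estimates, including cross-source correlation, is available. The function name `combine_estimates` and its arguments are hypothetical.

```python
import numpy as np

def combine_estimates(theta_hats, V):
    """Combine K source-specific estimates of a shared p-vector.

    Hypothetical helper, not the paper's method: it minimizes
    (t - A theta)' V^{-1} (t - A theta), where t stacks the K estimates
    and A stacks K identity matrices.

    theta_hats : array of shape (K, p), one row per data source.
    V          : symmetric positive definite array of shape (K*p, K*p),
                 the assumed joint covariance of the stacked estimates.
    Returns the combined estimate and its covariance under that assumption.
    """
    theta_hats = np.asarray(theta_hats, dtype=float)
    K, p = theta_hats.shape
    A = np.kron(np.ones((K, 1)), np.eye(p))   # shape (K*p, p)
    stacked = theta_hats.reshape(K * p)       # [theta_1, ..., theta_K]
    Vinv_A = np.linalg.solve(V, A)            # V^{-1} A
    precision = A.T @ Vinv_A                  # A' V^{-1} A
    theta = np.linalg.solve(precision, Vinv_A.T @ stacked)
    cov = np.linalg.inv(precision)
    return theta, cov

# Illustrative inputs only: two sources, two parameters,
# with the second source three times noisier than the first.
estimates = [[1.0, 2.0], [3.0, 4.0]]
V_example = np.diag([1.0, 1.0, 3.0, 3.0])
theta_c, cov_c = combine_estimates(estimates, V_example)
```

When `V` is block diagonal, as in the illustrative call, this reduces to inverse-variance weighting; off-diagonal blocks are where correlation between data sources would enter.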




Read also

Multi-task learning is increasingly used to investigate the association structure between multiple responses and a single set of predictor variables in many applications. In the era of big data, the coexistence of incomplete outcomes, a large number of responses, and high dimensionality in predictors poses unprecedented challenges in estimation, prediction, and computation. In this paper, we propose a scalable and computationally efficient procedure, called PEER, for large-scale multi-response regression with incomplete outcomes, where both the numbers of responses and predictors can be high-dimensional. Motivated by sparse factor regression, we convert the multi-response regression into a set of univariate-response regressions, which can be efficiently implemented in parallel. Under some mild regularity conditions, we show that PEER enjoys nice sampling properties including consistency in estimation, prediction, and variable selection. Extensive simulation studies show that our proposal compares favorably with several existing methods in estimation accuracy, variable selection, and computational efficiency.
In electronic health records (EHRs), latent subgroups of patients may exhibit distinctive patterning in their longitudinal health trajectories. For such data, growth mixture models (GMMs) enable classifying patients into different latent classes based on individual trajectories and hypothesized risk factors. However, the application of GMMs is hindered by the special missing data problem in EHRs, which manifests two patient-led missing data processes: the visit process and the response process for an EHR variable conditional on a patient visiting the clinic. If either process is associated with the process generating the longitudinal outcomes, then valid inferences require accounting for a nonignorable missing data mechanism. We propose a Bayesian shared parameter model that links GMMs of multiple longitudinal health outcomes, the visit process, and the response process of each outcome given a visit using a discrete latent class variable. Our focus is on multiple longitudinal health outcomes for which there can be a clinically prescribed visit schedule. We demonstrate our model in EHR measurements on early childhood weight and height z-scores. Using data simulations, we illustrate the statistical properties of our method with respect to subgroup-specific or marginal inferences. We built the R package EHRMiss for model fitting, selection, and checking.
Evidence from animal models and epidemiological studies has linked prenatal alcohol exposure (PAE) to a broad range of long-term cognitive and behavioral deficits. However, there is virtually no information in the scientific literature regarding the levels of PAE associated with an increased risk of clinically significant adverse effects. During the period from 1975-1993, several prospective longitudinal cohort studies were conducted in the U.S., in which maternal reports regarding alcohol use were obtained during pregnancy and the cognitive development of the offspring was assessed from early childhood through early adulthood. The sample sizes in these cohorts did not provide sufficient power to examine effects associated with different levels and patterns of PAE. To address this critical public health issue, we have developed a hierarchical meta-analysis to synthesize information regarding the effects of PAE on cognition, integrating data on multiple endpoints from six U.S. longitudinal cohort studies. Our approach involves estimating the dose-response coefficients for each endpoint and then pooling these correlated dose-response coefficients to obtain an estimated 'global' effect of exposure on cognition. In the first stage, we use individual participant data to derive estimates of the effects of PAE by fitting regression models that adjust for potential confounding variables using propensity scores. The correlation matrix characterizing the dependence between the endpoint-specific dose-response coefficients estimated within each cohort is then run, while accommodating incomplete information on some endpoints. We also compare and discuss inferences based on the proposed approach to inferences based on a full multivariate analysis.
In Genome-Wide Association Studies (GWAS) where multiple correlated traits have been measured on participants, a joint analysis strategy, whereby the traits are analyzed jointly, can improve statistical power over a single-trait analysis strategy. There are two questions of interest to be addressed when conducting a joint GWAS analysis with multiple traits. The first question examines whether a genetic locus is significantly associated with any of the traits being tested. The second question focuses on identifying the specific trait(s) associated with the genetic locus. Since existing methods primarily focus on the first question, this paper seeks to provide a complementary method that addresses the second question. We propose a novel method, Variational Inference for Multiple Correlated Outcomes (VIMCO), that focuses on identifying the specific trait that is associated with the genetic locus, when performing a joint GWAS analysis of multiple traits, while accounting for correlation among the multiple traits. We performed extensive numerical studies and also applied VIMCO to analyze two datasets. The numerical studies and real data analysis demonstrate that VIMCO improves statistical power over single-trait analysis strategies when the multiple traits are correlated and has comparable performance when the traits are not correlated.
Poverty is a multidimensional concept often comprising a monetary outcome and other welfare dimensions such as education, subjective well-being or health, that are measured on an ordinal scale. In applied research, multidimensional poverty is ubiquitously assessed by studying each poverty dimension independently in univariate regression models or by combining several poverty dimensions into a scalar index. This inhibits a thorough analysis of the potentially varying interdependence between the poverty dimensions. We propose a multivariate copula generalized additive model for location, scale and shape (copula GAMLSS or distributional copula model) to tackle this challenge. By relating the copula parameter to covariates, we specifically examine if certain factors determine the dependence between poverty dimensions. Furthermore, specifying the full conditional bivariate distribution allows us to derive several features such as poverty risks and dependence measures coherently from one model for different individuals. We demonstrate the approach by studying two important poverty dimensions: income and education. Since the level of education is measured on an ordinal scale while income is continuous, we extend the bivariate copula GAMLSS to the case of mixed ordered-continuous outcomes. The new model is integrated into the GJRM package in R and applied to data from Indonesia. Particular emphasis is given to the spatial variation of the income-education dependence and groups of individuals at risk of being simultaneously poor in both education and income dimensions.