Evidence from animal models and epidemiological studies has linked prenatal alcohol exposure (PAE) to a broad range of long-term cognitive and behavioral deficits. However, there is virtually no information in the scientific literature regarding the levels of PAE associated with an increased risk of clinically significant adverse effects. From 1975 to 1993, several prospective longitudinal cohort studies were conducted in the U.S., in which maternal reports regarding alcohol use were obtained during pregnancy and the cognitive development of the offspring was assessed from early childhood through early adulthood. The sample sizes in these cohorts did not provide sufficient power to examine effects associated with different levels and patterns of PAE. To address this critical public health issue, we have developed a hierarchical meta-analysis to synthesize information regarding the effects of PAE on cognition, integrating data on multiple endpoints from six U.S. longitudinal cohort studies. Our approach involves estimating the dose-response coefficients for each endpoint and then pooling these correlated dose-response coefficients to obtain an estimated "global" effect of exposure on cognition. In the first stage, we use individual participant data to derive estimates of the effects of PAE by fitting regression models that adjust for potential confounding variables using propensity scores. The correlation matrix characterizing the dependence between the endpoint-specific dose-response coefficients within each cohort is then estimated, while accommodating incomplete information on some endpoints. We also compare inferences based on the proposed approach with inferences based on a full multivariate analysis.
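The pooling of correlated endpoint-specific coefficients described above can be illustrated with a minimal sketch. It assumes a fixed-effect generalized least squares (GLS) pooling with a known correlation matrix, which is a simplification and not the hierarchical model of the abstract. The function name `pool_correlated` and all numbers are hypothetical.

```python
import numpy as np

def pool_correlated(coefs, ses, corr):
    """GLS pooling of correlated estimates of a common effect (illustrative only)."""
    coefs = np.asarray(coefs, dtype=float)
    ses = np.asarray(ses, dtype=float)
    corr = np.asarray(corr, dtype=float)
    # Covariance of the estimates: S = D R D, with D = diag(standard errors).
    cov = np.outer(ses, ses) * corr
    ones = np.ones_like(coefs)
    # Solve S x = 1 rather than inverting S explicitly.
    s_inv_ones = np.linalg.solve(cov, ones)
    weights = s_inv_ones / (ones @ s_inv_ones)   # GLS weights, sum to 1
    pooled = weights @ coefs                     # pooled "global" effect
    pooled_se = np.sqrt(1.0 / (ones @ s_inv_ones))
    return pooled, pooled_se, weights

# Hypothetical dose-response coefficients for three endpoints in one cohort.
coefs = [-0.10, -0.14, -0.06]
ses = [0.04, 0.05, 0.03]
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
pooled, pooled_se, weights = pool_correlated(coefs, ses, corr)
print(f"pooled effect = {pooled:.3f} (SE {pooled_se:.3f})")
```

With an identity correlation matrix the weights reduce to ordinary inverse-variance weights, so the correlation matrix is the only thing that distinguishes this sketch from a standard univariate meta-analysis.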
We propose a distributed quadratic inference function framework to jointly estimate regression parameters from multiple potentially heterogeneous data sources with correlated vector outcomes. The primary goal of this joint integrative analysis is to estimate covariate effects on all outcomes through a marginal regression model in a statistically and computationally efficient way. We develop a data integration procedure for statistical estimation and inference of regression parameters that is implemented in a fully distributed and parallelized computational scheme. To overcome computational and modeling challenges arising from the high-dimensional likelihood of the correlated vector outcomes, we propose to analyze each data source using the quadratic inference functions of Qu, Lindsay and Li (2000), and then to jointly reestimate parameters from each data source by accounting for correlation between data sources using a combined meta-estimator in a similar spirit to the generalised method of moments of Hansen (1982). We show both theoretically and numerically that the proposed method yields efficiency improvements and is computationally fast. We illustrate the proposed methodology with the joint integrative analysis of the association between smoking and metabolites in a large multi-cohort study and provide an R package for ease of implementation.
Hierarchical inference in (generalized) regression problems is powerful for finding significant groups or even single covariates, especially in high-dimensional settings where identifiability of the entire regression parameter vector may be ill-posed. The general method proceeds in a fully data-driven and adaptive way from large to small groups or singletons of covariates, depending on the signal strength and the correlation structure of the design matrix. We propose a novel hierarchical multiple testing adjustment that can be used in combination with any significance test for a group of covariates to perform hierarchical inference. Our adjustment passes on the significance level of certain hypotheses that could not be rejected and is shown to guarantee strong control of the familywise error rate. Our method is at least as powerful as a so-called depth-wise hierarchical Bonferroni adjustment. It provides a substantial gain in power over other previously proposed inheritance hierarchical procedures if the underlying alternative hypotheses occur sparsely along a few branches in the tree-structured hierarchy.
In genome-wide association studies (GWAS) where multiple correlated traits have been measured on participants, analyzing the traits jointly can improve statistical power over a single-trait analysis strategy. Two questions are of interest when conducting a joint GWAS analysis with multiple traits. The first is whether a genetic locus is significantly associated with any of the traits being tested. The second is which specific trait or traits are associated with the genetic locus. Since existing methods primarily focus on the first question, this paper seeks to provide a complementary method that addresses the second question. We propose a novel method, Variational Inference for Multiple Correlated Outcomes (VIMCO), that identifies the specific trait associated with the genetic locus in a joint GWAS analysis while accounting for correlation among the multiple traits. We performed extensive numerical studies and also applied VIMCO to analyze two datasets. The numerical studies and real data analysis demonstrate that VIMCO improves statistical power over single-trait analysis strategies when the multiple traits are correlated and has comparable performance when the traits are not correlated.
While it is well known that high levels of prenatal alcohol exposure (PAE) result in significant cognitive deficits in children, the exact nature of the dose response is less well understood. In particular, there is a pressing need to identify the levels of PAE associated with an increased risk of clinically significant adverse effects. To address this issue, data have been combined from six longitudinal birth cohort studies in the United States that assessed the effects of PAE on cognitive outcomes measured from early school age through adolescence. Structural equation models (SEMs) are commonly used to capture the association among multiple observed outcomes in order to characterise the underlying variable of interest (in this case, cognition) and then relate it to PAE. However, it was not possible to apply classic SEM software in our context because different outcomes were measured in the six studies. In this paper we show how a Bayesian approach can be used to fit a multi-group multi-level structural model that maps cognition to a broad range of observed variables measured at multiple ages. These variables map to several different cognitive subdomains and are examined in relation to PAE after adjusting for confounding using propensity scores. The model also tests the possibility of a change point in the dose-response function.
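A change point in a dose-response function can be pictured as a threshold below which exposure has no effect and above which the effect grows linearly. The sketch below is a least-squares grid search on synthetic, noise-free data, not the Bayesian structural model described in the abstract. The names `hinge` and `fit_change_point`, the threshold of 1.0, and the slope of -2.0 are all hypothetical.

```python
import numpy as np

def hinge(dose, tau):
    """Zero below the change point tau, linear in (dose - tau) above it."""
    return np.maximum(dose - tau, 0.0)

def fit_change_point(dose, outcome, tau_grid):
    """For each candidate tau, fit intercept + slope * hinge by least squares
    and keep the candidate with the smallest sum of squared errors."""
    best = None
    for tau in tau_grid:
        design = np.column_stack([np.ones_like(dose), hinge(dose, tau)])
        coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
        sse = float(np.sum((outcome - design @ coef) ** 2))
        if best is None or sse < best["sse"]:
            best = {"tau": float(tau), "intercept": float(coef[0]),
                    "slope": float(coef[1]), "sse": sse}
    return best

# Synthetic data: flat at 100 up to dose 1.0, then declining by 2 per unit dose.
dose = np.linspace(0.0, 3.0, 61)
outcome = 100.0 - 2.0 * hinge(dose, 1.0)
fit = fit_change_point(dose, outcome, tau_grid=np.linspace(0.0, 3.0, 31))
print(fit["tau"], fit["slope"])
```

A Bayesian version would instead place a prior on the change point and report its posterior, which also allows a comparison against a model with no change point.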
Clinical prediction models (CPMs) are used to predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, with rising emphasis on the prediction of multi-morbidity, there is growing need for CPMs to simultaneously predict risks for each of multiple future outcomes. A common approach to multi-outcome risk prediction is to derive a CPM for each outcome separately, then multiply the predicted risks. This approach is only valid if the outcomes are conditionally independent given the covariates, and it fails to exploit the potential relationships between the outcomes. This paper outlines several approaches that could be used to develop prognostic CPMs for multiple outcomes. We consider four methods that vary in complexity and in the conditional independence they assume: probabilistic classifier chains, multinomial logistic regression, multivariate logistic regression, and a Bayesian probit model. These are compared with methods that rely on conditional independence: separate univariate CPMs and stacked regression. Using a simulation study and a real-world example from the MIMIC-III database, we illustrate that CPMs for joint risk prediction of multiple outcomes should only be derived using methods that model the residual correlation between outcomes. In such a situation, our results suggest that probabilistic classifier chains, multinomial logistic regression, or the Bayesian probit model are all appropriate choices. We call into question the development of CPMs for each outcome in isolation when multiple correlated or structurally related outcomes are of interest and recommend more holistic risk prediction.
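The point about multiplying separately predicted risks can be checked with a small worked example. The sketch assumes a latent-variable (probit-style) setup that is not taken from the paper: two binary outcomes each occur when a standard normal latent variable exceeds zero, and the latent variables have residual correlation `rho`. The function name `joint_risk_probit` is hypothetical; the formula is the standard orthant probability of a bivariate normal.

```python
import math

def joint_risk_probit(rho):
    """P(Z1 > 0 and Z2 > 0) for standard bivariate normal (Z1, Z2) with correlation rho."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)

marginal_risk = 0.5  # each outcome occurs when its latent variable exceeds 0
for rho in (0.0, 0.3, 0.6, 0.9):
    product = marginal_risk * marginal_risk   # assumes conditional independence
    joint = joint_risk_probit(rho)            # accounts for residual correlation
    print(f"rho={rho:.1f}  product of marginals={product:.3f}  joint risk={joint:.3f}")
```

At `rho = 0` the product of marginal risks equals the joint risk. For any positive residual correlation the joint risk is larger, so multiplying risks from separate models understates it.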