
Prediction of Alzheimer's disease-associated genes by integration of GWAS summary data and expression data

 Added by Rui Wang
 Publication date 2018
Research language: English

Alzheimer's disease (AD) is the most common cause of dementia and the fifth-leading cause of death among elderly people. Because AD has high genetic heritability (79%), finding disease-causal genes is a crucial step toward finding treatments. Following the International Genomics of Alzheimer's Project (IGAP), many disease-associated genes have been identified; however, we do not yet know enough about how those genes affect gene expression and disease-related pathways. We integrated GWAS summary data from IGAP with five different expression-level datasets using the transcriptome-wide association study (TWAS) method. Under strict multiple-testing correction (α < 0.05), we identified 15 disease-causal genes, 4 of which are newly identified; under false discovery rate (FDR) control (α < 0.05), we identified an additional 29 potential disease-causal genes, 21 of which are newly identified. Many of the genes we identified are also associated with autoimmune disorders.
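
The abstract does not spell out the TWAS statistic or the exact correction procedures. As a minimal sketch, assuming the widely used FUSION-style weighted z-score, and assuming that "strict multiple testing" means Bonferroni correction while the FDR step uses Benjamini-Hochberg, the two tiers of results could be computed as below. The function names and toy inputs are hypothetical: `weights` stands for a gene's expression-prediction weights on its SNPs, `gwas_z` for the matching GWAS summary z-scores, and `ld` for the SNP correlation (LD) matrix.

```python
import math
import numpy as np


def twas_z(weights, gwas_z, ld):
    """Weighted TWAS z-score (assumed FUSION-style): w'z / sqrt(w' LD w)."""
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    return float(w @ z / math.sqrt(w @ ld @ w))


def two_sided_p(z):
    """Two-sided standard-normal p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni(pvals, alpha=0.05):
    """Strict family-wise control: reject when p <= alpha / m."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size


def benjamini_hochberg(pvals, alpha=0.05):
    """FDR control: reject the k smallest p-values, k = max{i : p_(i) <= i * alpha / m}."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()
        reject[order[: k + 1]] = True
    return reject


# Toy example: one gene whose weights span two SNPs in moderate LD.
z_gene = twas_z([1.0, 1.0], [1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
print(round(z_gene, 4), two_sided_p(z_gene))

pvals = [0.001, 0.02, 0.03, 0.9]  # hypothetical per-gene TWAS p-values
print(bonferroni(pvals))          # strict tier
print(benjamini_hochberg(pvals))  # FDR tier
```

Because the Benjamini-Hochberg threshold grows with a p-value's rank, FDR control typically admits more genes than Bonferroni; in this toy input Bonferroni keeps one gene and FDR keeps three.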




When dealing with large-scale gene expression studies, observations are commonly contaminated by unwanted variation factors such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised (e.g., when the goal is to cluster the samples or to build a corrected version of the dataset, as opposed to studying an observed factor of interest), taking unwanted variation into account can become difficult. The unwanted variation factors may be correlated with the unobserved factor of interest, so correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data or to build estimators for unsupervised problems. The proposed methods are then evaluated on three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections.
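
To illustrate how negative control genes can be used, here is a minimal sketch of the simplest such correction, a naive RUV-2-style projection: estimate k unwanted factors from the control-gene columns by SVD, then regress them out of every gene. This is not the estimator proposed in the abstract; it is the baseline whose pitfall the abstract describes, since any signal correlated with the estimated factors is removed as well. The function `ruv_naive`, its arguments, and the synthetic data are hypothetical.

```python
import numpy as np


def ruv_naive(Y, control_idx, k):
    """Project out k unwanted factors estimated from negative control genes.

    Y: samples x genes expression matrix (usually column-centred beforehand).
    control_idx: columns of genes assumed unaffected by the factor of interest.
    """
    Y = np.asarray(Y, dtype=float)
    U, s, _ = np.linalg.svd(Y[:, control_idx], full_matrices=False)
    W = U[:, :k] * s[:k]                           # n x k estimated unwanted factors
    alpha, *_ = np.linalg.lstsq(W, Y, rcond=None)  # k x genes loadings on W
    return Y - W @ alpha                           # residuals after removing W


# Synthetic check: one batch-like factor, one biological factor, 10 control genes.
rng = np.random.default_rng(0)
n, p = 20, 50
W_true = rng.normal(size=(n, 1))
X = rng.normal(size=(n, 1))
beta = rng.normal(size=(1, p))
controls = np.arange(10)
beta[:, controls] = 0.0                            # controls carry no biology
Y = W_true @ rng.normal(size=(1, p)) + X @ beta

Y_hat = ruv_naive(Y, controls, k=1)
print(np.abs(Y_hat[:, controls]).max())            # near zero: batch factor removed
```

Because the correction is an orthogonal projection, the part of the biological factor `X` that happens to lie along the estimated `W` is discarded together with the batch effect, which is exactly the risk the abstract raises for unsupervised analyses.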
Recent evidence has shown that structural magnetic resonance imaging (MRI) is an effective tool for Alzheimer's disease (AD) prediction and diagnosis. While traditional MRI-based diagnosis uses images acquired at a single time point, a longitudinal study is more sensitive and accurate in detecting early pathological changes of AD. Two main difficulties arise in longitudinal MRI-based diagnosis: (1) inconsistent longitudinal scans among subjects (i.e., different scanning times and different total numbers of scans); and (2) the heterogeneous progression of high-dimensional regions of interest (ROIs) in MRI. In this work, we propose a novel feature selection and estimation method that can be applied to extract features from heterogeneous longitudinal MRI. A key ingredient of our method is the combination of smoothing splines and the $l_1$-penalty. We perform experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The results corroborate the advantages of the proposed method for AD prediction in longitudinal studies.
The TADPOLE Challenge compares the performance of algorithms at predicting the future evolution of individuals at risk of Alzheimer's disease. TADPOLE Challenge participants train their models and algorithms on historical data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. Participants are then required to make forecasts of three key outcomes for ADNI-3 rollover participants: clinical diagnosis, ADAS-Cog 13, and total volume of the ventricles, which are then compared with future measurements. Strong points of the challenge are that the test data did not exist at the time of forecasting (it was acquired afterwards), and that it focuses on the challenging problem of cohort selection for clinical trials by identifying fast progressors. The submission phase of TADPOLE was open until 15 November 2017; since then, data have been acquired until April 2019 from 219 subjects with 223 clinical visits and 150 magnetic resonance imaging (MRI) scans, and these data were used to evaluate the participants' predictions. Thirty-three teams participated with a total of 92 submissions. No single submission was best at predicting all three outcomes. For diagnosis prediction, the best forecast (team Frog), which was based on gradient boosting, obtained a multiclass area under the receiver operating characteristic curve (MAUC) of 0.931, while for ventricle prediction the best forecast (team EMC1), which was based on disease progression modelling and spline regression, obtained a mean absolute error of 0.41% of total intracranial volume (ICV). For ADAS-Cog 13, no forecast was considerably better than the benchmark mixed effects model (BenchmarkME), which was provided to participants before the submission deadline. Further analysis can help understand which input features and algorithms are most suitable for Alzheimer's disease prediction and for aiding patient stratification in clinical trials.
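
For reference, here is a minimal sketch of a multiclass AUC, assuming the Hand & Till (2001) definition: for every pair of classes, average the two one-vs-one AUCs computed from each class's predicted probability, then average over all pairs. The function names and toy data are illustrative; for TADPOLE the three classes would be the clinical diagnoses (CN, MCI, AD).

```python
import itertools
import numpy as np


def pairwise_auc(pos_scores, neg_scores):
    """P(random positive scores higher than random negative); ties count 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    wins = np.sum(pos > neg) + 0.5 * np.sum(pos == neg)
    return wins / (pos.size * neg.size)


def mauc(y_true, probs):
    """Hand & Till multiclass AUC; probs[n, c] is the predicted P(class c)."""
    y = np.asarray(y_true)
    P = np.asarray(probs, dtype=float)
    c = P.shape[1]
    total = 0.0
    for i, j in itertools.combinations(range(c), 2):
        a_ij = pairwise_auc(P[y == i, i], P[y == j, i])  # class-i score: i vs j
        a_ji = pairwise_auc(P[y == j, j], P[y == i, j])  # class-j score: j vs i
        total += (a_ij + a_ji) / 2.0
    return 2.0 * total / (c * (c - 1))


# Toy example with classes 0 = CN, 1 = MCI, 2 = AD.
y = [0, 1, 2, 2]
probs = [[0.7, 0.2, 0.1],
         [0.3, 0.5, 0.2],
         [0.1, 0.3, 0.6],
         [0.2, 0.5, 0.3]]
print(mauc(y, probs))
```

A perfect ranking gives 1.0 and constant predictions give 0.5, so the reported 0.931 sits close to the ceiling on this scale.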
Chromosome conformation capture experiments such as Hi-C are used to map the three-dimensional spatial organization of genomes. One specific feature of the 3D organization is the presence of topologically associating domains (TADs), which are densely interacting, contiguous chromatin regions that play important roles in regulating gene expression. A few algorithms have been proposed to detect TADs. In particular, the structure of Hi-C data naturally invites the application of community detection methods. However, one drawback of community detection is that most methods take the exchangeability of the nodes in the network for granted, whereas the nodes in this case, i.e., the positions on the chromosomes, are not exchangeable. We propose a network model for detecting TADs using Hi-C data that takes this non-exchangeability into account. In addition, our model explicitly makes use of cell-type-specific CTCF binding sites as biological covariates and can be used to identify conserved TADs across multiple cell types. The model leads to a likelihood objective that can be efficiently optimized via relaxation. We also prove that, when suitably initialized, this model finds the underlying TAD structure with high probability. Using simulated data, we show the advantages of our method and the caveats of popular community detection methods, such as spectral clustering, in this application. Applying our method to real Hi-C data, we demonstrate that the identified domains have desirable epigenetic features and compare them across different cell types.
Bank G. Fenyves, 2021
Graph-theoretical analyses of nervous systems usually omit connection polarity because of insufficient data. The chemical synapse network of Caenorhabditis elegans is a well-reconstructed directed network, but the signs of its connections are yet to be elucidated. Here, we present a gene expression-based sign prediction of the ionotropic chemical synapse connectome of C. elegans (3,638 connections and 20,589 synapses in total), incorporating available presynaptic neurotransmitter and postsynaptic receptor gene expression data for three major neurotransmitter systems. We made predictions for more than two-thirds of these chemical synapses and observed an excitatory-inhibitory (E:I) ratio close to 4:1, similar to that observed in many real-world networks. Our open-source tool (http://EleganSign.linkgroup.hu) is simple yet efficient in predicting polarities by integrating neuronal connectome and gene expression data.
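
The core idea, combining the presynaptic neuron's neurotransmitter with the postsynaptic neuron's receptor expression to assign a sign, can be sketched as follows. This is a hypothetical simplification, not the EleganSign algorithm: the receptor names and the toy network are invented, and a connection is left unsigned when the postsynaptic neuron expresses no matching receptor or expresses matching receptors of conflicting sign.

```python
def predict_signs(edges, transmitter_of, receptors_of, receptor_table):
    """Assign +1 (excitatory), -1 (inhibitory) or None (unresolved) per connection.

    transmitter_of: presynaptic neuron -> neurotransmitter it releases
    receptors_of:   postsynaptic neuron -> receptor genes it expresses
    receptor_table: receptor gene -> (neurotransmitter bound, +1 or -1)
    """
    signs = {}
    for pre, post in edges:
        nt = transmitter_of.get(pre)
        candidate = set()
        for receptor in receptors_of.get(post, ()):
            entry = receptor_table.get(receptor)
            if entry is not None and entry[0] == nt:
                candidate.add(entry[1])
        # Only an unambiguous single sign counts as a prediction.
        signs[(pre, post)] = candidate.pop() if len(candidate) == 1 else None
    return signs


def ei_ratio(signs, weights=None):
    """Excitatory:inhibitory ratio over signed connections, optionally synapse-weighted."""
    weights = weights or {}
    exc = sum(weights.get(e, 1) for e, s in signs.items() if s == 1)
    inh = sum(weights.get(e, 1) for e, s in signs.items() if s == -1)
    return exc / inh if inh else float("inf")


# Hypothetical toy data (not real C. elegans annotations).
receptor_table = {"rAch": ("ACh", 1), "rGaba": ("GABA", -1),
                  "rGluE": ("Glu", 1), "rGluI": ("Glu", -1)}
transmitter_of = {"A": "ACh", "B": "GABA", "C": "Glu"}
receptors_of = {"X": ["rAch", "rGaba"], "Y": ["rGluE", "rGluI"]}
edges = [("A", "X"), ("B", "X"), ("C", "Y"), ("A", "Y")]

signs = predict_signs(edges, transmitter_of, receptors_of, receptor_table)
print(signs)
print(ei_ratio(signs, weights={("A", "X"): 4, ("B", "X"): 1}))
```

Leaving ambiguous connections unsigned, rather than guessing, is one way a method can end up covering only part of a connectome, as the abstract's "more than two-thirds" suggests.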