
Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed

Published by Laurent Jacob
Publication date: 2012
Language: English





In large-scale gene expression studies, observations are commonly contaminated by unwanted variation factors such as platforms or batches. Ignoring this unwanted variation when analyzing the data can lead to spurious associations and to missed signals. When the analysis is unsupervised, e.g., when the goal is to cluster the samples or to build a corrected version of the dataset rather than to study an observed factor of interest, taking unwanted variation into account becomes difficult. The unwanted variation factors may be correlated with the unobserved factor of interest, so correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data or to build estimators for unsupervised problems. The proposed methods are then evaluated on three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections.
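The idea of using negative control genes can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a linear model in which control genes carry only unwanted variation, estimates the unwanted factors from the leading singular vectors of the control-gene submatrix, and regresses them out of every gene. The function name `remove_unwanted_variation`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def remove_unwanted_variation(Y, control_idx, k):
    """Naive control-gene correction (illustrative sketch only).

    Y           : (samples x genes) expression matrix, e.g. log-intensities
    control_idx : column indices of negative control genes (assumed to be
                  unaffected by the factor of interest)
    k           : assumed number of unwanted variation factors
    """
    Y = np.asarray(Y, dtype=float)
    # Restrict to control genes: their variation is assumed to be unwanted.
    Y_controls = Y[:, control_idx]
    # Leading left singular vectors give an estimate of the unwanted factors.
    U, s, _ = np.linalg.svd(Y_controls, full_matrices=False)
    W_hat = U[:, :k] * s[:k]
    # Least-squares fit of every gene on the estimated factors.
    alpha_hat, *_ = np.linalg.lstsq(W_hat, Y, rcond=None)
    # Subtract the fitted unwanted part from the data.
    return Y - W_hat @ alpha_hat, W_hat


# Synthetic example: one batch factor affecting all genes, plus small noise.
rng = np.random.default_rng(0)
n_samples, n_genes, n_controls = 20, 50, 10
batch = np.repeat([-1.0, 1.0], n_samples // 2)[:, None]
alpha_true = rng.normal(size=(1, n_genes))
noise = 0.1 * rng.normal(size=(n_samples, n_genes))
Y = batch @ alpha_true + noise

corrected, W_hat = remove_unwanted_variation(Y, np.arange(n_controls), k=1)
```

This plain regression step is exactly where the caveat in the abstract applies: if the estimated factors are correlated with the unobserved factor of interest, subtracting them also removes part of the signal, which is why a more careful correction is needed in the unsupervised setting.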




Read also

Sicheng Hao, Rui Wang, Yu Zhang, 2018
Alzheimer's disease is the most common cause of dementia and the fifth-leading cause of death among elderly people. Given its high genetic heritability (79%), finding disease-causal genes is a crucial step toward finding treatments for AD. Following the International Genomics of Alzheimer's Project (IGAP), many disease-associated genes have been identified; however, we do not yet know enough about how those genes affect gene expression and disease-related pathways. We integrated GWAS summary data from IGAP with five different expression-level datasets using the TWAS method and identified 15 disease-causal genes under strict multiple testing (alpha < 0.05), 4 of which are newly identified; we identified an additional 29 potential disease-causal genes under a false discovery rate threshold (alpha < 0.05), 21 of which are newly identified. Many of the genes we identified are also associated with autoimmune disorders.
Anne-Claire Haury, 2010
Motivation: Molecular signatures for diagnosis or prognosis estimated from large-scale gene expression data often lack robustness and stability, rendering their biological interpretation challenging. Increasing the signatures' interpretability and stability across perturbations of a given dataset and, if possible, across datasets, is urgently needed to ease the discovery of important biological processes and, eventually, new drug targets. Results: We propose a new method to construct signatures with increased stability and easier interpretability. The method uses a gene network as side information and enforces a large connectivity among the genes in the signature, leading to signatures typically made of genes clustered in a few subnetworks. It combines the recently proposed graph Lasso procedure with a stability selection procedure. We evaluate its relevance for the estimation of a prognostic signature in breast cancer, and highlight in particular the increase in interpretability and stability of the signature.
Bank G. Fenyves, 2021
Graph theoretical analyses of nervous systems usually omit the aspect of connection polarity, due to data insufficiency. The chemical synapse network of Caenorhabditis elegans is a well-reconstructed directed network, but the signs of its connections are yet to be elucidated. Here, we present the gene expression-based sign prediction of the ionotropic chemical synapse connectome of C. elegans (3,638 connections and 20,589 synapses total), incorporating available presynaptic neurotransmitter and postsynaptic receptor gene expression data for three major neurotransmitter systems. We made predictions for more than two-thirds of these chemical synapses and observed an excitatory-inhibitory (E:I) ratio close to 4:1 which was found similar to that observed in many real-world networks. Our open source tool (http://EleganSign.linkgroup.hu) is simple but efficient in predicting polarities by integrating neuronal connectome and gene expression data.
Motivation: Histone modifications are among the most important factors that control gene regulation. Computational methods that predict gene expression from histone modification signals are highly desirable for understanding their combinatorial effects in gene regulation. This knowledge can help in developing epigenetic drugs for diseases like cancer. Previous studies quantifying the relationship between histone modifications and gene expression levels either failed to capture combinatorial effects or relied on multiple methods that separate predictions and combinatorial analysis. This paper develops a unified discriminative framework using a deep convolutional neural network to classify gene expression using histone modification data as input. Our system, called DeepChrome, allows automatic extraction of complex interactions among important features. To simultaneously visualize the combinatorial interactions among histone modifications, we propose a novel optimization-based technique that generates feature pattern maps from the learnt deep model. This provides an intuitive description of underlying epigenetic mechanisms that regulate genes. Results: We show that DeepChrome outperforms state-of-the-art models like Support Vector Machines and Random Forests on the gene expression classification task for 56 different cell types from the REMC database. The output of our visualization technique not only validates previous observations but also allows novel insights about combinatorial interactions among histone modification marks, some of which have recently been observed by experimental studies.
We consider multivariate two-sample tests of means, where the location shift between the two populations is expected to be related to a known graph structure. An important application of such tests is the detection of differentially expressed genes between two patient populations, as shifts in expression levels are expected to be coherent with the structure of graphs reflecting gene properties such as biological process, molecular function, regulation or metabolism. For a fixed graph of interest, we demonstrate that accounting for graph structure can yield more powerful tests under the assumption of smooth distribution shift on the graph. We also investigate the identification of nonhomogeneous subgraphs of a given large graph, which poses both computational and multiple hypothesis testing problems. The relevance and benefits of the proposed approach are illustrated on synthetic data and on breast and bladder cancer gene expression data analyzed in the context of KEGG and NCI pathways.