ﻻ يوجد ملخص باللغة العربية
Motivation: Although principal component analysis (PCA) is widely used for the dimensional reduction of biomedical data, interpretation of PCA results remains daunting. Most existing methods attempt to explain each principal component (PC) in terms of a small number of variables by generating approximate PCs with few non-zero loadings. Although useful when just a few variables dominate the population PCs, these methods are often inadequate for characterizing the PCs of high-dimensional genomic data. For genomic data, reproducible and biologically meaningful PC interpretation requires methods based on the combined signal of functionally related sets of genes. While gene set testing methods have been widely used in supervised settings to quantify the association of groups of genes with clinical outcomes, these methods have seen only limited application for testing the enrichment of gene sets relative to sample PCs. Results: We describe a novel approach, principal component gene set enrichment (PCGSE), for computing the statistical association between gene sets and the PCs of genomic data. The PCGSE method performs a two-stage competitive gene set test using the correlation between each gene and each PC as the gene-level test statistic with flexible choice of both the gene set test statistic and the method used to compute the null distribution of the gene set statistic. Using simulated data with simulated gene sets and real gene expression data with curated gene sets, we demonstrate that biologically meaningful and computationally efficient results can be obtained from a simple parametric version of the PCGSE method that performs a correlation-adjusted two-sample t-test between the gene-level test statistics for gene set members and genes not in the set. Availability: http://cran.r-project.org/web/packages/PCGSE/index.html Contact: rob.frost@dartmouth.edu or jason.h.moore@dartmouth.edu
Motivation: Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absenc
Summary: Interpretation and prioritization of candidate hits from genome-scale screening experiments represent a significant analytical challenge, particularly when it comes to an understanding of cancer relevance. We have developed a flexible tool t
Complex biological functions are carried out by the interaction of genes and proteins. Uncovering the gene regulation network behind a function is one of the central themes in biology. Typically, it involves extensive experiments of genetics, biochem
Motivated by gene set enrichment analysis, we investigate the problem of combined hypothesis testing on a graph. We introduce a general framework to effectively use the structural information of the underlying graph when testing multivariate means. A
We show how to efficiently project a vector onto the top principal components of a matrix, without explicitly computing these components. Specifically, we introduce an iterative algorithm that provably computes the projection using few calls to any b