Functional modules from variable genes: Leveraging percolation to analyze noisy, high-dimensional data


Abstract in English

While measurement advances now allow extensive surveys of gene activity (large numbers of genes across many samples), interpretation of these data is often confounded by noise -- expression counts can differ strongly across samples due to variation of both biological and experimental origin. Complimentary to perturbation approaches, we extract functionally related groups of genes by analyzing the standing variation within a sampled population. To distinguish biologically meaningful patterns from uninterpretable noise, we focus on correlated variation and develop a novel density-based clustering approach that takes advantage of a percolation transition generically arising in random, uncorrelated data. We apply our approach to two contrasting RNA sequencing data sets that sample individual variation -- across single cells of fission yeast and whole animals of C. elegans worms -- and demonstrate robust applicability and versatility in revealing correlated gene clusters of diverse biological origin, including cell cycle phase, development/reproduction, tissue-specific functions, and feeding history. Our technique exploits generic features of noisy high-dimensional data and is applicable, beyond gene expression, to feature-rich data that sample population-level variability in the presence of noise.

Download