ﻻ يوجد ملخص باللغة العربية
Background. Emerging technologies now allow for mass spectrometry based profiling of up to thousands of small molecule metabolites (metabolomics) in an increasing number of biosamples. While offering great promise for revealing insight into the pathogenesis of human disease, standard approaches have yet to be established for statistically analyzing increasingly complex, high-dimensional human metabolomics data in relation to clinical phenotypes including disease outcomes. To determine optimal statistical approaches for metabolomics analysis, we sought to formally compare traditional statistical as well as newer statistical learning methods across a range of metabolomics dataset types. Results. In simulated and experimental metabolomics data derived from large population-based human cohorts, we observed that with an increasing number of study subjects, univariate compared to multivariate methods resulted in a higher false discovery rate due to substantial correlations among metabolites. In scenarios wherein the number of assayed metabolites increases, as in the application of nontargeted versus targeted metabolomics measures, multivariate methods performed especially favorably across a range of statistical operating characteristics. In nontargeted metabolomics datasets that included thousands of metabolite measures, sparse multivariate models demonstrated greater selectivity and lower potential for spurious relationships. Conclusion. When the number of metabolites was similar to or exceeded the number of study subjects, as is common with nontargeted metabolomics analysis of relatively small sized cohorts, sparse multivariate models exhibited the most robust statistical power with more consistent results. These findings have important implications for the analysis of metabolomics studies of human disease.
High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity and mechanisms underlying human health and disease. Large-scale metabolomics data, gen
This review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests and point estimation for use in the analysis of modern astronomical data. Least squares, maximum likelihood, and Bayesian approaches to statist
The projected increase of genotyping in the clinic and the rise of large genomic databases has led to the possibility of using patient medical data to perform genomewide association studies (GWAS) on a larger scale and at a lower cost than ever befor
Numerous biological approaches are available to characterise the mechanisms which govern the formation of human embryonic stem cell (hESC) colonies. To understand how the kinematics of single and pairs of hESCs impact colony formation, we study their
A number of recent emerging applications call for studying data streams, potentially infinite flows of information updated in real-time. When multiple co-evolving data streams are observed, an important task is to determine how these streams depend o