Massive Multi-Omics Microbiome Database (M3DB): A Scalable Data Warehouse and Analytics Platform for Microbiome Datasets

Added by Nihar Sheth

Publication date 2015

fields Biology

Language English

Authors Shaun W. Norris - Steven P. Bradley - Hardik I. Parikh

Other Quantitative Biology

Massive Multi-Omics Microbiome Database (M3DB) is a data warehousing and analytics solution designed to handle the diverse, complex, and unprecedented volumes of sequence and taxonomic classification data obtained in a typical microbiome project using NGS technologies. M3DB is a platform built on Apache Hadoop, Apache Hive and PostgreSQL. It enables users to store, analyze and manage high volumes of data, and to query that data quickly and efficiently. The M3DB framework includes command-line tools to process and store microbiome data, along with an easy-to-use web interface for uploading, querying, analyzing and visualizing the data and results. Availability: The source code of M3DB is freely available for download at http://www.github.com/nisheth/M3DB.

Phylogenetics and the human microbiome

796 - Frederick A Matsen IV 2014

The human microbiome is the ensemble of genes in the microbes that live inside and on the surface of humans. Because microbial sequencing information is now much easier to come by than phenotypic information, there has been an explosion of sequencing and genetic analysis of microbiome samples. Much of the analytical work for these sequences involves phylogenetics, at least indirectly, but methodology has developed in a somewhat different direction than for other applications of phylogenetics. In this paper I review the field and its methods from the perspective of a phylogeneticist, as well as describing current challenges for phylogenetics coming from this type of work.

Populations and Evolution Genomics

Testing for differential abundance in compositional counts data, with application to microbiome studies

625 - Barak Brill , Amnon Amir , Ruth Heller 2019

Identifying which taxa in our microbiota are associated with traits of interest is important for advancing science and health. However, the identification is challenging because the measured vector of taxa counts (by amplicon sequencing) is compositional, so a change in the abundance of one taxon in the microbiota induces a change in the number of sequenced counts across all taxa. The data are typically sparse, with zero counts present either due to biological variance or limited sequencing depth (technical zeros). For low-abundance taxa, the chance of technical zeros is non-negligible. We show that existing methods designed to identify differential abundance for compositional data may have an inflated number of false positives due to improper handling of the zero counts. We introduce a novel non-parametric approach which provides valid inference even when the fraction of zero counts is substantial. Our approach uses a set of reference taxa that are non-differentially abundant, which can be estimated from the data or from outside information. We show the usefulness of our approach via simulations, as well as on three different data sets: a Crohn's disease study, the Human Microbiome Project, and an experiment with spiked-in bacteria.
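
The compositional effect described in this abstract can be illustrated with a small sketch. It is not the authors' method: the abundances, the sequencing depth, and the helper `expected_counts` are hypothetical, chosen only to show that a change in one taxon shifts the expected counts of unchanged taxa, while the ratio of an unchanged taxon to an unchanged reference taxon stays the same.

```python
def expected_counts(abundances, depth):
    """Expected sequenced counts when `depth` reads are split in proportion to abundance."""
    total = sum(abundances)
    return [depth * a / total for a in abundances]


# Hypothetical absolute abundances for three taxa under two conditions.
# Only taxon 0 changes; taxa 1 and 2 are truly unchanged.
baseline = [100, 200, 700]
treated = [1100, 200, 700]

depth = 1000  # assumed fixed sequencing depth, identical in both conditions

counts_baseline = expected_counts(baseline, depth)
counts_treated = expected_counts(treated, depth)

# Taxa 1 and 2 appear to drop even though their true abundance is unchanged.
print(counts_baseline)
print(counts_treated)

# Treating taxon 2 as a non-differentially abundant reference, the ratio of
# taxon 1 to the reference is the same in both conditions.
ratio_baseline = counts_baseline[1] / counts_baseline[2]
ratio_treated = counts_treated[1] / counts_treated[2]
print(ratio_baseline, ratio_treated)
```

In this toy setting the expected counts of taxa 1 and 2 are halved purely because taxon 0 grew, which is the kind of induced change that can produce false positives; comparing against a reference taxon removes it. Real data add sampling noise and zero counts, which this sketch ignores.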

Genomics Applications

Statistical computation methods for microbiome compositional data network inference

98 - Liang Chen , Qiuyan He , Hui Wan 2021

Microbes can affect processes from food production to human health. Such microbes are not isolated, but rather interact with each other and establish connections with their living environments. Understanding these interactions is essential to an understanding of the organization and complex interplay of microbial communities, as well as the structure and dynamics of various ecosystems. A common and essential approach toward this objective involves the inference of microbiome interaction networks. Although network inference methods in other fields have been studied before, applying these methods to estimate microbiome associations based on compositional data will not yield valid results. On the one hand, features of microbiome data such as compositionality, sparsity and high dimensionality challenge data normalization and the design of computational methods. On the other hand, issues such as microbial community heterogeneity, external environmental interference and biological concerns make network inference more difficult. In this paper, we provide a comprehensive review of emerging microbiome interaction network inference methods. According to their assumptions and research targets, estimated networks are divided into four main categories: correlation networks, conditional correlation networks, mixture networks and differential networks. Their scope of application, advantages and limitations are presented in this review. Since real microbial interactions can be complex and dynamic, no unifying method has captured all the aspects of interest to date. In addition, we discuss the challenges currently facing the study of microbial associations, along with future prospects. Finally, we highlight that research in microbial network inference requires the joint advancement of statistical computation methods and experimental techniques.

Applications

Squash root microbiome transplants and metagenomic inspection for in situ arid adaptations

363 - Cristobal Hernandez-Alvarez , Felipe Garcia-Oliva , Rocion Cruz-Ortega 2021

Arid zones contain a diverse set of microbes capable of survival under dry conditions, some of which can form relationships with plants under drought stress conditions to improve plant health. We studied the squash (Cucurbita pepo L.) root microbiome at historically arid and humid sites, both in situ and in a common garden experiment. Plants were grown in soils from sites with different drought levels, using in situ collected soils as the microbial source. We described and analyzed bacterial diversity by 16S rRNA gene sequencing (N=48) from the soil, rhizosphere, and endosphere. Proteobacteria were the most abundant phylum in both humid and arid samples, while Actinobacteriota abundance was higher in arid ones. Beta-diversity analyses separated arid from humid microbiomes, a split that aridity and soil pH levels could explain. These differences between humid and arid microbiomes were maintained in the common garden experiment, showing that it is possible to transplant in situ diversity to the greenhouse. We detected a total of 1009 bacterial genera, of which 199 were exclusively associated with roots under arid conditions. With shotgun metagenomic sequencing of rhizospheres (N=6), we identified 2969 protein families in the squash core metagenome and found more protein families exclusive to arid samples (924) than to humid samples (158). We found that arid conditions enriched genes involved in protein degradation and folding, oxidative stress, compatible solute synthesis, and ion pumps associated with osmotic regulation. Plant phenotyping allowed us to correlate bacterial communities with plant growth. Our study revealed that it is possible to evaluate microbiome diversity ex situ and identify critical species and genes involved in plant-microbe interactions in historically arid locations.

Genomics Populations and Evolution

Local biplots for multi-dimensional scaling, with application to the microbiome

50 - Julia Fukuyama 2020

We present local biplots, an extension of the classic principal components biplot to multi-dimensional scaling. Noticing that principal components biplots have an interpretation as the Jacobian of a map from data space to the principal subspace, we define local biplots as the Jacobian of the analogous map for multi-dimensional scaling. In the process, we show a close relationship between our local biplot axes, generalized Euclidean distances, and generalized principal components. In simulations and real data we show how local biplots can shed light on what variables or combinations of variables are important for the low-dimensional embedding provided by multi-dimensional scaling. They give particular insight into a class of phylogenetically-informed distances commonly used in the analysis of microbiome data, showing that different variants of these distances can be interpreted as implicitly smoothing the data along the phylogenetic tree and that the extent of this smoothing is variable.

Methodology