Massive Multi-Omics Microbiome Database (M3DB) is a data warehousing and analytics solution designed to handle the diverse, complex, and unprecedented volumes of sequence and taxonomic classification data obtained in a typical microbiome project using NGS technologies. M3DB is built on Apache Hadoop, Apache Hive and PostgreSQL. It enables users to store, analyze and manage high volumes of data, and to query them quickly and efficiently. The M3DB framework includes command-line tools to process and store microbiome data, along with an easy-to-use web interface for uploading, querying, analyzing and visualizing the data and results. Availability: The source code of M3DB is freely available for download at http://www.github.com/nisheth/M3DB.
The human microbiome is the ensemble of genes in the microbes that live inside and on the surface of humans. Because microbial sequencing information is now much easier to come by than phenotypic information, there has been an explosion of sequencing and genetic analysis of microbiome samples. Much of the analytical work for these sequences involves phylogenetics, at least indirectly, but methodology has developed in a somewhat different direction than for other applications of phylogenetics. In this paper I review the field and its methods from the perspective of a phylogeneticist, and describe current challenges for phylogenetics arising from this type of work.
Identifying which taxa in our microbiota are associated with traits of interest is important for advancing science and health. However, the identification is challenging because the measured vector of taxa counts (by amplicon sequencing) is compositi
Microbes can affect processes from food production to human health. Such microbes are not isolated, but rather interact with each other and establish connections with their living environments. Understanding these interactions is essential to understanding the organization and complex interplay of microbial communities, as well as the structure and dynamics of various ecosystems. A common and essential approach toward this objective is the inference of microbiome interaction networks. Although network inference methods have been studied in other fields, applying these methods to estimate microbiome associations from compositional data will not yield valid results. On the one hand, features of microbiome data such as compositionality, sparsity and high dimensionality complicate data normalization and the design of computational methods. On the other hand, issues such as microbial community heterogeneity, external environmental interference and biological concerns make network inference more difficult still. In this paper, we provide a comprehensive review of emerging microbiome interaction network inference methods. According to their assumptions and research targets, estimated networks are divided into four main categories: correlation networks, conditional correlation networks, mixture networks and differential networks. Their scope of application, advantages and limitations are presented in this review. Since real microbial interactions can be complex and dynamic, no unifying method has captured all the aspects of interest to date. In addition, we discuss the challenges currently confronting the study of microbial associations and its future prospects. Finally, we highlight that research in microbial network inference requires the joint advancement of statistical computation methods and experimental techniques.
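As an illustration of the simplest category named above, correlation networks, the following minimal sketch applies a centered log-ratio (CLR) transform before computing correlations, which is one common way to account for compositionality. It is not a method taken from the review itself: the function names, the pseudocount of 0.5 and the threshold of 0.8 are all assumptions chosen for illustration.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x taxa count matrix.

    The pseudocount (an assumed value) avoids log(0) for sparse counts.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log abundance (log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)

def clr_correlation_network(counts, threshold=0.8, pseudocount=0.5):
    """Return the taxa x taxa correlation matrix and a thresholded adjacency matrix."""
    z = clr(counts, pseudocount)
    corr = np.corrcoef(z, rowvar=False)   # correlations between taxa (columns)
    adj = np.abs(corr) >= threshold       # keep only strong associations
    np.fill_diagonal(adj, False)          # no self-loops
    return corr, adj

# Hypothetical toy data: 5 samples, 4 taxa.
toy_counts = np.array([
    [10, 20, 30, 40],
    [5, 50, 10, 35],
    [80, 5, 10, 5],
    [20, 20, 40, 20],
    [1, 60, 30, 9],
])
corr, adj = clr_correlation_network(toy_counts)
```

Because the CLR transform depends only on ratios within a sample, rescaling a sample's total read count leaves its transformed values unchanged when no pseudocount is added. A fixed threshold is a crude edge criterion; the conditional correlation approaches mentioned in the abstract address indirect associations that this sketch does not.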
Arid zones contain a diverse set of microbes capable of surviving dry conditions, some of which can form relationships with plants under drought stress to improve plant health. We studied the squash (Cucurbita pepo L.) root microbiome at historically arid and humid sites, both in situ and in a common garden experiment. Plants were grown in soils from sites with different drought levels, using soils collected in situ as the microbial source. We described and analyzed bacterial diversity by 16S rRNA gene sequencing (N=48) of the soil, rhizosphere, and endosphere. Proteobacteria were the most abundant phylum in both humid and arid samples, while Actinobacteriota abundance was higher in arid ones. Beta-diversity analyses separated arid from humid microbiomes, and aridity and soil pH levels could explain the separation. These differences between humid and arid microbiomes were maintained in the common garden experiment, showing that it is possible to transplant in situ diversity to the greenhouse. We detected a total of 1009 bacterial genera, of which 199 were exclusively associated with roots under arid conditions. With shotgun metagenomic sequencing of rhizospheres (N=6), we identified 2969 protein families in the squash core metagenome and found more protein families exclusive to arid samples (924) than to humid samples (158). We found that arid conditions enriched genes involved in protein degradation and folding, oxidative stress, compatible solute synthesis, and ion pumps associated with osmotic regulation. Plant phenotyping allowed us to correlate bacterial communities with plant growth. Our study revealed that it is possible to evaluate microbiome diversity ex situ and to identify critical species and genes involved in plant-microbe interactions in historically arid locations.
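The counts of "core" and condition-exclusive genera or protein families reported above can be thought of as set operations on per-sample feature lists. The sketch below shows one plausible way to compute them. It assumes that "core" means present in every sample and "exclusive" means found in one condition and in no other, which the abstract does not spell out; the function name and toy data are hypothetical.

```python
def core_and_exclusive(samples_by_condition):
    """Compute core and condition-exclusive features.

    samples_by_condition maps a condition name to a list of sets,
    one set of detected features (e.g. genera) per sample.
    """
    # Pool all features observed in each condition.
    pooled = {
        cond: set().union(*samples)
        for cond, samples in samples_by_condition.items()
    }
    # Core (assumed definition): features present in every sample.
    all_samples = [s for samples in samples_by_condition.values() for s in samples]
    core = set.intersection(*all_samples)
    # Exclusive: features seen in one condition and in no other condition.
    exclusive = {}
    for cond, feats in pooled.items():
        others = set().union(*(f for c, f in pooled.items() if c != cond))
        exclusive[cond] = feats - others
    return core, exclusive

# Hypothetical toy data with placeholder feature names.
toy = {
    "arid": [{"A", "B", "C"}, {"A", "C", "D"}],
    "humid": [{"A", "B"}, {"A", "E"}],
}
core, exclusive = core_and_exclusive(toy)
```

In practice a prevalence or abundance cutoff would usually be applied before calling a feature "detected" in a sample; that filtering step is omitted here.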
We present local biplots, an extension of the classic principal components biplot to multi-dimensional scaling. Noticing that principal components biplots have an interpretation as the Jacobian of a map from data space to the principal subspace, we define local biplots as the Jacobian of the analogous map for multi-dimensional scaling. In the process, we show a close relationship between our local biplot axes, generalized Euclidean distances, and generalized principal components. In simulations and real data we show how local biplots can shed light on which variables or combinations of variables are important for the low-dimensional embedding provided by multi-dimensional scaling. They give particular insight into a class of phylogenetically informed distances commonly used in the analysis of microbiome data, showing that different variants of these distances can be interpreted as implicitly smoothing the data along the phylogenetic tree and that the extent of this smoothing is variable.
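The Jacobian interpretation of the principal components biplot can be checked numerically. The sketch below covers only the classic PCA case described in the first half of the abstract, not the authors' local biplot construction for multi-dimensional scaling. It builds the linear map from data space to the top-k principal subspace and compares a finite-difference Jacobian with the transposed loading matrix. The function names and the random test data are assumptions made for illustration.

```python
import numpy as np

def pca_map(X, k=2):
    """Return the projection onto the top-k principal subspace and the loadings."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    V = vt[:k].T  # p x k loading matrix

    def project(x):
        return (np.asarray(x, dtype=float) - mean) @ V

    return project, V

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x, with shape (output dim, input dim)."""
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    J = np.empty((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

# Hypothetical data: 30 observations of 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
project, V = pca_map(X, k=2)
J = numerical_jacobian(project, X[0])
# Column j of J is the biplot axis for variable j; J matches V.T everywhere
# because the PCA map is linear.
```

For PCA the Jacobian is the same at every point, which is why a single set of biplot axes suffices. For a nonlinear embedding map the Jacobian generally varies with the point at which it is evaluated, which is consistent with the "local" in the name.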