ترغب بنشر مسار تعليمي؟ اضغط هنا

Microbiome compositional analysis with logistic-tree normal models

94   0   0.0 ( 0 )
 نشر من قبل Zhuoqun Wang
 تاريخ النشر 2021
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

Modern microbiome compositional data are often high-dimensional and exhibit complex dependency among microbial taxa. However, existing approaches to analyzing microbiome compositional data either do not adequately account for the complex dependency or lack scalability to high-dimensionality, which presents challenges in appropriately incorporating the random effects in microbiome compositions in the resulting statistical analysis. We introduce a generative model called the logistic-tree normal (LTN) model to address this need. The LTN marries two popular classes of models -- the log-ratio normal (LN) and the Dirichlet-tree (DT) -- and inherits key benefits of each. LN models are flexible in characterizing covariance among taxa but lacks scalability to higher dimensions; DT avoids this issue through a tree-based binomial decomposition but incurs restrictive covariance. The LTN incorporates the tree-based decomposition as the DT does, but it jointly models the corresponding binomial probabilities using a (multivariate) logistic-normal distribution as in LN models. It therefore allows rich covariance structures as LN, along with computational efficiency realized through a Polya-Gamma augmentation on the binomial models at the tree nodes. Accordingly, Bayesian inference on LTN can readily proceed by Gibbs sampling. The LTN also allows common techniques for effective inference on high-dimensional data -- such as those based on sparsity and low-rank assumptions in the covariance structure -- to be readily incorporated. Depending on the goal of the analysis, LTN can be used either as a standalone model or embedded into more sophisticated hierarchical models. We demonstrate its use in estimating taxa covariance and in mixed-effects modeling. Finally, we carry out an extensive case study using an LTN-based mixed-effects model to analyze a longitudinal dataset from the DIABIMMUNE project.



قيم البحث

اقرأ أيضاً

79 - Patrick LeBlanc , Li Ma 2021
Mixed-membership (MM) models such as Latent Dirichlet Allocation (LDA) have been applied to microbiome compositional data to identify latent subcommunities of microbial species. However, microbiome compositional data, especially those collected from the gut, typically display substantial cross-sample heterogeneities in the subcommunity composition which current MM methods do not account for. To address this limitation, we incorporate the logistic-tree normal (LTN) model -- using the phylogenetic tree structure -- into the LDA model to form a new MM model. This model allows variation in the composition of each subcommunity around some ``centroid composition. Incorporation of auxiliary Polya-Gamma variables enables a computationally efficient collapsed blocked Gibbs sampler to carry out Bayesian inference under this model. We compare the new model and LDA and show that in the presence of large cross-sample heterogeneity, under the LDA model the resulting inference can be extremely sensitive to the specification of the total number of subcommunities as it does not account for cross-sample heterogeneity. As such, the popular strategy in other applications of MM models of overspecifying the number of subcommunities -- and hoping that some meaningful subcommunities will emerge among artificial ones -- can lead to highly misleading conclusions in the microbiome context. In contrast, by accounting for such heterogeneity, our MM model restores the robustness of the inference in the specification of the number of subcommunities and again allows meaningful subcommunities to be identified under this strategy.
79 - Ionas Erb 2019
Partial correlations quantify linear association between two variables adjusting for the influence of the remaining variables. They form the backbone for graphical models and are readily obtained from the inverse of the covariance matrix. For composi tional data, the covariance structure is specified from log ratios of variables, so unless we try to open the data via a normalization, this implies changes in the definition and interpretation of partial correlations. In the present work, we elucidate how results derived by Aitchison (1986) lead to a natural definition of partial correlation that has a number of advantages over current measures of association. For this, we show that the residuals of log-ratios between a variable with a reference, when adjusting for all remaining variables including the reference, are reference-independent. Since the reference itself can be controlled for, correlations between residuals are defined for the variables directly without the necessity to recur to ratios except when specifying which variables are partialled out. Thus, perhaps surprisingly, partial correlations do not have the problems commonly found with measures of pairwise association on compositional data. They are well-defined between two variables, are properly scaled, and allow for negative association. By design, they are subcompositionally incoherent, but they share this property with conventional partial correlations (where results change when adjusting for the influence of fewer variables). We discuss the equivalence with normalization-based approaches whenever the normalizing variables are controlled for. We also discuss the partial variances and correlations we obtain from a previously studied data set of Roman glass cups.
Applications such as the analysis of microbiome data have led to renewed interest in statistical methods for compositional data, i.e., multivariate data in the form of probability vectors that contain relative proportions. In particular, there is con siderable interest in modeling interactions among such relative proportions. To this end we propose a class of exponential family models that accommodate general patterns of pairwise interaction while being supported on the probability simplex. Special cases include the family of Dirichlet distributions as well as Aitchisons additive logistic normal distributions. Generally, the distributions we consider have a density that features a difficult to compute normalizing constant. To circumvent this issue, we design effective estimation methods based on generaliz
Identifying which taxa in our microbiota are associated with traits of interest is important for advancing science and health. However, the identification is challenging because the measured vector of taxa counts (by amplicon sequencing) is compositi onal, so a change in the abundance of one taxon in the microbiota induces a change in the number of sequenced counts across all taxa. The data is typically sparse, with zero counts present either due to biological variance or limited sequencing depth (technical zeros). For low abundance taxa, the chance for technical zeros is non-negligible. We show that existing methods designed to identify differential abundance for compositional data may have an inflated number of false positives due to improper handling of the zero counts. We introduce a novel non-parametric approach which provides valid inference even when the fraction of zero counts is substantial. Our approach uses a set of reference taxa that are non-differentially abundant, which can be estimated from the data or from outside information. We show the usefulness of our approach via simulations, as well as on three different data sets: a Crohns disease study, the Human Microbiome Project, and an experiment with spiked-in bacteria.
Spatial processes with nonstationary and anisotropic covariance structure are often used when modelling, analysing and predicting complex environmental phenomena. Such processes may often be expressed as ones that have stationary and isotropic covari ance structure on a warped spatial domain. However, the warping function is generally difficult to fit and not constrained to be injective, often resulting in `space-folding. Here, we propose modelling an injective warping function through a composition of multiple elemental injective functions in a deep-learning framework. We consider two cases; first, when these functions are known up to some weights that need to be estimated, and, second, when the weights in each layer are random. Inspired by recent methodological and technological advances in deep learning and deep Gaussian processes, we employ approximate Bayesian methods to make inference with these models using graphics processing units. Through simulation studies in one and two dimensions we show that the deep compositional spatial models are quick to fit, and are able to provide better predictions and uncertainty quantification than other deep stochastic models of similar complexity. We also show their remarkable capacity to model nonstationary, anisotropic spatial data using radiances from the MODIS instrument aboard the Aqua satellite.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا