ﻻ يوجد ملخص باللغة العربية
Modern microbiome compositional data are often high-dimensional and exhibit complex dependency among microbial taxa. However, existing approaches to analyzing microbiome compositional data either do not adequately account for the complex dependency or lack scalability to high-dimensionality, which presents challenges in appropriately incorporating the random effects in microbiome compositions in the resulting statistical analysis. We introduce a generative model called the logistic-tree normal (LTN) model to address this need. The LTN marries two popular classes of models -- the log-ratio normal (LN) and the Dirichlet-tree (DT) -- and inherits key benefits of each. LN models are flexible in characterizing covariance among taxa but lacks scalability to higher dimensions; DT avoids this issue through a tree-based binomial decomposition but incurs restrictive covariance. The LTN incorporates the tree-based decomposition as the DT does, but it jointly models the corresponding binomial probabilities using a (multivariate) logistic-normal distribution as in LN models. It therefore allows rich covariance structures as LN, along with computational efficiency realized through a Polya-Gamma augmentation on the binomial models at the tree nodes. Accordingly, Bayesian inference on LTN can readily proceed by Gibbs sampling. The LTN also allows common techniques for effective inference on high-dimensional data -- such as those based on sparsity and low-rank assumptions in the covariance structure -- to be readily incorporated. Depending on the goal of the analysis, LTN can be used either as a standalone model or embedded into more sophisticated hierarchical models. We demonstrate its use in estimating taxa covariance and in mixed-effects modeling. Finally, we carry out an extensive case study using an LTN-based mixed-effects model to analyze a longitudinal dataset from the DIABIMMUNE project.
Mixed-membership (MM) models such as Latent Dirichlet Allocation (LDA) have been applied to microbiome compositional data to identify latent subcommunities of microbial species. However, microbiome compositional data, especially those collected from
Partial correlations quantify linear association between two variables adjusting for the influence of the remaining variables. They form the backbone for graphical models and are readily obtained from the inverse of the covariance matrix. For composi
Applications such as the analysis of microbiome data have led to renewed interest in statistical methods for compositional data, i.e., multivariate data in the form of probability vectors that contain relative proportions. In particular, there is con
Identifying which taxa in our microbiota are associated with traits of interest is important for advancing science and health. However, the identification is challenging because the measured vector of taxa counts (by amplicon sequencing) is compositi
Spatial processes with nonstationary and anisotropic covariance structure are often used when modelling, analysing and predicting complex environmental phenomena. Such processes may often be expressed as ones that have stationary and isotropic covari