
Microbiome subcommunity learning with logistic-tree normal latent Dirichlet allocation

Published by Patrick LeBlanc
Publication date: 2021
Research field: Mathematical statistics
Language: English





Mixed-membership (MM) models such as Latent Dirichlet Allocation (LDA) have been applied to microbiome compositional data to identify latent subcommunities of microbial species. However, microbiome compositional data, especially those collected from the gut, typically display substantial cross-sample heterogeneity in subcommunity composition, which current MM methods do not account for. To address this limitation, we incorporate the logistic-tree normal (LTN) model, which uses the phylogenetic tree structure, into the LDA model to form a new MM model. This model allows variation in the composition of each subcommunity around a "centroid" composition. Auxiliary Polya-Gamma variables enable a computationally efficient collapsed blocked Gibbs sampler for Bayesian inference under this model. We compare the new model with LDA and show that, in the presence of large cross-sample heterogeneity, inference under LDA can be extremely sensitive to the specified total number of subcommunities, because LDA does not account for that heterogeneity. As such, the strategy popular in other applications of MM models, namely overspecifying the number of subcommunities and hoping that some meaningful subcommunities will emerge among artificial ones, can lead to highly misleading conclusions in the microbiome context. In contrast, by accounting for such heterogeneity, our MM model restores the robustness of the inference to the specified number of subcommunities and again allows meaningful subcommunities to be identified under this strategy.
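To make the mixed-membership idea concrete, here is a minimal Python sketch of the plain LDA generative process applied to sequencing reads. It is not the authors' model or code: the function names, the Dirichlet parameter, and the toy compositions are assumptions chosen for illustration. Note that each subcommunity's composition is shared by every sample, which is the restriction the abstract says the LTN extension relaxes.

```python
import random


def dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_lda_sample(n_reads, alpha, compositions, rng):
    """Simulate taxon counts for one sample under plain LDA.

    compositions[k] is the taxon distribution of subcommunity k. In plain LDA
    it is identical for every sample (no cross-sample variation).
    """
    n_taxa = len(compositions[0])
    # Sample-specific subcommunity proportions.
    phi = dirichlet(alpha, rng)
    counts = [0] * n_taxa
    for _ in range(n_reads):
        # Assign the read to a subcommunity, then draw its taxon.
        k = rng.choices(range(len(phi)), weights=phi)[0]
        taxon = rng.choices(range(n_taxa), weights=compositions[k])[0]
        counts[taxon] += 1
    return phi, counts


if __name__ == "__main__":
    rng = random.Random(1)
    # Two hypothetical subcommunities over four taxa.
    toy = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
    print(simulate_lda_sample(500, [1.0, 1.0], toy, rng))
```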




Read also

Zhuoqun Wang, Jialiang Mao, 2021
Modern microbiome compositional data are often high-dimensional and exhibit complex dependency among microbial taxa. However, existing approaches to analyzing microbiome compositional data either do not adequately account for the complex dependency or lack scalability to high-dimensionality, which presents challenges in appropriately incorporating the random effects in microbiome compositions in the resulting statistical analysis. We introduce a generative model called the logistic-tree normal (LTN) model to address this need. The LTN marries two popular classes of models, the log-ratio normal (LN) and the Dirichlet-tree (DT), and inherits key benefits of each. LN models are flexible in characterizing covariance among taxa but lack scalability to higher dimensions; DT avoids this issue through a tree-based binomial decomposition but incurs restrictive covariance. The LTN incorporates the tree-based decomposition as the DT does, but it jointly models the corresponding binomial probabilities using a (multivariate) logistic-normal distribution as in LN models. It therefore allows rich covariance structures as LN does, along with computational efficiency realized through a Polya-Gamma augmentation on the binomial models at the tree nodes. Accordingly, Bayesian inference on LTN can readily proceed by Gibbs sampling. The LTN also allows common techniques for effective inference on high-dimensional data, such as those based on sparsity and low-rank assumptions in the covariance structure, to be readily incorporated. Depending on the goal of the analysis, LTN can be used either as a standalone model or embedded into more sophisticated hierarchical models. We demonstrate its use in estimating taxa covariance and in mixed-effects modeling. Finally, we carry out an extensive case study using an LTN-based mixed-effects model to analyze a longitudinal dataset from the DIABIMMUNE project.
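As a rough illustration of the tree-based binomial decomposition with jointly Gaussian log-odds described in this abstract, the following Python sketch simulates counts for one sample. The four-taxon tree, the node names, the mean vector, and the Cholesky factor are all hypothetical, and the code covers only forward simulation, not the Polya-Gamma Gibbs sampler. It is a sketch under those assumptions, not the authors' implementation.

```python
import math
import random

# Hypothetical binary tree over four taxa. Each internal node maps to its
# (left child, right child); names not in TREE are leaf taxa.
TREE = {"root": ("A", "B"), "A": ("t1", "t2"), "B": ("t3", "t4")}
INTERNAL = ["root", "A", "B"]  # ordered so that parents precede children


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_log_odds(mu, chol, rng):
    """Draw psi ~ N(mu, L L^T), where chol is the lower-triangular factor L."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(chol[i][j] * z[j] for j in range(i + 1))
            for i, m in enumerate(mu)]


def sample_counts(total, psi, rng):
    """Split `total` reads down the tree with one binomial draw per internal node."""
    theta = {node: sigmoid(p) for node, p in zip(INTERNAL, psi)}
    node_counts = {"root": total}
    leaves = {}
    for node in INTERNAL:
        n = node_counts[node]
        left, right = TREE[node]
        # Binomial(n, theta) draw as a sum of Bernoulli trials.
        n_left = sum(rng.random() < theta[node] for _ in range(n))
        for child, c in ((left, n_left), (right, n - n_left)):
            if child in TREE:
                node_counts[child] = c
            else:
                leaves[child] = c
    return leaves


if __name__ == "__main__":
    rng = random.Random(0)
    chol = [[1.0, 0.0, 0.0], [0.5, 0.8, 0.0], [0.0, 0.3, 0.9]]
    psi = sample_log_odds([0.0, 1.0, -1.0], chol, rng)
    print(sample_counts(1000, psi, rng))
```

Because the log-odds at all internal nodes are drawn jointly, off-diagonal entries in the Cholesky factor induce dependence between splits at different nodes, which a Dirichlet-tree with independent Beta splits cannot express.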
Hartmut Maennel, 2020
Assume we have potential causes $z \in Z$, which produce events $w$ with known probabilities $\beta(w|z)$. We observe $w_1, w_2, \ldots, w_n$, what can we say about the distribution of the causes? A Bayesian estimate will assume a prior on distributions on $Z$ (we assume a Dirichlet prior) and calculate a posterior. An average over that posterior then gives a distribution on $Z$, which estimates how much each cause $z$ contributed to our observations. This is the setting of Latent Dirichlet Allocation, which can be applied e.g. to topics producing words in a document. In this setting usually the number of observed words is large, but the number of potential topics is small. We are here interested in applications with many potential causes (e.g. locations on the globe), but only a few observations. We show that the exact Bayesian estimate can be computed in linear time (and constant space) in $|Z|$ for a given upper bound on $n$ with a surprisingly simple formula. We generalize this algorithm to the case of sparse probabilities $\beta(w|z)$, in which we only need to assume that the tree width of an interaction graph on the observations is limited. On the other hand we also show that without such limitation the problem is NP-hard.
Piotr Zwiernik, 2017
Latent tree models are graphical models defined on trees, in which only a subset of variables is observed. They were first discussed by Judea Pearl as tree-decomposable distributions to generalise star-decomposable distributions such as the latent class model. Latent tree models, or their submodels, are widely used in phylogenetic analysis, network tomography, computer vision, causal modeling, and data clustering. They also contain other well-known classes of models like hidden Markov models, the Brownian motion tree model, the Ising model on a tree, and many popular models used in phylogenetics. This article offers a concise introduction to the theory of latent tree models. We emphasise the role of tree metrics in the structural description of this model class, in designing learning algorithms, and in understanding fundamental limits of what can be learned and when.
Recent work has explored transforming data sets into smaller, approximate summaries in order to scale Bayesian inference. We examine a related problem in which the parameters of a Bayesian model are very large and expensive to store in memory, and propose more compact representations of parameter values that can be used during inference. We focus on a class of graphical models that we refer to as latent Dirichlet-Categorical models, and show how a combination of two sketching algorithms known as count-min sketch and approximate counters provides an efficient representation for them. We show that this sketch combination, which has been used before in NLP applications but has not previously been analyzed, enjoys desirable properties. We prove that for this class of models, when the sketches are used during Markov Chain Monte Carlo inference, the equilibrium of sketched MCMC converges to that of the exact chain as sketch parameters are tuned to reduce the error rate.
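For readers unfamiliar with the count-min sketch named in this abstract, here is a minimal Python sketch of that data structure on its own. The class name, the hashing scheme (salted BLAKE2b), and the parameter choices are assumptions for illustration. The cells here hold exact integers, so the combination with approximate counters that the abstract analyzes is not shown.

```python
import hashlib


class CountMinSketch:
    """Fixed-size table of counters that approximates item frequencies."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, obtained by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum over rows is
        # an upper bound on the true count that is as tight as the table allows.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch(width=64, depth=4)
    for word in ["topic", "topic", "word", "topic"]:
        cms.add(word)
    print(cms.estimate("topic"), cms.estimate("word"))
```

Memory use is fixed at `width * depth` counters regardless of how many distinct items are added, which is the property that makes such sketches attractive when model parameters are too large to store exactly.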
We provide an end-to-end differentially private spectral algorithm for learning LDA, based on matrix/tensor decompositions, and establish theoretical guarantees on utility/consistency of the estimated model parameters. The spectral algorithm consists of multiple algorithmic steps, called edges, into which noise can be injected to obtain differential privacy. We identify subsets of edges, called configurations, such that adding noise to all edges in such a subset guarantees differential privacy of the end-to-end spectral algorithm. We characterize the sensitivity of the edges with respect to the input and thus estimate the amount of noise to be added to each edge for any required privacy level. We then characterize the utility loss for each configuration as a function of injected noise. Overall, by combining the sensitivity and utility characterization, we obtain an end-to-end differentially private spectral algorithm for LDA and identify the corresponding configuration that outperforms others in any specific regime. We are the first to achieve utility guarantees under the required level of differential privacy for learning in LDA. Overall our method systematically outperforms differentially private variational inference.