Testing for differential abundance in compositional counts data, with application to microbiome studies

Added by: Barak Brill

Publication date: 2019

Fields: Biology, Mathematical Statistics

Research language: English

Authors: Barak Brill, Amnon Amir, Ruth Heller

Genomics Applications

Abstract

Identifying which taxa in our microbiota are associated with traits of interest is important for advancing science and health. However, the identification is challenging because the measured vector of taxa counts (by amplicon sequencing) is compositional, so a change in the abundance of one taxon in the microbiota induces a change in the number of sequenced counts across all taxa. The data are typically sparse, with zero counts arising either from biological variance or from limited sequencing depth (technical zeros). For low-abundance taxa, the chance of technical zeros is non-negligible. We show that existing methods designed to identify differential abundance for compositional data may have an inflated number of false positives due to improper handling of the zero counts. We introduce a novel non-parametric approach that provides valid inference even when the fraction of zero counts is substantial. Our approach uses a set of reference taxa that are not differentially abundant, which can be estimated from the data or from outside information. We show the usefulness of our approach via simulations, as well as on three different data sets: a Crohn's disease study, the Human Microbiome Project, and an experiment with spiked-in bacteria.
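The abstract does not spell out the test, but the role of the reference set can be illustrated. If the reference taxa truly do not change between groups, then the share of taxon j among (taxon j + reference taxa) can only shift when taxon j itself changes, whatever the other taxa do. The sketch below is a hypothetical, stdlib-only illustration and not the authors' procedure: it assumes that subsampling each sample's taxon-plus-reference reads to a common depth (rarefaction) is an acceptable way to equalize depth and treat zeros alike, and it then compares groups with a two-sided rank-sum permutation test. All function names are made up for this example.

```python
import random


def rarefied_taxon_counts(counts, taxon, reference, rng):
    """Subsample each sample's (taxon + reference) reads to a common depth.

    counts: per-sample lists of read counts (rows = samples, columns = taxa).
    Returns the rarefied read count of `taxon` in every sample.
    """
    totals = [row[taxon] + sum(row[r] for r in reference) for row in counts]
    depth = min(totals)  # common depth: the smallest combined total
    rarefied = []
    for row, total in zip(counts, totals):
        # Draw `depth` reads without replacement; read indices below
        # row[taxon] belong to the tested taxon (a hypergeometric draw).
        drawn = rng.sample(range(total), depth)
        rarefied.append(sum(1 for i in drawn if i < row[taxon]))
    return rarefied


def rank_sum(values, labels):
    """Sum of mid-ranks (ties share their average rank) for samples labelled 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mid_rank = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = mid_rank
        start = end + 1
    return sum(r for r, g in zip(ranks, labels) if g == 1)


def permutation_pvalue(values, labels, n_perm, rng):
    """Two-sided permutation p-value for the rank-sum statistic."""
    n, n1 = len(values), sum(labels)
    expected = n1 * (n + 1) / 2  # mean of the rank sum under the null
    observed = abs(rank_sum(values, labels) - expected)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(rank_sum(values, shuffled) - expected) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def reference_ratio_test(counts, labels, taxon, reference, n_perm=999, seed=0):
    """Hypothetical helper: test `taxon` against a non-differential reference set."""
    rng = random.Random(seed)
    values = rarefied_taxon_counts(counts, taxon, reference, rng)
    return permutation_pvalue(values, labels, n_perm, rng)


counts = [
    [0, 50, 50], [1, 40, 60], [2, 55, 45],     # group 0
    [30, 50, 50], [40, 45, 55], [35, 60, 40],  # group 1
]
labels = [0, 0, 0, 1, 1, 1]
p = reference_ratio_test(counts, labels, taxon=0, reference=[1, 2])
print(f"p = {p:.3f}")
```

With three samples per group there are C(6, 3) = 20 possible label assignments and only 2 are as extreme as the observed one, so the permutation p-value hovers around 0.1, the smallest value this tiny design can reach.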

Related research

Microbiome compositional analysis with logistic-tree normal models

Zhuoqun Wang, Jialiang Mao, 2021

Modern microbiome compositional data are often high-dimensional and exhibit complex dependency among microbial taxa. However, existing approaches to analyzing microbiome compositional data either do not adequately account for the complex dependency or lack scalability to high dimensions, which makes it difficult to incorporate the random effects in microbiome compositions appropriately in the resulting statistical analysis. We introduce a generative model called the logistic-tree normal (LTN) model to address this need. The LTN marries two popular classes of models, the log-ratio normal (LN) and the Dirichlet-tree (DT), and inherits key benefits of each. LN models are flexible in characterizing covariance among taxa but lack scalability to higher dimensions; DT avoids this issue through a tree-based binomial decomposition but incurs a restrictive covariance. The LTN incorporates the tree-based decomposition as the DT does, but it jointly models the corresponding binomial probabilities using a (multivariate) logistic-normal distribution as in LN models. It therefore allows rich covariance structures as LN does, along with computational efficiency realized through a Pólya-Gamma augmentation on the binomial models at the tree nodes. Accordingly, Bayesian inference on LTN can readily proceed by Gibbs sampling. The LTN also allows common techniques for effective inference on high-dimensional data, such as those based on sparsity and low-rank assumptions in the covariance structure, to be readily incorporated. Depending on the goal of the analysis, LTN can be used either as a standalone model or embedded into more sophisticated hierarchical models. We demonstrate its use in estimating taxa covariance and in mixed-effects modeling. Finally, we carry out an extensive case study using an LTN-based mixed-effects model to analyze a longitudinal dataset from the DIABIMMUNE project.

Methodology Applications
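The tree-based binomial decomposition mentioned in the LTN abstract can be illustrated in a few lines. The sketch below is a hypothetical helper, not the LTN authors' code: it walks a binary tree whose leaves are taxon indices and, at each internal node, records how many of the node's reads go to the left child together with the empirical log-odds of that split. The pseudocount of 0.5 is an assumption that keeps the log-odds finite at zero counts. LTN itself would place a multivariate normal on these node log-odds and fit it by Gibbs sampling, which is not attempted here.

```python
import math


def leaf_total(node, counts):
    """Total count under a node; leaves are taxon indices, internal nodes are pairs."""
    if isinstance(node, int):
        return counts[node]
    left, right = node
    return leaf_total(left, counts) + leaf_total(right, counts)


def binomial_splits(node, counts, pseudocount=0.5):
    """Preorder list of (node, y_left, n, log_odds) for every internal node.

    y_left is the number of the node's n reads that fall in the left child;
    the pseudocount (an assumption) keeps the log-odds finite at zero counts.
    """
    if isinstance(node, int):
        return []
    left, right = node
    y_left = leaf_total(left, counts)
    n = y_left + leaf_total(right, counts)
    log_odds = math.log((y_left + pseudocount) / (n - y_left + pseudocount))
    return ([(node, y_left, n, log_odds)]
            + binomial_splits(left, counts, pseudocount)
            + binomial_splits(right, counts, pseudocount))


tree = ((0, 1), (2, 3))   # hypothetical 4-taxon binary tree
counts = [10, 30, 0, 60]  # one sample's counts per taxon
for node, y, n, eta in binomial_splits(tree, counts):
    print(node, f"{y}/{n}", round(eta, 3))
```

A binary tree with K leaves has K-1 internal nodes, so the K-1 binomial splits describe the composition completely; taxon 0's share, for example, is (40/100) × (10/40) = 0.1.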

Robust Differential Abundance Test in Compositional Data

Shulei Wang, 2021

Differential abundance tests in compositional data are essential and fundamental tasks in various biomedical applications, such as single-cell, bulk RNA-seq, and microbiome data analysis. However, despite the recent developments in these fields, differential abundance analysis in compositional data remains a complicated and unsolved statistical problem, because of the compositional constraint and prevalent zero counts in the dataset. This study introduces a new differential abundance test, the robust differential abundance (RDB) test, to address these challenges. Compared with existing methods, the RDB test 1) is simple and computationally efficient, 2) is robust to prevalent zero counts in compositional datasets, 3) can take the data's compositional nature into account, and 4) has a theoretical guarantee of controlling false discoveries in a general setting. Furthermore, in the presence of observed covariates, the RDB test can work with covariate balancing techniques to remove potential confounding effects and draw reliable conclusions. Finally, we apply the new test to several numerical examples using simulated and real datasets to demonstrate its practical merits.

Methodology Statistics Theory Quantitative Methods

A Test for Differential Ascertainment in Case-Control Studies with Application to Child Maltreatment

Matteo Sordello, Dylan S. Small, 2019

We propose a method to test for the presence of differential ascertainment in case-control studies, when data are collected by multiple sources. We show that, when differential ascertainment is present, the use of only the observed cases leads to severe bias in the computation of the odds ratio. We can alleviate the effect of such bias using the estimates that our method of testing for differential ascertainment naturally provides. We apply it to a dataset obtained from the National Violent Death Reporting System, with the goal of checking for the presence of differential ascertainment by race in the count of deaths caused by child maltreatment.

Methodology Applications
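To see why differential ascertainment biases the odds ratio, assume for illustration that cases in one group are recorded with probability p_exp and cases in the other with probability p_unexp, while controls are fully observed. Only the case counts shrink, so the observed odds ratio equals the true one multiplied by p_exp / p_unexp. The helpers below are hypothetical and simply take the ascertainment probabilities as given; the paper instead obtains its estimates from data collected by multiple sources, a step this sketch does not reproduce.

```python
def odds_ratio(case_exp, ctrl_exp, case_unexp, ctrl_unexp):
    """Odds ratio (a*d)/(b*c) of a 2x2 case-control table."""
    return (case_exp * ctrl_unexp) / (ctrl_exp * case_unexp)


def corrected_odds_ratio(case_exp, ctrl_exp, case_unexp, ctrl_unexp, p_exp, p_unexp):
    """Re-inflate observed case counts by assumed ascertainment probabilities."""
    return odds_ratio(case_exp / p_exp, ctrl_exp, case_unexp / p_unexp, ctrl_unexp)


true_or = odds_ratio(100, 900, 100, 900)      # no association in the full data
observed_or = odds_ratio(50, 900, 100, 900)   # exposed cases seen with prob. 0.5
fixed_or = corrected_odds_ratio(50, 900, 100, 900, p_exp=0.5, p_unexp=1.0)
print(true_or, observed_or, fixed_or)  # prints 1.0 0.5 1.0
```

Halving the ascertainment of one group's cases halves the observed odds ratio, manufacturing an apparent association where none exists; dividing by the ascertainment probabilities undoes it only if those probabilities are known or well estimated.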

Regularization Strategies for Hyperplane Classifiers: Application to Cancer Classification with Gene Expression Data

Erik Andries, 2006

Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.

Genomics
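The filter-factor view can be made concrete. For a least-squares problem with singular value decomposition A = U Σ Vᵀ, a regularized solution can be written as x = Σ_i f_i (u_iᵀ b / σ_i) v_i, where each filter factor f_i sets how much the i-th singular direction contributes. Unregularized least squares uses f_i = 1, so tiny σ_i amplify noise. The abstract does not say which regularizer the paper uses, so the sketch assumes two standard choices, Tikhonov (ridge) and truncated SVD, and works with hypothetical singular values and projections rather than a real gene-expression matrix.

```python
def tikhonov_filter_factors(singular_values, lam):
    """Tikhonov (ridge) factors f_i = s_i^2 / (s_i^2 + lam^2)."""
    return [s * s / (s * s + lam * lam) for s in singular_values]


def tsvd_filter_factors(singular_values, k):
    """Truncated-SVD factors: 1 for the k largest singular values, 0 otherwise."""
    order = sorted(range(len(singular_values)), key=lambda i: -singular_values[i])
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(singular_values))]


def filtered_coefficients(singular_values, beta, factors):
    """Solution coordinates in the right-singular basis: f_i * (u_i^T b) / s_i."""
    return [f * b / s for s, b, f in zip(singular_values, beta, factors)]


s = [10.0, 1.0, 0.01]    # hypothetical singular values of the data matrix
beta = [1.0, 1.0, 1.0]   # hypothetical projections u_i^T b of the class labels
naive = filtered_coefficients(s, beta, [1.0, 1.0, 1.0])
ridge = filtered_coefficients(s, beta, tikhonov_filter_factors(s, lam=0.1))
tsvd = filtered_coefficients(s, beta, tsvd_filter_factors(s, k=2))
print(naive, ridge, tsvd)
```

With σ = 0.01 the unregularized coefficient is about 100, while both filters suppress it: ridge shrinks it smoothly to just under 1 and truncated SVD drops that direction entirely.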

Statistical computation methods for microbiome compositional data network inference

Liang Chen, Qiuyan He, Hui Wan, 2021

Microbes can affect processes from food production to human health. Such microbes are not isolated, but rather interact with each other and establish connections with their living environments. Understanding these interactions is essential to an understanding of the organization and complex interplay of microbial communities, as well as the structure and dynamics of various ecosystems. A common and essential approach toward this objective involves the inference of microbiome interaction networks. Although network inference methods in other fields have been studied before, applying these methods to estimate microbiome associations based on compositional data will not yield valid results. On the one hand, features of microbiome data such as compositionality, sparsity and high dimensionality challenge data normalization and the design of computational methods. On the other hand, issues such as microbial community heterogeneity, external environmental interference and biological concerns make network inference even more difficult. In this paper, we provide a comprehensive review of emerging microbiome interaction network inference methods. According to various assumptions and research targets, estimated networks are divided into four main categories: correlation networks, conditional correlation networks, mixture networks and differential networks. Their scope of applications, advantages and limitations are presented in this review. Since real microbial interactions can be complex and dynamic, no unifying method has captured all the aspects of interest to date. In addition, we discuss the challenges currently confronting the study of microbial associations, as well as future prospects. Finally, we highlight that research in microbial network inference requires the joint advancement of statistical computation methods and experimental techniques.

Applications
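As a minimal baseline for the correlation networks the review describes, one can apply a centered log-ratio (CLR) transform to each sample and keep an edge between two taxa when their Pearson correlation is large. This is an illustrative sketch, not a method from the review; the pseudocount of 0.5 and the 0.6 threshold are assumptions. The toy data also shows the kind of compositional pitfall the abstract warns about: taxa 2 and 3 have identical counts in every sample, yet they come out essentially perfectly anti-correlated with taxa 0 and 1, because all CLR values in a sample are measured against that sample's shared geometric mean.

```python
import math


def clr(row, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount (an assumption) handles zeros."""
    logs = [math.log(x + pseudocount) for x in row]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_network(counts, threshold=0.6, pseudocount=0.5):
    """Edges (i, j, r) between taxa whose CLR correlation satisfies |r| >= threshold."""
    transformed = [clr(row, pseudocount) for row in counts]
    columns = list(zip(*transformed))  # one tuple of CLR values per taxon
    edges = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            r = pearson(columns[i], columns[j])
            if abs(r) >= threshold:
                edges.append((i, j, r))
    return edges


# Taxa 0 and 1 vary together; taxa 2 and 3 have the same count in every sample.
counts = [[k, k, 100, 100] for k in (1, 10, 100, 1000)]
for i, j, r in correlation_network(counts):
    print(i, j, round(r, 3))
```

Every pair of taxa ends up connected here, including the negative edges to the two constant taxa, which is exactly why naive correlation on compositional data can mislead.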
