Multi-sample Estimation of Bacterial Composition Matrix in Metagenomics Data

Posted by Anru Zhang

Publication date: 2017

Research field: Mathematical Statistics

Language: English

Authors: Yuanpei Cao, Anru Zhang, Hongzhe Li

Abstract

Metagenomics sequencing is routinely applied to quantify bacterial abundances in microbiome studies, where the bacterial composition is estimated based on the sequencing read counts. Due to limited sequencing depth and DNA dropouts, many rare bacterial taxa might not be captured in the final sequencing reads, which results in many zero counts. Naive composition estimation using count normalization leads to many zero proportions, which tend to result in inaccurate estimates of bacterial abundance and diversity. This paper takes a multi-sample approach to the estimation of bacterial abundances in order to borrow information across samples and across species. Empirical results from real data sets suggest that the composition matrix over multiple samples is approximately low rank, which motivates a regularized maximum likelihood estimation with a nuclear norm penalty. An efficient optimization algorithm using the generalized accelerated proximal gradient and Euclidean projection onto simplex space is developed. The theoretical upper bounds and the minimax lower bounds of the estimation errors, measured by the Kullback-Leibler divergence and the Frobenius norm, are established. Simulation studies demonstrate that the proposed estimator outperforms the naive estimators. The method is applied to an analysis of a human gut microbiome dataset.
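The abstract names three computational ingredients: a multinomial likelihood for the read counts, a nuclear norm penalty on the composition matrix, and a Euclidean projection of each row onto the simplex. The Python sketch below shows how these pieces can fit together. It is an illustration under stated assumptions, not the authors' algorithm. It uses plain proximal gradient with backtracking instead of the paper's generalized accelerated proximal gradient. It also applies singular value thresholding followed by a row-wise projection onto a floored simplex, which is a heuristic composition rather than the exact proximal map of the combined penalty and constraint. The lower bound `floor`, the 0.5 pseudo-count initialization, the scaling of the loss by total reads, the error normalizations, and every function name are hypothetical choices made for this example.

```python
import numpy as np


def project_simplex(v, floor=0.0):
    """Project each row of v onto {x : x_j >= floor, sum_j x_j = 1} (Euclidean).

    Standard sort-based simplex projection applied to v - floor; assumes p * floor < 1.
    """
    v = np.atleast_2d(np.asarray(v, dtype=float))
    n, p = v.shape
    radius = 1.0 - p * floor                        # mass left after reserving the floor
    y = v - floor
    u = -np.sort(-y, axis=1)                        # rows sorted in decreasing order
    css = np.cumsum(u, axis=1) - radius
    cond = u - css / np.arange(1, p + 1) > 0        # True on a prefix of each row
    rho = p - 1 - np.argmax(cond[:, ::-1], axis=1)  # last index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)      # per-row shift
    return np.maximum(y - theta[:, None], 0.0) + floor


def svt(x, tau):
    """Singular value thresholding: the proximal map of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def neg_loglik(x, w):
    """Multinomial negative log-likelihood of counts w, scaled by total reads (assumed scaling)."""
    return -np.sum(w * np.log(x)) / w.sum()


def estimate_composition(w, lam, floor=1e-6, iters=300):
    """Heuristic sketch (NOT the paper's algorithm) of a nuclear-norm-penalized MLE.

    Each step: gradient step, SVT, then row-wise projection onto the floored simplex.
    """
    w = np.asarray(w, dtype=float)
    smoothed = (w + 0.5) / (w + 0.5).sum(axis=1, keepdims=True)  # assumed pseudo-count start
    x = project_simplex(smoothed, floor)
    t = 1.0
    for _ in range(iters):
        g = -w / (x * w.sum())                      # gradient of neg_loglik
        f_x = neg_loglik(x, w)
        while True:                                 # backtracking on the step size t
            z = project_simplex(svt(x - t * g, t * lam), floor)
            d = z - x
            if neg_loglik(z, w) <= f_x + np.sum(g * d) + np.sum(d * d) / (2 * t):
                break
            t *= 0.5
        x = z
    return x


def mean_kl(x_true, x_hat):
    """Average over samples of KL(x_true_i || x_hat_i), with 0 * log 0 taken as 0."""
    safe = np.where(x_true > 0, x_true, 1.0)
    return np.sum(np.where(x_true > 0, x_true * np.log(safe / x_hat), 0.0)) / x_true.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, depth = 30, 50, 200
    # Hypothetical rank-2 truth: each sample mixes two simplex "profiles".
    x_true = rng.dirichlet(np.ones(2), size=n) @ rng.dirichlet(0.3 * np.ones(p), size=2)
    w = np.array([rng.multinomial(depth, row) for row in x_true])
    naive = w / w.sum(axis=1, keepdims=True)        # count normalization
    x_hat = estimate_composition(w, lam=0.01)
    print("zeros in naive estimate:", int((naive == 0).sum()))
    print("Frobenius error, naive:", np.linalg.norm(naive - x_true))
    print("Frobenius error, penalized:", np.linalg.norm(x_hat - x_true))
    print("mean KL, penalized:", mean_kl(x_true, x_hat))
```

The floor keeps every estimated proportion strictly positive, so the log-likelihood and the KL divergence stay finite. The naive count-normalized estimate can contain exact zeros, and its KL divergence to the truth is then infinite. In practice, `lam` would have to be tuned, for example by cross-validation.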

Read also

Brian Kidd, Matthias Katzfuss (2020)

In spatial statistics, it is often assumed that the spatial field of interest is stationary and its covariance has a simple parametric form, but these assumptions are not appropriate in many applications. Given replicate observations of a Gaussian spatial field, we propose nonstationary and nonparametric Bayesian inference on the spatial dependence. Instead of estimating the quadratic (in the number of spatial locations) entries of the covariance matrix, the idea is to infer a near-linear number of nonzero entries in a sparse Cholesky factor of the precision matrix. Our prior assumptions are motivated by recent results on the exponential decay of the entries of this Cholesky factor for Matérn-type covariances under a specific ordering scheme. Our methods are highly scalable and parallelizable. We conduct numerical comparisons and apply our methodology to climate-model output, enabling statistical emulation of an expensive physical model.

Subjects: Methodology, Applications, Computation

Planning SMARTs: Sample size estimation for comparing dynamic treatment regimens using longitudinal count outcomes with excess zeros

Jamie Yap, John Dziak, Raju Maiti (2021)

In many health domains such as substance use, outcomes are often counts with an excessive number of zeros (EZ), that is, count data with zeros occurring at a rate significantly higher than expected under a standard count distribution (e.g., Poisson). However, an important gap exists in sample size estimation methodology for planning sequential multiple assignment randomized trials (SMARTs) that compare dynamic treatment regimens (DTRs) using longitudinal count data. DTRs, also known as treatment algorithms or adaptive interventions, mimic the individualized and evolving nature of patient care through decision rules that specify the type, timing, modality of delivery, and dosage of treatments to address the unique and changing needs of individuals. To close this gap, we develop a Monte Carlo-based approach to sample size estimation. A SMART for engaging alcohol- and cocaine-dependent patients in treatment is used as motivation.

Subjects: Methodology, Applications, Computation

Multi-sample estimation of centered log-ratio matrix in microbiome studies

Yezheng Li, Hongzhe Li, Yuanpei Cao (2021)

In microbiome studies, one way of studying bacterial abundances is to estimate bacterial composition from the sequencing read counts. Various transformations are then applied to such compositional data for downstream statistical analysis, among which the centered log-ratio (clr) transformation is the most commonly used. Due to limited sequencing depth and DNA dropouts, many rare bacterial taxa might not be captured in the final sequencing reads, which results in many zero counts. Naive composition estimation using count normalization leads to many zero proportions, which makes the clr transformation infeasible. This paper proposes a multi-sample approach to estimating the clr matrix directly in order to borrow information across samples and across species. Empirical results from real datasets suggest that the clr matrix over multiple samples is approximately low rank, which motivates a regularized maximum likelihood estimation with a nuclear norm penalty. An efficient optimization algorithm using the generalized accelerated proximal gradient is developed. Theoretical upper bounds on the estimation errors and on the corresponding singular subspace errors are established. Simulation studies demonstrate that the proposed estimator outperforms the naive estimators. The method is applied to a gut microbiome dataset and to data from the American Gut Project.

Subjects: Methodology

A penalized simulated maximum likelihood approach in parameter estimation for stochastic differential equations

Libo Sun, Chihoon Lee, … (2013)

We consider the problem of estimating parameters of stochastic differential equations (SDEs) from discrete-time observations that are either completely or partially observed. The transition density between two observations is generally unknown. We propose an importance sampling approach with an auxiliary parameter for the case when the transition density is unknown. We embed the auxiliary importance sampler in a penalized maximum likelihood framework, which produces more accurate and computationally efficient parameter estimates. Simulation studies in three different models illustrate promising improvements of the new penalized simulated maximum likelihood method. The new procedure is designed for the challenging case in which some state variables are unobserved and, moreover, the observed states are sparse over time, a situation that commonly arises in ecological studies. We apply this new approach to two epidemics of chronic wasting disease in mule deer.

Subjects: Methodology, Applications, Computation

Distributed sequential method for analyzing massive data

Zhanfeng Wang, Yuan-chin Ivan Chang (2018)

To analyse a very large data set containing lengthy variables, we adopt a sequential estimation idea and propose a parallel divide-and-conquer method. We conduct several conventional sequential estimation procedures separately, and properly integrate their results while maintaining the desired statistical properties. Additionally, using a criterion from the statistical experiment design, we adopt an adaptive sample selection, together with an adaptive shrinkage estimation method, to simultaneously accelerate the estimation procedure and identify the effective variables. We confirm the cogency of our methods through theoretical justifications and numerical results derived from synthesized data sets. We then apply the proposed method to three real data sets, including those pertaining to appliance energy use and particulate matter concentration.

Subjects: Methodology, Applications, Computation
