In microbiome studies, bacterial abundances are commonly studied by estimating bacterial composition from sequencing read counts. Various transformations are then applied to the resulting compositional data for downstream statistical analysis, of which the centered log-ratio (clr) transformation is the most commonly used. Because of limited sequencing depth and DNA dropouts, many rare bacterial taxa may not be captured in the final sequencing reads, which results in many zero counts. Naive composition estimation by count normalization therefore yields many zero proportions, which makes the clr transformation infeasible because the logarithm of zero is undefined. This paper proposes a multi-sample approach that estimates the clr matrix directly, borrowing information across samples and across species. Empirical results from real datasets suggest that the clr matrix over multiple samples is approximately low rank, which motivates regularized maximum likelihood estimation with a nuclear norm penalty. An efficient optimization algorithm based on the generalized accelerated proximal gradient method is developed. Theoretical upper bounds on the estimation error and on the corresponding singular subspace errors are established. Simulation studies demonstrate that the proposed estimator outperforms the naive estimators. The method is applied to a gut microbiome dataset and to data from the American Gut Project.
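The zero-count problem and the role of the nuclear norm can be illustrated with a minimal sketch. This is not the paper's estimator: the count matrix is made up, the pseudo-count of 0.5 is an assumed naive fix, and `svt` is only the standard singular value soft-thresholding step (the proximal operator of the nuclear norm), shown here as one building block that a proximal gradient method would typically apply at each iteration. The function names `clr` and `svt` are hypothetical.

```python
import numpy as np

def clr(proportions):
    """Centered log-ratio per row: log of each part minus the row mean of the logs."""
    log_p = np.log(proportions)
    return log_p - log_p.mean(axis=1, keepdims=True)

def svt(matrix, threshold):
    """Singular value soft-thresholding (proximal operator of the nuclear norm)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - threshold, 0.0)  # shrink each singular value toward zero
    return (u * s_shrunk) @ vt

# Toy read counts (samples x taxa); the values are illustrative only.
counts = np.array([[10.0, 0.0, 30.0, 60.0],
                   [5.0, 15.0, 0.0, 80.0],
                   [20.0, 20.0, 20.0, 40.0]])

# Naive normalization keeps the zeros, so the clr of those rows is not finite.
naive = counts / counts.sum(axis=1, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    naive_clr = clr(naive)

# Assumed naive fix: add a pseudo-count of 0.5 before normalizing.
smoothed = counts + 0.5
smoothed = smoothed / smoothed.sum(axis=1, keepdims=True)
clr_matrix = clr(smoothed)

# One shrinkage step with an arbitrary threshold; it can only lower the rank.
low_rank = svt(clr_matrix, threshold=1.0)
```

Each row of `clr_matrix` sums to zero by construction, while the rows of `naive_clr` that contain a zero count are entirely non-finite, which is the infeasibility the abstract describes.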
Metagenomics sequencing is routinely applied to quantify bacterial abundances in microbiome studies, where the bacterial composition is estimated based on the sequencing read counts. Due to limited sequencing depth and DNA dropouts, many rare bacteri
Shape-constrained density estimation is an important topic in mathematical statistics. We focus on densities on $\mathbb{R}^d$ that are log-concave, and we study geometric properties of the maximum likelihood estimator (MLE) for weighted samples. Cule
We present local biplots, an extension of the classic principal components biplot to multi-dimensional scaling. Noticing that principal components biplots have an interpretation as the Jacobian of a map from data space to the principal subspace, we
The odds ratio (OR) is a widely used measure of the effect size in observational research. ORs reflect statistical association between a binary outcome, such as the presence of a health condition, and a binary predictor, such as an exposure to a poll
Let X_1, ..., X_n be independent and identically distributed random vectors with a log-concave (Lebesgue) density f. We first prove that, with probability one, there exists a unique maximum likelihood estimator of f. The use of this estimator is attr