Detection of outlying proportions

Added by Fabio Rapallo

Publication date 2016

Field: Mathematical Statistics

Language: English

Authors: Flavio Mignone, Fabio Rapallo

Methodology Applications Computation

Abstract

In this paper we introduce a new method for detecting outliers in a set of proportions. It is based on the construction of a suitable two-way contingency table and on the application of an algorithm for detecting outlying cells in such a table. We exploit the special structure of the relevant contingency table to increase the efficiency of the method. The main properties of our algorithm, together with guidance on choosing its parameters, are investigated through simulations, and in simple cases some theoretical justification is provided. Several examples on synthetic data and an example based on pseudo-real data from biological experiments demonstrate the good performance of our algorithm.
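
A minimal sketch of the general idea, assuming the two-way table is the 2 x k table of successes and failures for k groups: fit the independence model (one common proportion for all groups) and flag the columns with large adjusted Pearson residuals. This single-pass residual check is a simpler, swapped-in technique, not the authors' outlying-cell algorithm; the function name `outlying_proportions` and the cutoff `threshold=2.0` are hypothetical choices for illustration.

```python
import numpy as np

def outlying_proportions(successes, totals, threshold=2.0):
    """Flag groups whose proportion deviates from the pooled proportion.

    Builds the 2 x k table (row 0 = successes, row 1 = failures) and computes
    adjusted Pearson residuals under the independence model. The cutoff is an
    illustrative assumption, not a value taken from the paper.
    """
    s = np.asarray(successes, dtype=float)
    n = np.asarray(totals, dtype=float)
    table = np.vstack([s, n - s])              # columns are the k groups
    N = table.sum()
    row = table.sum(axis=1, keepdims=True)     # total successes / failures
    col = table.sum(axis=0, keepdims=True)     # group sizes
    expected = row @ col / N                   # expected counts under independence
    # Adjusted residual: (obs - exp) / sqrt(exp * (1 - row/N) * (1 - col/N))
    adj = (table - expected) / np.sqrt(expected * (1 - row / N) * (1 - col / N))
    # In a 2 x k table the two rows give the same residual up to sign
    z = adj[0]
    return z, np.abs(z) > threshold
```

For example, with five groups of 50 trials and success counts 10, 12, 11, 30 and 9, only the fourth group is flagged (adjusted residual about 5.45). Because the pooled proportion is itself pulled toward any outliers, a one-pass check like this can mask outliers when several are present, which is a known weakness of simple residual screening.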

Multi-sample Estimation of Bacterial Composition Matrix in Metagenomics Data

Yuanpei Cao, Anru Zhang, Hongzhe Li (2017)

Metagenomic sequencing is routinely applied to quantify bacterial abundances in microbiome studies, where the bacterial composition is estimated from the sequencing read counts. Due to limited sequencing depth and DNA dropouts, many rare bacterial taxa may not be captured in the final sequencing reads, which results in many zero counts. Naive composition estimation by normalizing counts leads to many zero proportions, which tend to produce inaccurate estimates of bacterial abundance and diversity. This paper takes a multi-sample approach to estimating bacterial abundances in order to borrow information across samples and across species. Empirical results from real data sets suggest that the composition matrix over multiple samples is approximately low rank, which motivates a regularized maximum likelihood estimator with a nuclear norm penalty. An efficient optimization algorithm using the generalized accelerated proximal gradient method and Euclidean projection onto the simplex is developed. Theoretical upper bounds and minimax lower bounds on the estimation error, measured by the Kullback-Leibler divergence and the Frobenius norm, are established. Simulation studies demonstrate that the proposed estimator outperforms the naive estimators. The method is applied to an analysis of a human gut microbiome dataset.
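
The optimization step relies on a Euclidean projection onto the probability simplex. One standard way to compute such a projection is the sort-and-threshold algorithm sketched below; the abstract does not say which projection routine the authors use, and the function name `project_to_simplex` is hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                       # entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    # rho = largest index j with u_j - (css_j - 1) / j > 0 (0-based here)
    rho = np.nonzero(u - (css - 1) / k > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)         # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)
```

For instance, projecting `[0.5, 0.5, 0.5]` gives `[1/3, 1/3, 1/3]`, and projecting `[2, 0, -1]` gives `[1, 0, 0]`. Each row of a composition matrix could be projected this way after a gradient step, which keeps every estimated composition nonnegative and summing to one.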

Methodology Applications Computation

Classification of Categorical Time Series Using the Spectral Envelope and Optimal Scalings

Zeda Li, Scott A. Bruce, et al. (2021)

This article introduces a novel approach to the classification of categorical time series under the supervised learning paradigm. To construct meaningful features for categorical time series classification, we consider two relevant quantities: the spectral envelope and its corresponding set of optimal scalings. These quantities characterize oscillatory patterns in a categorical time series as the largest possible power at each frequency, or spectral envelope, obtained by assigning numerical values, or scalings, to categories that optimally emphasize oscillations at each frequency. Our procedure combines these two quantities to produce an interpretable and parsimonious feature-based classifier that can be used to accurately determine group membership for categorical time series. Classification consistency of the proposed method is investigated, and simulation studies are used to demonstrate accuracy in classifying categorical time series with various underlying group structures. Finally, we use the proposed method to explore key differences in oscillatory patterns of sleep stage time series for patients with different sleep disorders and accurately classify patients accordingly.
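
To make the two features concrete, the sketch below estimates a spectral envelope and its optimal scalings for a single categorical series: categories are coded as indicator vectors (the last category is the reference and gets scaling 0), the real part of a smoothed periodogram matrix is computed at each Fourier frequency, and a whitened eigenproblem gives the largest achievable power and the scaling that attains it. It illustrates the quantities only, not the authors' classifier; the function name, the moving-average smoother and its `span` parameter are assumptions, and the envelope is reported only up to a normalizing constant.

```python
import numpy as np

def spectral_envelope(x, n_categories, span=3):
    """Smoothed-periodogram estimate of the spectral envelope and optimal scalings.

    x: integer category codes 0..n_categories-1; the last category is the
    reference. span: half-width of a simple moving-average smoother (assumption).
    """
    x = np.asarray(x)
    n, q = len(x), n_categories - 1
    # Centered indicator vectors for the non-reference categories
    Y = np.column_stack([(x == j).astype(float) for j in range(q)])
    Y -= Y.mean(axis=0)
    V = Y.T @ Y / n                                    # sample covariance of the indicators
    w, U = np.linalg.eigh(V)                           # needs every non-reference category to occur
    V_inv_half = U @ np.diag(1.0 / np.sqrt(w)) @ U.T
    m = n // 2
    d = np.fft.fft(Y, axis=0)[: m + 1] / np.sqrt(n)    # DFT at frequencies j/n, j = 0..m
    P = np.real(np.einsum("fi,fj->fij", d, d.conj()))  # real part of periodogram matrices
    freqs = np.arange(1, m + 1) / n
    env = np.empty(m)
    scal = np.empty((m, q))
    for idx, j in enumerate(range(1, m + 1)):
        lo, hi = max(1, j - span), min(m, j + span)    # skip frequency 0, truncate at 1/2
        f_re = P[lo : hi + 1].mean(axis=0)
        vals, vecs = np.linalg.eigh(V_inv_half @ f_re @ V_inv_half)
        env[idx] = vals[-1]                            # largest eigenvalue = envelope value
        beta = V_inv_half @ vecs[:, -1]                # back-transform to a scaling vector
        scal[idx] = beta / np.linalg.norm(beta)
    return freqs, env, scal
```

For a series in which every even position is category 0 and odd positions are drawn at random from the other categories, the envelope peaks at frequency 1/2 and the estimated scaling there loads mainly on category 0. A feature-based classifier of the kind described would then summarize such envelope and scaling curves per series.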

Methodology Applications Computation

Distributions associated with general runs and patterns in hidden Markov models

John A. D. Aston, Donald E. K. Martin (2007)

This paper gives a method for computing distributions associated with patterns in the state sequence of a hidden Markov model, conditional on observing all or part of the observation sequence. Probabilities are computed for very general classes of patterns (competing patterns and generalized later patterns), so the theory includes as special cases results for a large class of widely applicable problems. The unobserved state sequence is assumed to be Markovian with a general order of dependence. An auxiliary Markov chain is associated with the state sequence and is used to simplify the computations. Two examples illustrate the methodology. The first mainly illustrates the basic steps in applying the theory; the second is a more detailed application to DNA sequences and shows that the methods can be adapted to include restrictions based on biological knowledge.
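
The auxiliary-chain idea can be seen in a much simpler setting than the paper's: a fully observed first-order two-state chain (no hidden layer and no conditioning on observations), where we want the probability of at least one run of k consecutive successes in n trials. The auxiliary chain tracks the current run length and has an absorbing state for "run already seen", so the answer is a single matrix power. The function name and parameters below are hypothetical.

```python
import numpy as np

def prob_success_run(n, k, P, p1):
    """P(at least one run of k consecutive successes in n trials).

    Trials follow a fully observed two-state Markov chain (0 = failure,
    1 = success) with transition matrix P and P(first trial is a success) = p1.
    """
    P = np.asarray(P, dtype=float)
    # Auxiliary chain: states 0..k-1 track the current success-run length,
    # state k is absorbing and means "a run of length k has occurred".
    T = np.zeros((k + 1, k + 1))
    T[0, 0], T[0, 1] = P[0, 0], P[0, 1]      # last trial was a failure
    for r in range(1, k):
        T[r, 0] = P[1, 0]                    # a failure resets the run
        T[r, r + 1] = P[1, 1]                # a success extends it
    T[k, k] = 1.0
    pi = np.zeros(k + 1)
    pi[0], pi[1] = 1.0 - p1, p1              # distribution after the first trial
    pi = pi @ np.linalg.matrix_power(T, n - 1)
    return pi[k]
```

With independent fair coin flips (`P = [[0.5, 0.5], [0.5, 0.5]]`, `p1 = 0.5`), the probability of at least two heads in a row is 3/8 for n = 3 and 1/2 for n = 4, matching a direct count of sequences. Handling a hidden state sequence, as in the paper, would additionally require combining this chain with the observation model.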

Methodology Applications Computation

A penalized simulated maximum likelihood approach in parameter estimation for stochastic differential equations

Libo Sun, Chihoon Lee, et al. (2013)

We consider the problem of estimating parameters of stochastic differential equations (SDEs) from discrete-time observations that are either completely or partially observed. The transition density between two observations is generally unknown, so we propose an importance sampling approach with an auxiliary parameter to approximate it. We embed the auxiliary importance sampler in a penalized maximum likelihood framework, which produces more accurate and computationally efficient parameter estimates. Simulation studies on three different models illustrate promising improvements from the new penalized simulated maximum likelihood method. The new procedure is designed for the challenging case in which some state variables are unobserved and, moreover, the observed states are sparse over time, a situation that commonly arises in ecological studies. We apply this new approach to two epidemics of chronic wasting disease in mule deer.
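
Simulated maximum likelihood replaces the unknown transition density with a Monte Carlo estimate. The sketch below uses the modified Brownian bridge importance sampler of Durham and Gallant for a scalar SDE with a constant diffusion coefficient: intermediate points are drawn from a bridge aimed at the next observation and reweighted by Euler transition densities. It is not the authors' sampler (their auxiliary parameter and penalty are not reproduced), and the function names, `n_sub`, `n_paths` and `seed` defaults are illustrative assumptions.

```python
import numpy as np

def euler_logpdf(x_next, x, mu, sigma, h):
    """Log density of one Euler step x -> x_next for dX = mu(X) dt + sigma dW."""
    mean = x + mu(x) * h
    var = sigma**2 * h
    return -0.5 * np.log(2 * np.pi * var) - (x_next - mean) ** 2 / (2 * var)

def is_transition_density(x0, x1, delta, mu, sigma, n_sub=20, n_paths=5000, seed=0):
    """Importance-sampling estimate of p(x1 | x0) over a time step delta."""
    rng = np.random.default_rng(seed)
    h = delta / n_sub
    x = np.full(n_paths, float(x0))
    log_w = np.zeros(n_paths)
    for i in range(n_sub - 1):
        remaining = n_sub - i                      # substeps left before reaching x1
        prop_mean = x + (x1 - x) / remaining       # modified Brownian bridge proposal
        prop_var = sigma**2 * h * (remaining - 1) / remaining
        x_next = prop_mean + np.sqrt(prop_var) * rng.standard_normal(n_paths)
        log_q = -0.5 * np.log(2 * np.pi * prop_var) - (x_next - prop_mean) ** 2 / (2 * prop_var)
        log_w += euler_logpdf(x_next, x, mu, sigma, h) - log_q
        x = x_next
    log_w += euler_logpdf(x1, x, mu, sigma, h)     # final Euler step lands exactly on x1
    # Average the importance weights on a stabilized scale
    m = log_w.max()
    return np.exp(m) * np.mean(np.exp(log_w - m))
```

For an Ornstein-Uhlenbeck drift `mu = lambda x: -theta * x`, the estimate can be checked against the exact Gaussian transition density; summing the log of such estimates over consecutive observations gives a simulated log-likelihood that can be maximized over the drift parameters.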

Methodology Applications Computation

Posterior-based proposals for speeding up Markov chain Monte Carlo

C. M. Pooley, S. C. Bishop, A. Doeschl-Wilson (2019)

Markov chain Monte Carlo (MCMC) is widely used for Bayesian inference in models of complex systems. Performance, however, is often unsatisfactory in models with many latent variables because of so-called poor mixing, which necessitates the development of application-specific implementations. This paper introduces posterior-based proposals (PBPs), a new type of MCMC update applicable to a large class of statistical models (those whose conditional dependence structures are represented by directed acyclic graphs). PBPs generate large joint updates in parameter and latent variable space whilst retaining good acceptance rates (typically 33%). They were evaluated against other approaches (from standard Gibbs and random walk updates to state-of-the-art Hamiltonian and particle MCMC methods) on widely varying model types: an individual-based model for disease diagnostic test data, a financial stochastic volatility model, a mixed model used in statistical genetics, and a population model used in ecology. Although different methods performed better or worse in different scenarios, PBPs were found to be either close to the fastest approach or significantly faster than the next best one (by up to a factor of 10). PBPs therefore represent an additional general-purpose technique that can be usefully applied in a wide variety of contexts.

Methodology Applications Computation
