A novel sandwich algorithm for empirical Bayes analysis of rank data

Added by Vivekananda Roy

Publication date 2017

fields Mathematical Statistics

Research language English

Authors Arnab Kumar Laha - Somak Dutta - Vivekananda Roy

Methodology

Rank data arises frequently in marketing, finance, organizational behavior, and psychology. Most analysis of rank data reported in the literature assumes the presence of one or more variables (sometimes latent) based on whose values the items are ranked. In this paper we analyze rank data using a purely probabilistic model where the observed ranks are assumed to be perturbe

Nonparametric empirical Bayes and maximum likelihood estimation for high-dimensional data analysis

Lee H. Dicker, Sihai D. Zhao 2014

Nonparametric empirical Bayes methods provide a flexible and attractive approach to high-dimensional data analysis. One particularly elegant empirical Bayes methodology, involving the Kiefer-Wolfowitz nonparametric maximum likelihood estimator (NPMLE) for mixture models, has been known for decades. However, implementation and theoretical analysis of the Kiefer-Wolfowitz NPMLE are notoriously difficult. A fast algorithm was recently proposed that makes NPMLE-based procedures feasible for use in large-scale problems, but the algorithm calculates only an approximation to the NPMLE. In this paper we make two contributions. First, we provide upper bounds on the convergence rate of the approximate NPMLE's statistical error, which have the same order as the best known bounds for the true NPMLE. This suggests that the approximate NPMLE is just as effective as the true NPMLE for statistical applications. Second, we illustrate the promise of NPMLE procedures in a high-dimensional binary classification problem. We propose a new procedure and show that it vastly outperforms existing methods in experiments with simulated data. In real data analyses involving cancer survival and gene expression data, we show that it is very competitive with several recently proposed methods for regularized linear discriminant analysis, another popular approach to high-dimensional classification.
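
To make the idea of an approximate NPMLE concrete, here is a minimal sketch that restricts the mixing distribution to a fixed grid and updates the grid weights with EM. Both the EM solver and the unit-variance normal model are assumptions made for illustration; the abstract does not specify the fast algorithm it refers to, and the names `grid_npmle_weights`, `log_likelihood`, and `posterior_means` are hypothetical.

```python
import math


def _likelihood_matrix(x, grid):
    # lik[i][j] = N(x_i | grid_j, 1); unit variance is an assumption of this sketch.
    c = 1.0 / math.sqrt(2.0 * math.pi)
    return [[c * math.exp(-0.5 * (xi - g) ** 2) for g in grid] for xi in x]


def grid_npmle_weights(x, grid, n_iter=200):
    """EM updates for the mixing weights of a prior supported on a fixed grid."""
    lik = _likelihood_matrix(x, grid)
    m = len(grid)
    w = [1.0 / m] * m  # start from uniform weights
    for _ in range(n_iter):
        new_w = [0.0] * m
        for row in lik:
            denom = sum(l * wj for l, wj in zip(row, w))
            for j in range(m):
                # Posterior probability that observation i came from grid point j.
                new_w[j] += row[j] * w[j] / denom
        w = [v / len(x) for v in new_w]
    return w


def log_likelihood(x, grid, w):
    """Marginal log-likelihood of the data under the grid prior with weights w."""
    lik = _likelihood_matrix(x, grid)
    return sum(math.log(sum(l * wj for l, wj in zip(row, w))) for row in lik)


def posterior_means(x, grid, w):
    """Empirical Bayes estimate E[mu_i | x_i] under the fitted grid prior."""
    lik = _likelihood_matrix(x, grid)
    means = []
    for row in lik:
        denom = sum(l * wj for l, wj in zip(row, w))
        means.append(sum(g * l * wj for g, l, wj in zip(grid, row, w)) / denom)
    return means


# Toy data clustered near -2 and 2 (illustrative values, not from the paper).
x = [-2.1, -1.9, -2.3, 2.0, 1.8, 2.2]
grid = [-3.0 + 0.5 * k for k in range(13)]
w = grid_npmle_weights(x, grid)
mu_hat = posterior_means(x, grid, w)
```

The grid should cover the range of the observations; otherwise some likelihood rows can underflow to zero and the weight update divides by zero.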

Methodology

Nonparametric Empirical Bayes Estimation on Heterogeneous Data

Luella J. Fu, Gareth M. James, Wenguang Sun 2020

The simultaneous estimation of many parameters $\eta_i$, based on a corresponding set of observations $x_i$, for $i=1,\ldots,n$, is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations involve heterogeneous data $(x_i, \theta_i)$ where $\theta_i$ is a known nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity presents a significant challenge in large-scale estimation problems. We address this issue by introducing the Nonparametric Empirical Bayes Smoothing Tweedie (NEST) estimator, which efficiently estimates $\eta_i$ and properly adjusts for heterogeneity via a generalized version of Tweedie's formula. NEST is capable of handling a wider range of settings than previously proposed heterogeneous approaches as it does not make any parametric assumptions on the prior distribution of $\eta_i$. The estimation framework is simple but general enough to accommodate any member of the exponential family of distributions. Our theoretical results show that NEST is asymptotically optimal, while simulation studies show that it outperforms competing methods, with substantial efficiency gains in many settings. The method is demonstrated on a data set measuring the performance gap in math scores between socioeconomically advantaged and disadvantaged students in K-12 schools.

Methodology

Revisiting Empirical Bayes Methods and Applications to Special Types of Data

Xiuwen Duan 2021

Empirical Bayes methods have been around for a long time and have a wide range of applications. These methods provide a way in which historical data can be aggregated to provide estimates of the posterior mean. This thesis revisits some of the empirical Bayesian methods and develops new applications. We first look at a linear empirical Bayes estimator and apply it to ranking and symbolic data. Next, we consider Tweedie's formula and show how it can be applied to analyze a microarray dataset. The application of the formula is simplified with the Pearson system of distributions. Saddlepoint approximations enable us to generalize several results in this direction. The results show that the proposed methods perform well in applications to real data sets.
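
For readers unfamiliar with Tweedie's formula, the classical Gaussian case estimates the posterior mean as $x + \sigma^2 \, \frac{d}{dx} \log f(x)$, where $f$ is the marginal density of the observations. The sketch below plugs in a Gaussian kernel density estimate for $f$. The kernel estimate, the bandwidth, and the function name `tweedie_estimates` are assumptions made for illustration; the thesis itself works with the Pearson system and saddlepoint approximations, which are not reproduced here.

```python
import math


def tweedie_estimates(x, sigma=1.0, bandwidth=1.0):
    """Plug-in Tweedie estimates x_i + sigma^2 * f'(x_i) / f(x_i).

    f is a Gaussian kernel density estimate of the marginal density; the
    normalizing constants of f and f' cancel in the ratio, so they are omitted.
    """
    estimates = []
    for xi in x:
        kernel = [math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj in x]
        f = sum(kernel)
        # Derivative of the Gaussian kernel with respect to the evaluation point.
        f_prime = sum(k * (xj - xi) for k, xj in zip(kernel, x)) / bandwidth ** 2
        estimates.append(xi + sigma ** 2 * f_prime / f)
    return estimates


# Symmetric toy observations (illustrative values only).
obs = [-2.0, -1.0, 0.0, 1.0, 2.0]
est = tweedie_estimates(obs)
```

With these settings the extreme observations are pulled towards the centre of the data, which is the shrinkage behaviour the formula is meant to deliver. The bandwidth controls how strongly the estimate reacts to local structure in the data.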

Methodology

Low-rank model with covariates for count data analysis

Geneviève Robin, Julie Josse (CMAP) 2017

Count data are collected in many scientific and engineering tasks including image processing, single-cell RNA sequencing and ecological studies. Such data sets often contain missing values, for example because some ecological sites cannot be reached in a certain year. In addition, in many instances, side information is also available, for example covariates about ecological sites or species. Low-rank methods are popular to denoise and impute count data, and benefit from a substantial theoretical background. Extensions accounting for covariates have been proposed, but to the best of our knowledge their theoretical and empirical properties have not been thoroughly studied, and little software is available for practitioners. We propose a complete methodology called LORI (Low-Rank Interaction), including a Poisson model, an algorithm, and automatic selection of the regularization parameter, to analyze count tables with covariates. We also derive an upper bound on the estimation error. We provide a simulation study with synthetic data, revealing empirically that LORI improves on state-of-the-art methods in terms of estimation and imputation of the missing values. We illustrate how the method can be interpreted through visual displays with the analysis of a well-known plant abundance data set, and show that the LORI outputs are consistent with known results. Finally we demonstrate the relevance of the methodology by analyzing a water-birds abundance table from the French national agency for wildlife and hunting management (ONCFS). The method is available in the R package lori on the Comprehensive R Archive Network (CRAN).

Methodology

Empirical and Constrained Empirical Bayes Variance Estimation Under A One Unit Per Stratum Sample Design

Sepideh Mosaferi 2019

A single primary sampling unit (PSU) per stratum design is a popular design for estimating the parameter of interest. Although the point estimator of the design is unbiased and efficient, an unbiased variance estimator does not exist. A common remedy is to collapse or combine two adjacent strata, but the resulting variance estimator is not design-unbiased, and the bias increases as the population means of the collapsed strata become more variable. Therefore, the one PSU per stratum design with the collapsed stratum variance estimator might not be a good choice, and some statisticians prefer a design in which two PSUs per stratum are selected. In this paper, we first compare a one PSU per stratum design to a two PSUs per stratum design. Then, we propose an empirical Bayes estimator for the variance of the one PSU per stratum design, which over-shrinks towards the prior mean. To protect against this, we investigate the potential of a constrained empirical Bayes estimator. Through a simulation study, we show that the empirical Bayes and constrained empirical Bayes estimators outperform the classical collapsed one in terms of empirical relative mean squared error.
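
The classical collapsed stratum estimator mentioned above can be sketched as follows, assuming the textbook form in which each group of $G$ collapsed strata contributes $\frac{G}{G-1}\sum_h (\hat{y}_h - \bar{y})^2$, with $\hat{y}_h$ the estimated stratum totals and $\bar{y}$ their group mean. The function name `collapsed_variance` and the toy totals are hypothetical; the paper's own notation and any stratum-size weighting are not reproduced here.

```python
def collapsed_variance(totals, groups):
    """Collapsed stratum variance estimate for a one-PSU-per-stratum design.

    totals: estimated total for each stratum (one PSU per stratum).
    groups: tuples of stratum indices that are collapsed together;
            each group must contain at least two strata.
    """
    variance = 0.0
    for group in groups:
        ys = [totals[h] for h in group]
        size = len(ys)
        mean = sum(ys) / size
        # Between-stratum spread inside the collapsed group, scaled by G / (G - 1).
        variance += size / (size - 1) * sum((y - mean) ** 2 for y in ys)
    return variance


# Four strata collapsed into two adjacent pairs (illustrative totals only).
stratum_totals = [10.0, 14.0, 7.0, 9.0]
pairs = [(0, 1), (2, 3)]
v_collapsed = collapsed_variance(stratum_totals, pairs)
```

For a pair of strata the contribution reduces to the squared difference of the two totals, which shows why the estimate is inflated when the collapsed strata have different population means.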

Methodology