A Generalized Heckman Model With Varying Sample Selection Bias and Dispersion Parameters

Added by Wagner Barreto-Souza

Publication date 2020

Field: Mathematical Statistics

Language: English

Authors Fernando de S. Bastos - Wagner Barreto-Souza - Marc G. Genton

Methodology

Many proposals have emerged as alternatives to the Heckman selection model, mainly to address the non-robustness of its normal assumption. The 2001 Medical Expenditure Panel Survey data is often used to illustrate this non-robustness of the Heckman model. In this paper, we propose a generalization of the Heckman sample selection model by allowing the sample selection bias and dispersion parameters to depend on covariates. We show that the non-robustness of the Heckman model may be due to the assumption of the constant sample selection bias parameter rather than the normality assumption. Our proposed methodology allows us to understand which covariates are important to explain the sample selection bias phenomenon rather than to only form conclusions about its presence. We explore the inferential aspects of the maximum likelihood estimators (MLEs) for our proposed generalized Heckman model. More specifically, we show that this model satisfies some regularity conditions such that it ensures consistency and asymptotic normality of the MLEs. Proper score residuals for sample selection models are provided, and model adequacy is addressed. Simulated results are presented to check the finite-sample behavior of the estimators and to verify the consequences of not considering varying sample selection bias and dispersion parameters. We show that the normal assumption for analyzing medical expenditure data is suitable and that the conclusions drawn using our approach are coherent with findings from prior literature. Moreover, we identify which covariates are relevant to explain the presence of sample selection bias in this important dataset.
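The idea of letting the sample selection bias and dispersion parameters depend on covariates can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the abstract does not state the link functions, so the log link for the dispersion `sigma` and the arctanh link for the selection-bias parameter `rho` are assumptions, as are all function names, parameter names, and numeric values. It compares ordinary least squares on the selected rows when `rho` is zero and when it is not.

```python
import numpy as np


def simulate_generalized_heckman(n, beta, gamma, lam, kappa, seed=0):
    """Simulate selection data with covariate-dependent sigma and rho.

    All names and link functions are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
    sigma = np.exp(X @ np.asarray(lam))    # dispersion via a log link (assumed)
    rho = np.tanh(X @ np.asarray(kappa))   # selection-bias parameter via arctanh link (assumed)
    e_sel = rng.normal(size=n)             # selection-equation error
    z = rng.normal(size=n)                 # independent noise
    # Outcome error with standard deviation sigma and correlation rho with e_sel.
    e_out = sigma * (rho * e_sel + np.sqrt(1.0 - rho**2) * z)
    selected = X @ np.asarray(gamma) + e_sel > 0
    # The outcome is observed only for selected rows.
    y = np.where(selected, X @ np.asarray(beta) + e_out, np.nan)
    return X, y, selected, sigma, rho


def ols_on_selected(X, y, selected):
    """Naive least squares that ignores the selection mechanism."""
    coef, *_ = np.linalg.lstsq(X[selected], y[selected], rcond=None)
    return coef


beta = [1.0, 0.5]    # outcome coefficients (hypothetical values)
gamma = [0.0, 1.0]   # selection coefficients (hypothetical values)
lam = [0.0, 0.0]     # constant dispersion, sigma = 1
for kappa in ([0.0, 0.0], [1.0, 0.0]):
    X, y, sel, sigma, rho = simulate_generalized_heckman(50_000, beta, gamma, lam, kappa)
    print("kappa =", kappa, "-> OLS on selected rows:", ols_on_selected(X, y, sel))
```

With `kappa` set to zero the outcome and selection errors are independent, so the naive fit should land close to `beta`; with a nonzero `kappa` the fit on the selected rows drifts away from it, which is the sample selection bias the abstract refers to.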

A generalized EMS algorithm for model selection with incomplete data

327 - Ping-Feng Xu , Lai-Xu Shang , Man-Lai Tang 2021

Recently, a so-called E-MS algorithm was developed for model selection in the presence of missing data. Specifically, it performs the Expectation step (E step) and Model Selection step (MS step) alternately to find the minimum point of the observed generalized information criteria (GIC). In practice, it could be numerically infeasible to perform the MS-step for high dimensional settings. In this paper, we propose a more simple and feasible generalized EMS (GEMS) algorithm which simply requires a decrease in the observed GIC in the MS-step and includes the original EMS algorithm as a special case. We obtain several numerical convergence results of the GEMS algorithm under mild conditions. We apply the proposed GEMS algorithm to Gaussian graphical model selection and variable selection in generalized linear models and compare it with existing competitors via numerical experiments. We illustrate its application with three real data sets.

Methodology

A Robust Bayesian Copas Selection Model for Quantifying and Correcting Publication Bias

97 - Ray Bai , Lifeng Lin , Mary R. Boland 2020

The validity of conclusions from meta-analysis is potentially threatened by publication bias. Most existing procedures for correcting publication bias assume normality of the study-specific effects that account for between-study heterogeneity. However, this assumption may not be valid, and the performance of these bias correction procedures can be highly sensitive to departures from normality. Further, there exist few measures to quantify the magnitude of publication bias based on selection models. In this paper, we address both of these issues. First, we explore the use of heavy-tailed distributions for the study-specific effects within a Bayesian hierarchical framework. The deviance information criterion (DIC) is used to determine the appropriate distribution to use for conducting the final analysis. Second, we develop a new measure to quantify the magnitude of publication bias based on Hellinger distance. Our measure is easy to interpret and takes advantage of the estimation uncertainty afforded naturally by the posterior distribution. We illustrate our proposed approach through simulation studies and meta-analyses on lung cancer and antidepressants. To assess the prevalence of publication bias, we apply our method to 1500 meta-analyses of dichotomous outcomes in the Cochrane Database of Systematic Reviews. Our methods are implemented in the publicly available R package RobustBayesianCopas.

Methodology

Testing for publication bias in meta-analysis under Copas selection model

74 - Rui Duan , Jin Piao , Arielle Marks-Anglin 2020

In meta-analyses, publication bias is a well-known, important and challenging issue because the validity of the results from a meta-analysis is threatened if the sample of studies retrieved for review is biased. One popular method to deal with publication bias is the Copas selection model, which provides a flexible sensitivity analysis for correcting the estimates with considerable insight into the data suppression mechanism. However, rigorous testing procedures under the Copas selection model to detect bias are lacking. To fill this gap, we develop a score-based test for detecting publication bias under the Copas selection model. We reveal that the behavior of the standard score test statistic is irregular because the parameters of the Copas selection model disappear under the null hypothesis, leading to an identifiability problem. We propose a novel test statistic and derive its limiting distribution. A bootstrap procedure is provided to obtain the p-value of the test for practical applications. We conduct extensive Monte Carlo simulations to evaluate the performance of the proposed test and apply the method to several existing meta-analyses.

Methodology

Magnification bias in galaxy surveys with complex sample selection functions

84 - Maximilian von Wietersheim-Kramsta , Benjamin Joachimi , Jan Luca van den Busch 2021

Gravitational lensing magnification modifies the observed spatial distribution of galaxies and can severely bias cosmological probes of large-scale structure if not accurately modelled. Standard approaches to modelling this magnification bias may not be applicable in practice as many galaxy samples have complex, often implicit, selection functions. We propose and test a procedure to quantify the magnification bias induced in clustering and galaxy-galaxy lensing (GGL) signals in galaxy samples subject to a selection function beyond a simple flux limit. The method employs realistic mock data to calibrate an effective luminosity function slope, $\alpha_{\rm{obs}}$, from observed galaxy counts, which can then be used with the standard formalism. We demonstrate this method for two galaxy samples derived from the Baryon Oscillation Spectroscopic Survey (BOSS) in the redshift ranges $0.2 < z \leq 0.5$ and $0.5 < z \leq 0.75$, complemented by mock data built from the MICE2 simulation. We obtain $\alpha_{\rm{obs}} = 1.93 \pm 0.05$ and $\alpha_{\rm{obs}} = 2.62 \pm 0.28$ for the two BOSS samples. For BOSS-like lenses, we forecast a contribution of the magnification bias to the GGL signal between the multipole moments, $\ell$, of 100 and 4600 with a cumulative signal-to-noise ratio between 0.1 and 1.1 for sources from the Kilo-Degree Survey (KiDS), between 0.4 and 2.0 for sources from the Hyper Suprime-Cam survey (HSC), and between 0.3 and 2.8 for ESA Euclid-like source samples. These contributions are significant enough to require explicit modelling in future analyses of these and similar surveys. Our code is publicly available within the MagBEt module (https://github.com/mwiet/MAGBET).

Cosmology and Nongalactic Astrophysics

Analytic Bias Reduction for $k$-Sample Functionals

279 - Christopher S. Withers , Saralees Nadarajah 2009

We give analytic methods for nonparametric bias reduction that remove the need for computationally intensive methods like the bootstrap and the jackknife. We call an estimate $p$th order if its bias has magnitude $n_0^{-p}$ as $n_0 \to \infty$, where $n_0$ is the sample size (or the minimum sample size if the estimate is a function of more than one sample). Most estimates are only first order and require $O(N)$ calculations, where $N$ is the total sample size. The usual bootstrap and jackknife estimates are second order but they are computationally intensive, requiring $O(N^2)$ calculations for one sample. By contrast, Jaeckel's infinitesimal jackknife is an analytic second order one sample estimate requiring only $O(N)$ calculations. When $p$th order bootstrap and jackknife estimates are available, they require $O(N^p)$ calculations, and so become even more computationally intensive if one chooses $p>2$. For general $p$ we provide analytic $p$th order nonparametric estimates that require only $O(N)$ calculations. Our estimates are given in terms of the von Mises derivatives of the functional being estimated, evaluated at the empirical distribution. For products of moments an unbiased estimate exists: our form for this polykay is much simpler than the usual form in terms of power sums.
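The contrast between a resampling-based and an analytic bias correction can be seen on the simplest one-sample functional, the variance. The sketch below is an illustration only and does not reproduce the paper's von Mises derivative construction: it uses the textbook $n/(n-1)$ factor as the analytic correction, and all function names are hypothetical. The leave-one-out jackknife re-evaluates the estimator $n$ times on $n-1$ points each, while the analytic correction needs a single pass over the data.

```python
import numpy as np


def plugin_variance(x):
    """Plug-in (first-order) variance estimate, with bias -sigma^2 / n."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 2)


def jackknife_bias_corrected(x, estimator):
    """Leave-one-out jackknife bias correction: n evaluations on n-1 points each."""
    x = np.asarray(x, dtype=float)
    n = x.size
    theta_full = estimator(x)
    theta_loo = np.array([estimator(np.delete(x, i)) for i in range(n)])
    return n * theta_full - (n - 1) * theta_loo.mean()


def analytic_bias_corrected_variance(x):
    """Textbook analytic correction: rescale the plug-in estimate by n / (n - 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return plugin_variance(x) * n / (n - 1)


rng = np.random.default_rng(0)
sample = rng.normal(size=200)
print("plug-in:  ", plugin_variance(sample))
print("jackknife:", jackknife_bias_corrected(sample, plugin_variance))
print("analytic: ", analytic_bias_corrected_variance(sample))
```

For the variance the two corrections agree, since the jackknife-corrected plug-in estimate coincides with the usual unbiased sample variance; the difference lies in the amount of computation.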

Methodology