The validity of conclusions from meta-analysis is potentially threatened by publication bias. Most existing procedures for correcting publication bias assume normality of the study-specific effects that account for between-study heterogeneity. However, this assumption may not be valid, and the performance of these bias correction procedures can be highly sensitive to departures from normality. Further, there exist few measures to quantify the magnitude of publication bias based on selection models. In this paper, we address both of these issues. First, we explore the use of heavy-tailed distributions for the study-specific effects within a Bayesian hierarchical framework. The deviance information criterion (DIC) is used to determine the appropriate distribution to use for conducting the final analysis. Second, we develop a new measure to quantify the magnitude of publication bias based on Hellinger distance. Our measure is easy to interpret and takes advantage of the estimation uncertainty afforded naturally by the posterior distribution. We illustrate our proposed approach through simulation studies and meta-analyses on lung cancer and antidepressants. To assess the prevalence of publication bias, we apply our method to 1500 meta-analyses of dichotomous outcomes in the Cochrane Database of Systematic Reviews. Our methods are implemented in the publicly available R package RobustBayesianCopas.
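As background for this measure: the Hellinger distance between two densities p and q is the bounded quantity below. A natural reading, stated here as our assumption rather than a quotation from the paper, is that p and q are the posterior densities of the mean effect with and without the bias correction.

```latex
H(p, q)
  = \left\{ \frac{1}{2} \int \bigl( \sqrt{p(\theta)} - \sqrt{q(\theta)} \bigr)^{2} \, d\theta \right\}^{1/2}
  = \sqrt{1 - \int \sqrt{p(\theta)\, q(\theta)} \, d\theta},
  \qquad 0 \le H(p, q) \le 1.
```

Because H is bounded between 0 and 1, a value near 0 indicates that the bias correction barely changes the inference, while a value near 1 indicates a severe distortion; this boundedness is what makes such a measure easy to interpret.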
In meta-analysis, publication bias is a well-known, important, and challenging issue because the validity of the results is threatened if the sample of studies retrieved for review is biased. One popular method for dealing with publication bias is the Copas selection model, which provides a flexible sensitivity analysis for correcting estimates while giving considerable insight into the data-suppression mechanism. However, rigorous testing procedures for detecting bias under the Copas selection model are lacking. To fill this gap, we develop a score-based test for detecting publication bias under the Copas selection model. We show that the behavior of the standard score test statistic is irregular because some parameters of the Copas selection model disappear under the null hypothesis, leading to an identifiability problem. We propose a novel test statistic and derive its limiting distribution. A bootstrap procedure is provided to obtain the p-value of the test in practical applications. We conduct extensive Monte Carlo simulations to evaluate the performance of the proposed test and apply the method to several existing meta-analyses.
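The bootstrap calibration step is generic enough to sketch. The fragment below shows the usual bootstrap recipe for a test statistic whose null distribution is non-standard, as happens here when parameters vanish under the null; `score_statistic` and `simulate_null` are hypothetical placeholders for the paper's statistic and a simulator fitted under the null model, not functions from the paper.

```python
import numpy as np

def bootstrap_pvalue(data, score_statistic, simulate_null, n_boot=1000, seed=0):
    """Bootstrap p-value for a test statistic with a non-standard null
    distribution (as arises when parameters vanish under the null).
    `score_statistic` maps a dataset to a scalar; `simulate_null` draws a
    synthetic dataset under the fitted null model."""
    rng = np.random.default_rng(seed)
    t_obs = score_statistic(data)
    t_boot = np.array([score_statistic(simulate_null(rng)) for _ in range(n_boot)])
    # One-sided p-value: proportion of bootstrap statistics at least as
    # extreme; the +1 terms avoid exact-zero p-values from finite replicates.
    return (1 + np.sum(t_boot >= t_obs)) / (1 + n_boot)
```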
In this paper we review the concepts of Bayesian evidence and Bayes factors, whose logarithms are often reported as log odds ratios, and their application to model selection. The theory is presented along with a discussion of analytic, approximate, and numerical techniques. Specific attention is paid to the Laplace approximation, variational Bayes, importance sampling, thermodynamic integration, and nested sampling and its recent variants. Analogies to statistical physics, from which many of these techniques originate, are discussed in order to provide readers with deeper insights that may lead to new techniques. The utility of Bayesian model testing in the domain sciences is demonstrated by presenting four practical examples within the context of signal processing, in the areas of signal detection, sensor characterization, scientific model selection, and molecular force characterization.
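As a concrete instance of the simplest technique on this list, here is a minimal sketch of the Laplace approximation to the log evidence, log Z ≈ −U(θ̂) + (d/2) log 2π + (1/2) log |Σ|, where U is the negative unnormalized log posterior, θ̂ its minimizer, and Σ the inverse Hessian at θ̂. Using the BFGS inverse-Hessian estimate for Σ is an expedient of this sketch, not anything from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_evidence(neg_log_post, theta0):
    """Laplace approximation to log Z = log ∫ exp(-neg_log_post(θ)) dθ,
    where `neg_log_post` is the negative unnormalized log posterior
    (negative log likelihood plus negative log prior)."""
    res = minimize(neg_log_post, theta0, method="BFGS")
    d = len(np.atleast_1d(res.x))
    # BFGS returns an approximation to the inverse Hessian at the mode.
    cov = np.atleast_2d(res.hess_inv)
    _, logdet = np.linalg.slogdet(cov)
    return -res.fun + 0.5 * d * np.log(2 * np.pi) + 0.5 * logdet

# Example: standard normal "posterior"; exact answer is 0.5*log(2*pi) ≈ 0.919.
print(laplace_log_evidence(lambda t: 0.5 * float(np.sum(np.asarray(t) ** 2)),
                           np.array([1.0])))
```

A log Bayes factor between two models is then simply the difference of their two log-evidence estimates.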
In the era of big data, variable selection is a key technique for handling high-dimensional problems in which the sample size is small but the number of covariates is large. Many variable selection methods have been proposed for particular models, such as the linear, logistic, and generalized linear models. However, fewer works have focused on variable selection for single-index models, especially the single-index logistic model, owing to the difficulties arising from the unknown link function and the slow mixing of MCMC algorithms for the traditional logistic model. In this paper, we propose a Bayesian variable selection procedure for the single-index logistic model that takes advantage of Gaussian processes and data augmentation. Numerical results from simulations and a real data analysis show the advantage of our method over state-of-the-art alternatives.
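Schematically, in our own notation (an assumption, since the abstract does not fix one), the single-index logistic model with an unknown link and spike-and-slab variable selection reads:

```latex
\Pr(Y_i = 1 \mid x_i) = \frac{\exp\{ g(x_i^{\top} \beta) \}}{1 + \exp\{ g(x_i^{\top} \beta) \}},
\qquad g \sim \mathrm{GP}(0, k),
\qquad \beta_j \mid \gamma_j \sim \gamma_j \, \pi_{\mathrm{slab}}(\beta_j) + (1 - \gamma_j) \, \delta_0,
```

where the Gaussian-process prior on the link g handles its unknown shape and the binary indicators γ_j perform variable selection. The spike-and-slab form is one standard choice, and Pólya-Gamma augmentation is the standard data-augmentation device for logistic likelihoods, though the abstract names neither; both are our illustrative assumptions.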
Many proposals have emerged as alternatives to the Heckman selection model, mainly to address the non-robustness of its normality assumption. The 2001 Medical Expenditure Panel Survey data are often used to illustrate this non-robustness of the Heckman model. In this paper, we propose a generalization of the Heckman sample selection model by allowing the sample selection bias and dispersion parameters to depend on covariates. We show that the non-robustness of the Heckman model may be due to the assumption of a constant sample selection bias parameter rather than to the normality assumption. Our proposed methodology allows us to understand which covariates are important for explaining the sample selection bias phenomenon, rather than only forming conclusions about its presence. We explore the inferential aspects of the maximum likelihood estimators (MLEs) for our proposed generalized Heckman model. More specifically, we show that this model satisfies regularity conditions that ensure consistency and asymptotic normality of the MLEs. Proper score residuals for sample selection models are provided, and model adequacy is addressed. Simulation results are presented to check the finite-sample behavior of the estimators and to examine the consequences of not allowing the sample selection bias and dispersion parameters to vary. We show that the normality assumption is suitable for analyzing the medical expenditure data and that the conclusions drawn using our approach are coherent with findings from the prior literature. Moreover, we identify which covariates are relevant for explaining the presence of sample selection bias in this important dataset.
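For orientation, the classical Heckman model and the direction of the generalization can be written as follows; the specific link functions are our illustrative assumptions, not necessarily the paper's choices.

```latex
Y_i^{*} = x_i^{\top} \beta + \sigma_i \, \epsilon_{1i},
\qquad S_i^{*} = w_i^{\top} \gamma + \epsilon_{2i},
\qquad (\epsilon_{1i}, \epsilon_{2i})^{\top} \sim \mathrm{N}_2\!\left( 0, \begin{pmatrix} 1 & \rho_i \\ \rho_i & 1 \end{pmatrix} \right),
```

where the outcome Y_i is observed only when S_i* > 0. The classical model takes ρ_i ≡ ρ and σ_i ≡ σ; the generalization lets both vary with covariates, e.g. through log σ_i = v_i^T λ and arctanh ρ_i = u_i^T κ, so that the selection bias parameter ρ_i itself becomes a regression target.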
Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the data selection problem: finding a lower-dimensional statistic, such as a subset of variables, that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining background components of the data, and perform standard Bayesian model selection for the choice of statistic. However, fitting a nonparametric model to high-dimensional data tends to be highly inefficient, both statistically and computationally. We propose a novel score for performing both data selection and model selection, the Stein volume criterion, that takes the form of a generalized marginal likelihood with a kernelized Stein discrepancy in place of the Kullback-Leibler divergence. The Stein volume criterion does not require one to fit or even specify a nonparametric background model, making it straightforward to compute; in many cases it is as simple as fitting the parametric model of interest with an alternative objective function. We prove that the Stein volume criterion is consistent for both data selection and model selection, and we establish consistency and asymptotic normality (Bernstein-von Mises) of the corresponding generalized posterior on parameters. We validate our method in simulation and apply it to the analysis of single-cell RNA sequencing datasets using probabilistic principal components analysis and a spin glass model of gene regulation.
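To make the key ingredient concrete, the sketch below computes a V-statistic estimate of the squared kernelized Stein discrepancy between a sample and a model with known score function, using an RBF kernel. It is a generic KSD implementation under our own choices of kernel and bandwidth, not the authors' code or their criterion itself.

```python
import numpy as np

def ksd_squared(x, score, h=1.0):
    """V-statistic estimate of the squared kernelized Stein discrepancy
    between the sample `x` (shape n x d) and a model with score function
    `score(x) = grad_x log p(x)`, using k(x, y) = exp(-|x - y|^2 / (2 h^2))."""
    n, d = x.shape
    s = score(x)                             # n x d matrix of model scores
    diff = x[:, None, :] - x[None, :, :]     # pairwise differences, n x n x d
    sqdist = np.sum(diff ** 2, axis=-1)      # squared distances, n x n
    k = np.exp(-sqdist / (2 * h ** 2))
    # Stein kernel u_p(x_i, x_j), assembled term by term:
    term1 = k * (s @ s.T)                                    # s(x)^T s(y) k
    term2 = k * np.einsum("id,ijd->ij", s, diff) / h ** 2    # s(x)^T grad_y k
    term3 = k * np.einsum("jd,ijd->ij", s, -diff) / h ** 2   # s(y)^T grad_x k
    term4 = k * (d / h ** 2 - sqdist / h ** 4)               # tr(grad_x grad_y k)
    return np.mean(term1 + term2 + term3 + term4)

# Example: data from N(0, I) checked against the N(0, I) score, -x.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 2))
print(ksd_squared(x, lambda z: -z))   # small value: the model fits
```

Excluding the diagonal i = j terms from the average would give the unbiased U-statistic variant instead.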