We study high-dimensional Bayesian linear regression with a general beta prime distribution for the scale parameter. Under the assumption of sparsity, we show that appropriate selection of the hyperparameters in the beta prime prior leads to the (near) minimax posterior contraction rate when $p \gg n$. For finite samples, we propose a data-adaptive method for estimating the hyperparameters based on marginal maximum likelihood (MML). This enables our prior to adapt to both sparse and dense settings, and under our proposed empirical Bayes procedure, the MML estimates are never at risk of collapsing to zero. We derive efficient Monte Carlo EM and variational EM algorithms for implementing our model, which are available in the R package NormalBetaPrime. Simulations and analysis of a gene expression data set illustrate our model's self-adaptivity to varying levels of sparsity and signal strengths.
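A minimal sketch of the normal-beta prime scale mixture underlying this prior, assuming the standard representation of a beta prime variable as a ratio of independent gamma variables; the hyperparameter names (a, b) and the sampling scheme below are illustrative and are not the interface of the NormalBetaPrime package.

set.seed(1)

rnbp <- function(p, a = 0.5, b = 0.5, sigma2 = 1) {
  ## beta prime variable as a ratio of independent gammas: omega2 ~ BetaPrime(a, b)
  omega2 <- rgamma(p, shape = a, rate = 1) / rgamma(p, shape = b, rate = 1)
  ## conditional on omega2, each coefficient is normal with variance sigma2 * omega2
  rnorm(p, mean = 0, sd = sqrt(sigma2 * omega2))
}

beta_draws <- rnbp(p = 1000, a = 0.1, b = 0.1)
## small (a, b) gives a spike near zero with heavy tails, the sparsity-inducing regime
summary(abs(beta_draws))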
We revisit the problem of simultaneously testing the means of $n$ independent normal observations under sparsity. We take a Bayesian approach to this problem by introducing a scale-mixture prior known as the normal-beta prime (NBP) prior. We first derive new concentration properties when the beta prime density is employed for a scale parameter in Bayesian hierarchical models. To detect signals in our data, we then propose a hypothesis test based on thresholding the posterior shrinkage weight under the NBP prior. Taking the loss function to be the expected number of misclassified tests, we show that our test procedure asymptotically attains the optimal Bayes risk when the signal proportion $p$ is known. When $p$ is unknown, we introduce an empirical Bayes variant of our test which also asymptotically attains the Bayes Oracle risk in the entire range of sparsity parameters $p \propto n^{-\epsilon}$, $\epsilon \in (0, 1)$. Finally, we also consider restricted marginal maximum likelihood (REML) and hierarchical Bayes approaches for estimating a key hyperparameter in the NBP prior and examine multiple testing under these frameworks.
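A sketch of the half-thresholding rule on the posterior shrinkage weight for the sparse normal means model x_i ~ N(theta_i, 1), assuming a beta prime mixing density with illustrative hyperparameters (a, b); the shrinkage weight kappa_i = 1/(1 + omega_i^2) is approximated here by simple Monte Carlo over prior draws rather than by the algorithms used in the paper.

set.seed(1)

posterior_shrinkage <- function(x, a = 0.5, b = 0.5, M = 1e5) {
  omega2 <- rgamma(M, shape = a) / rgamma(M, shape = b)  ## beta prime prior draws
  w      <- dnorm(x, mean = 0, sd = sqrt(1 + omega2))    ## marginal likelihood weights
  kappa  <- 1 / (1 + omega2)
  sum(w * kappa) / sum(w)                                 ## Monte Carlo estimate of E[kappa | x]
}

x <- c(0.3, 1.1, 4.2)                        ## two noise-like observations and one signal
post_weight <- 1 - sapply(x, posterior_shrinkage)
signal <- post_weight > 0.5                  ## flag a signal when the posterior weight exceeds 1/2
cbind(x, post_weight, signal)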
We develop a fully Bayesian framework for function-on-scalars regression with many predictors. The functional data response is modeled nonparametrically using unknown basis functions, which produces a flexible and data-adaptive functional basis. We incorporate shrinkage priors that effectively remove unimportant scalar covariates from the model and reduce sensitivity to the number of (unknown) basis functions. For variable selection in functional regression, we propose a decision-theoretic posterior summarization technique, which identifies a subset of covariates that retains nearly the predictive accuracy of the full model. Our approach is broadly applicable for Bayesian functional regression models and, unlike existing methods, provides joint rather than marginal selection of important predictor variables. Computationally scalable posterior inference is achieved using a Gibbs sampler with linear time complexity in the number of predictors. The resulting algorithm is empirically faster than existing frequentist and Bayesian techniques, and provides joint estimation of model parameters, prediction and imputation of functional trajectories, and uncertainty quantification via the posterior distribution. A simulation study demonstrates improvements in estimation accuracy, uncertainty quantification, and variable selection relative to existing alternatives. The methodology is applied to actigraphy data to investigate the association between intraday physical activity and responses to a sleep questionnaire.
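A toy sketch of the decision-theoretic posterior summarization idea for variable selection, assuming an ordinary linear model and using the least-squares fit as a stand-in for the posterior mean fit of the full model; the forward ordering of covariates and the 5%-of-response-variance tolerance are illustrative choices, not the functional regression algorithm described above.

set.seed(1)
n <- 200; p <- 8
X <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1.5, 0, 0, 1, 0, 0, 0)
y <- drop(X %*% beta + rnorm(n))

full_fit <- fitted(lm(y ~ X - 1))                ## stand-in for the full model's posterior mean fit
ord <- order(abs(coef(lm(y ~ X - 1))), decreasing = TRUE)

loss <- sapply(seq_len(p), function(k) {
  keep <- ord[1:k]                               ## refit on the k largest coefficients
  mean((full_fit - fitted(lm(y ~ X[, keep, drop = FALSE] - 1)))^2)
})
k_star <- min(which(loss <= 0.05 * var(y)))      ## smallest subset whose fit is close to the full model
sort(ord[1:k_star])                              ## selected covariates; here typically 1, 2, 5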
In logistic regression, separation occurs when a linear combination of the predictors can perfectly classify part or all of the observations in the sample, and as a result, finite maximum likelihood estimates of the regression coefficients do not exist. Gelman et al. (2008) recommended independent Cauchy distributions as default priors for the regression coefficients in logistic regression, even in the case of separation, and reported posterior modes in their analyses. As the mean does not exist for the Cauchy prior, a natural question is whether the posterior means of the regression coefficients exist under separation. We prove theorems that provide necessary and sufficient conditions for the existence of posterior means under independent Cauchy priors for the logit link and a general family of link functions, including the probit link. We also study the existence of posterior means under multivariate Cauchy priors. For full Bayesian inference, we develop a Gibbs sampler based on Polya-Gamma data augmentation to sample from the posterior distribution under independent Student-t priors, including Cauchy priors, and provide a companion R package in the supplement. We demonstrate empirically that even when the posterior means of the regression coefficients exist under separation, the magnitude of the posterior samples for Cauchy priors may be unusually large, and the corresponding Gibbs sampler shows extremely slow mixing. While alternative algorithms such as the No-U-Turn Sampler in Stan can greatly improve mixing, in order to resolve the issue of extremely heavy-tailed posteriors for Cauchy priors under separation, one would need to consider lighter-tailed priors such as normal priors or Student-t priors with degrees of freedom larger than one.
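A small, self-contained illustration of complete separation, assuming a toy data set in which a single predictor perfectly classifies the response; glm() then typically warns that fitted probabilities are numerically 0 or 1, and the reported slope is very large, reflecting the nonexistence of finite maximum likelihood estimates.

x <- c(-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2)
y <- c( 0,    0,  0,    0,   1, 1,   1, 1)     ## y = 1 exactly when x > 0, so the data are completely separated

fit <- glm(y ~ x, family = binomial())
## glm() typically warns here that fitted probabilities numerically 0 or 1 occurred
coef(fit)                                      ## the slope estimate is huge and keeps growing with more iterations
summary(fit)$coefficients[, "Std. Error"]      ## the reported standard errors blow up as well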
In the context of a high-dimensional linear regression model, we propose an empirical correlation-adaptive prior that uses information in the observed predictor matrix to adaptively address high collinearity, determining whether parameters associated with correlated predictors should be shrunk together or kept apart. Under suitable conditions, we prove that this empirical Bayes posterior concentrates around the true sparse parameter at the optimal rate asymptotically. A simplified version of a shotgun stochastic search algorithm is employed to implement the variable selection procedure, and we show, via simulation experiments across different settings and a real-data application, the favorable performance of the proposed method compared to existing methods.
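A generic sketch of one move of a simplified shotgun-stochastic-search style procedure, assuming BIC as a stand-in model score: every one-variable addition or deletion of the current model is scored, and the next model is sampled with probability proportional to its score. The empirical correlation-adaptive prior and the exact search of the paper are not reproduced here.

set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:3] %*% c(2, -2, 1) + rnorm(n))

model_score <- function(S) {
  ## BIC as a stand-in for the (empirical Bayes) posterior model score
  if (length(S) == 0) return(-BIC(lm(y ~ 1)))
  -BIC(lm(y ~ X[, S, drop = FALSE]))
}

sss_step <- function(S) {
  neighbors <- c(lapply(setdiff(1:p, S), function(j) sort(c(S, j))),  ## additions
                 lapply(S, function(j) setdiff(S, j)))                ## deletions
  scores <- vapply(neighbors, model_score, numeric(1))
  probs <- exp(scores - max(scores))                                  ## sample a neighbor by score
  neighbors[[sample(length(neighbors), 1, prob = probs)]]
}

S <- integer(0)
for (iter in 1:50) S <- sss_step(S)
S   ## model visited after 50 moves; with this simulated data, typically near the truth 1, 2, 3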
This paper addresses the problem of localizing change points in high-dimensional linear regression models with piecewise constant regression coefficients. We develop a dynamic programming approach to estimate the locations of the change points whose performance improves upon the current state-of-the-art, even as the dimensionality, the sparsity of the regression coefficients, the temporal spacing between two consecutive change points, and the magnitude of the difference of two consecutive regression coefficient vectors are allowed to vary with the sample size. Furthermore, we devise a computationally efficient refinement procedure that provably reduces the localization error of preliminary estimates of the change points. We establish minimax lower bounds on the localization error that nearly match the upper bound on the localization error of our methodology, and show that the signal-to-noise condition we impose is essentially the weakest possible based on information-theoretic arguments. Extensive numerical results support our theoretical findings, and experiments on real air quality data reveal change points supported by historical information not used by the algorithm.
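A minimal sketch of the dynamic programming recursion for change-point localization, B(e) = min over s of { B(s) + cost(s+1, e) + penalty }, assuming an ordinary least-squares residual sum of squares as the per-segment cost in place of the penalized high-dimensional fit used in the paper; the penalty value and the simulated data are illustrative.

set.seed(1)
n <- 120; p <- 3
X <- matrix(rnorm(n * p), n, p)
beta1 <- c(2, 0, 0); beta2 <- c(-2, 1, 0)
y <- c(X[1:60, ] %*% beta1, X[61:n, ] %*% beta2) + rnorm(n)

seg_cost <- function(s, e) {
  if (e - s + 1 <= p) return(0)                   ## segment too short to fit
  sum(resid(lm(y[s:e] ~ X[s:e, , drop = FALSE] - 1))^2)
}

pen <- 20                                         ## penalty per change point
B <- c(0, rep(Inf, n)); prev <- integer(n)        ## B[e + 1] stores B(e), with B(0) = 0
for (e in 1:n) {
  for (s in 0:(e - 1)) {
    val <- B[s + 1] + seg_cost(s + 1, e) + pen
    if (val < B[e + 1]) { B[e + 1] <- val; prev[e] <- s }
  }
}
cps <- c(); e <- n
while (e > 0) { if (prev[e] > 0) cps <- c(prev[e], cps); e <- prev[e] }
cps   ## estimated change-point locations; here typically a single change near the true break at t = 60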