
Gibbs posterior concentration rates under sub-exponential type losses

 Added by Nicholas Syring
 Publication date 2020
Research language: English





Bayesian posterior distributions are widely used for inference, but their dependence on a statistical model creates some challenges. In particular, there may be lots of nuisance parameters that require prior distributions and posterior computations, plus a potentially serious risk of model misspecification bias. Gibbs posterior distributions, on the other hand, offer direct, principled, probabilistic inference on quantities of interest through a loss function, not a model-based likelihood. Here we provide simple sufficient conditions for establishing Gibbs posterior concentration rates when the loss function is of a sub-exponential type. We apply these general results in a range of practically relevant examples, including mean regression, quantile regression, and sparse high-dimensional classification. We also apply these techniques in an important problem in medical statistics, namely, estimation of a personalized minimum clinically important difference.
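As a rough illustration of the idea in the abstract, a Gibbs posterior replaces the likelihood with an exponentiated empirical risk: the weight at a parameter value theta is proportional to exp(-omega * n * R_n(theta)) times the prior. The sketch below evaluates this on a grid for the median, using the quantile (check) loss. It is not the authors' code; the names `gibbs_posterior_grid`, `omega` (a learning rate), and `log_prior`, the flat default prior, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def check_loss(u, tau):
    """Quantile (check) loss: u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def gibbs_posterior_grid(data, grid, tau=0.5, omega=1.0, log_prior=lambda t: 0.0):
    """Normalized Gibbs posterior weights on a grid of candidate quantiles.

    The weight at theta is proportional to
    exp(-omega * n * R_n(theta)) * prior(theta),
    where R_n is the empirical risk (average check loss).
    omega and log_prior are illustrative assumptions, not from the source.
    """
    n = len(data)
    log_w = []
    for theta in grid:
        risk = sum(check_loss(y - theta, tau) for y in data) / n
        log_w.append(-omega * n * risk + log_prior(theta))
    m = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


# Simulated data (assumption): 200 draws from N(2, 1), so the true median is 2.
random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(200)]
grid = [i / 100 for i in range(0, 401)]  # candidate values in [0, 4]
weights = gibbs_posterior_grid(data, grid)
post_mean = sum(t * w for t, w in zip(grid, weights))
```

No model for the data is specified anywhere in this sketch; only the loss and the prior enter. Increasing `omega` or the sample size makes the weights pile up more tightly around the risk minimizer, which is the kind of concentration the paper's rates quantify.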




Read More

Zhe Wang, Ryan Martin 2021
In mathematical finance, Levy processes are widely used for their ability to model both continuous variation and abrupt, discontinuous jumps. These jumps are practically relevant, so reliable inference on the feature that controls jump frequencies and magnitudes, namely, the Levy density, is of critical importance. A specific obstacle to carrying out model-based (e.g., Bayesian) inference in such problems is that, for general Levy processes, the likelihood is intractable. To overcome this obstacle, here we adopt a Gibbs posterior framework that updates a prior distribution using a suitable loss function instead of a likelihood. We establish asymptotic posterior concentration rates for the proposed Gibbs posterior. In particular, in the most interesting and practically relevant case, we give conditions under which the Gibbs posterior concentrates at (nearly) the minimax optimal rate, adaptive to the unknown smoothness of the true Levy density.
Statistical inference for sparse covariance matrices is crucial for revealing the dependence structure of large multivariate data sets, but it lacks scalable and theoretically supported Bayesian methods. In this paper, we propose a beta-mixture shrinkage prior, computationally more efficient than the spike and slab prior, for sparse covariance matrices and establish its minimax optimality in high-dimensional settings. The proposed prior consists of beta-mixture shrinkage and gamma priors for off-diagonal and diagonal entries, respectively. To ensure positive definiteness of the resulting covariance matrix, we further restrict the support of the prior to a subspace of positive definite matrices. We obtain the posterior convergence rate of the induced posterior under the Frobenius norm and establish a minimax lower bound for sparse covariance matrices. The class of sparse covariance matrices for the minimax lower bound considered in this paper is controlled by the number of nonzero off-diagonal elements and has more intuitive appeal than those that have appeared in the literature. The obtained posterior convergence rate coincides with the minimax lower bound unless the true covariance matrix is extremely sparse. In the simulation study, we show that the proposed method is computationally more efficient than competitors, while achieving comparable performance. Advantages of the shrinkage prior are demonstrated based on two real data sets.
P. De Blasi, S. Favaro, A. Lijoi 2015
Discrete random probability measures and the exchangeable random partitions they induce are key tools for addressing a variety of estimation and prediction problems in Bayesian inference. Indeed, many popular nonparametric priors, such as the Dirichlet and the Pitman-Yor process priors, select discrete probability distributions almost surely and, therefore, automatically induce exchangeable random partitions. Here we focus on the family of Gibbs-type priors, a recent and elegant generalization of the Dirichlet and the Pitman-Yor process priors. These random probability measures share properties that are appealing both from a theoretical and an applied point of view: (i) they admit an intuitive characterization in terms of their predictive structure justifying their use in terms of a precise assumption on the learning mechanism; (ii) they stand out in terms of mathematical tractability; (iii) they include several interesting special cases besides the Dirichlet and the Pitman-Yor processes. The goal of our paper is to provide a systematic and unified treatment of Gibbs-type priors and highlight their implications for Bayesian nonparametric inference. We will deal with their distributional properties, the resulting estimators, frequentist asymptotic validation and the construction of time-dependen
The Bayesian probit regression model (Albert and Chib (1993)) is popular and widely used for binary regression. While the improper flat prior for the regression coefficients is an appropriate choice in the absence of any prior information, a proper normal prior is desirable when prior information is available or in modern high-dimensional settings where the number of coefficients ($p$) is greater than the sample size ($n$). For both choices of priors, the resulting posterior density is intractable and a Data Augmentation (DA) Markov chain is used to generate approximate samples from the posterior distribution. Establishing geometric ergodicity for this DA Markov chain is important as it provides theoretical guarantees for constructing standard errors for Markov chain based estimates of posterior quantities. In this paper, we first show that in the case of proper normal priors, the DA Markov chain is geometrically ergodic *for all* choices of the design matrix $X$, $n$ and $p$ (unlike the improper prior case, where $n \geq p$ and another condition on $X$ are required for posterior propriety itself). We also derive sufficient conditions under which the DA Markov chain is trace-class, i.e., the eigenvalues of the corresponding operator are summable. In particular, this allows us to conclude that the Haar PX-DA sandwich algorithm (obtained by inserting an inexpensive extra step in between the two steps of the DA algorithm) is strictly better than the DA algorithm in an appropriate sense.
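The two steps of a DA chain for probit regression can be sketched as follows, for the simplest case of a single coefficient with a proper normal prior N(0, `prior_var`). This is a minimal illustration in the spirit of Albert and Chib's scheme, not the paper's implementation; the function names, the scalar setting, the prior variance, and the simulated data are assumptions, and the Haar PX-DA extra step is not included.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_latent(mean, y, rng):
    """Draw z ~ N(mean, 1) truncated to z > 0 if y == 1, else z <= 0,
    using the inverse-CDF method."""
    p_neg = STD_NORMAL.cdf(-mean)  # P(z <= 0)
    lo, hi = (p_neg, 1.0) if y == 1 else (0.0, p_neg)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return mean + STD_NORMAL.inv_cdf(u)


def probit_da(x, y, prior_var=10.0, n_iter=1000, seed=0):
    """Two-step DA chain for a scalar probit coefficient (illustrative)."""
    rng = random.Random(seed)
    # Conditional variance of beta given z does not depend on z.
    post_var = 1.0 / (sum(xi * xi for xi in x) + 1.0 / prior_var)
    beta, draws = 0.0, []
    for _ in range(n_iter):
        # Step 1: latent variables given beta and the observed labels.
        z = [draw_latent(xi * beta, yi, rng) for xi, yi in zip(x, y)]
        # Step 2: beta given the latent variables (normal conditional).
        post_mean = post_var * sum(xi * zi for xi, zi in zip(x, z))
        beta = rng.gauss(post_mean, post_var ** 0.5)
        draws.append(beta)
    return draws


# Simulated data (assumption): true coefficient 1.0, n = 200.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
y = [1 if xi * 1.0 + data_rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]
draws = probit_da(x, y)
estimate = sum(draws[200:]) / len(draws[200:])  # discard 200 burn-in draws
```

Averaging the retained draws gives a Markov chain estimate of the posterior mean; geometric ergodicity is what justifies attaching a standard error to such an average.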
In massive data analysis, training and testing data often come from very different sources, and their probability distributions are not necessarily identical. A representative example is nonparametric classification in the posterior drift model, where the conditional distributions of the label given the covariates are possibly different. In this paper, we derive the minimax rate of the excess risk for nonparametric classification in the posterior drift model in the setting where both training and testing data have smooth distributions, extending recent work by Cai and Wei (2019), who impose a smoothness condition only on the distribution of the testing data. The minimax rate demonstrates a phase transition characterized by the mutual relationship between the smoothness orders of the training and testing data distributions. We also propose a computationally efficient and data-driven nearest neighbor classifier which achieves the minimax excess risk (up to a logarithmic factor). Simulation studies and a real-world application are conducted to demonstrate our approach.