In this paper, we use the class of Wasserstein metrics to study asymptotic properties of posterior distributions. Our first goal is to provide sufficient conditions for posterior consistency. In addition to the well-known Schwartz's Kullback--Leibler condition on the prior, the true distribution and most probability measures in the support of the prior are required to possess moments up to an order determined by the order of the Wasserstein metric. We further investigate convergence rates of the posterior distributions, for which we need stronger moment conditions. The required tail conditions are sharp in the sense that, without them, the posterior distribution may be inconsistent or contract slowly to the true distribution. Our study involves techniques that build on recent advances on Wasserstein convergence of empirical measures. We apply the results to density estimation with a Dirichlet process mixture prior and conduct a simulation study for further illustration.
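The moment conditions above interact with how fast empirical measures converge in Wasserstein distance. As a minimal numerical sketch (not the paper's machinery), the order-$p$ Wasserstein distance between two one-dimensional empirical measures of equal size reduces to matching order statistics; the sample sizes and distributions below are illustrative assumptions:

```python
import numpy as np

def wasserstein_p_1d(x, y, p=1):
    """Order-p Wasserstein distance between two 1-D empirical measures
    of equal size: in one dimension the optimal coupling pairs the
    sorted samples (order statistics) with each other."""
    xs, ys = np.sort(x), np.sort(y)
    return (np.mean(np.abs(xs - ys) ** p)) ** (1.0 / p)

rng = np.random.default_rng(0)
a = rng.normal(size=1000)                 # sample from N(0, 1)
b = rng.normal(loc=0.5, size=1000)        # sample from N(0.5, 1)
d = wasserstein_p_1d(a, b, p=2)           # close to the true W2 = 0.5
```

For two Gaussians differing only in location, the true $W_2$ distance equals the location shift, so `d` should be close to 0.5 up to empirical fluctuation.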
The purpose of this paper is to estimate the intensity of a Poisson process $N$ by using thresholding rules. The intensity, defined as the derivative of the mean measure of $N$ with respect to $n\,dx$ where $n$ is a fixed parameter, is assumed to be non-compactly supported. The estimator $\tilde{f}_{n,\gamma}$ based on random thresholds is proved to achieve the same performance as the oracle estimator up to a possible logarithmic term. Then, minimax properties of $\tilde{f}_{n,\gamma}$ on Besov spaces $\mathcal{B}^{\alpha}_{p,q}$ are established. Under mild assumptions, we prove that
$$\sup_{f\in \mathcal{B}^{\alpha}_{p,q}\cap \mathbb{L}_{\infty}} \mathbb{E}\bigl(\|\tilde{f}_{n,\gamma}-f\|_2^2\bigr)\leq C\left(\frac{\log n}{n}\right)^{\frac{\alpha}{\alpha+1/2+(1/2-1/p)_+}}$$
and that the lower bound of the minimax risk for $\mathcal{B}^{\alpha}_{p,q}\cap \mathbb{L}_{\infty}$ coincides with this upper bound up to the logarithmic term. This new result has two consequences. First, it establishes that the minimax rate over Besov spaces $\mathcal{B}^{\alpha}_{p,q}$ with $p\leq 2$ for non-compactly supported functions is, up to a logarithmic term, the same as for compactly supported functions. When $p>2$, the rate exponent, which depends on $p$, deteriorates as $p$ increases, meaning that the support plays a harmful role in this case. Furthermore, $\tilde{f}_{n,\gamma}$ is adaptive minimax up to a logarithmic term.
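A toy illustration of the thresholding idea, using a histogram estimator with a deterministic threshold of order $\log n$ in place of the paper's wavelet coefficients and random thresholds (all specifics here, including the Gaussian intensity and the binning, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000                                   # scaling parameter of the mean measure
edges = np.linspace(-5, 5, 101)
width = edges[1] - edges[0]
centers = 0.5 * (edges[:-1] + edges[1:])

# True intensity: a (non-compactly supported) Gaussian bump.
f = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi)

# Bin counts of the Poisson process: N_k ~ Poisson(n * f_k * width).
counts = rng.poisson(n * f * width)
raw = counts / (n * width)                 # unbiased bin-wise intensity estimate

# Hard thresholding: keep a bin only if its count exceeds ~ gamma * log n,
# zeroing out noise in the tails where the intensity is essentially zero.
gamma = 1.0
est = np.where(counts > gamma * np.log(n), raw, 0.0)

mse = np.sum((est - f) ** 2) * width       # discretized squared L2 risk
```

The threshold of order $\log n$ mirrors the role of the logarithmic factor in the rates above: it suppresses pure-noise coefficients in the unbounded support region at a small cost in bias.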
This paper introduces a new approach to the study of rates of convergence for posterior distributions. It is a natural extension of a recent approach to the study of Bayesian consistency. In particular, we improve on current rates of convergence for models including the mixture of Dirichlet process model and the random Bernstein polynomial model.
We show that diffusion processes can be exploited to study the posterior contraction rates of parameters in Bayesian models. By treating the posterior distribution as a stationary distribution of a stochastic differential equation (SDE), posterior convergence rates can be established via control of the moments of the corresponding SDE. Our results depend on the structure of the population log-likelihood function, obtained in the limit of an infinite sample size, and stochastic perturbation bounds between the population and sample log-likelihood functions. When the population log-likelihood is strongly concave, we establish posterior convergence of a $d$-dimensional parameter at the optimal rate $(d/n)^{1/2}$. In the weakly concave setting, we show that the convergence rate is determined by the unique solution of a non-linear equation that arises from the interplay between the degree of weak concavity and the stochastic perturbation bounds. We illustrate this general theory by deriving posterior convergence rates for three concrete examples: Bayesian logistic regression models, Bayesian single index models, and over-specified Bayesian mixture models.
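The SDE viewpoint can be made concrete with the Langevin diffusion, whose stationary distribution is the target posterior. Below is a minimal Euler-Maruyama sketch on a toy strongly log-concave target; the step size, iteration count, and the Gaussian target are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def langevin_sample(grad_log_post, x0, step=1e-2, n_steps=5000, rng=None):
    """Euler-Maruyama discretization of the Langevin SDE
    dX_t = grad log pi(X_t) dt + sqrt(2) dB_t, whose stationary
    distribution is the target pi (here, a posterior)."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step * grad_log_post(x) + np.sqrt(2 * step) * noise
        samples.append(x.copy())
    return np.array(samples)

# Toy strongly log-concave "posterior": N(mu, 1) with mu = 2,
# so grad log pi(x) = -(x - mu).
mu = 2.0
chain = langevin_sample(lambda x: -(x - mu), x0=np.zeros(1))
post_mean = chain[1000:].mean()            # discard burn-in, then average
```

Strong concavity of the log-density is what gives the diffusion a fast, dimension-friendly mixing rate; the weakly concave regime in the paper corresponds to flatter targets where this contraction degrades.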
We investigate predictive density estimation under the $L^2$ Wasserstein loss for location families and location-scale families. We show that plug-in densities form a complete class and that the Bayesian predictive density is given by the plug-in density with the posterior mean of the location and scale parameters. We provide Bayesian predictive densities that dominate the best equivariant one in normal models.
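As a sketch of the plug-in recipe in the simplest case: a normal location model with known scale, where (under a flat prior, an assumption made here for concreteness) the posterior mean of the location is the sample mean, and the plug-in predictive density is normal centered at it:

```python
import numpy as np

def plugin_predictive(x, sigma=1.0):
    """Plug-in predictive density for a normal location model with
    known scale sigma: plug the posterior mean of the location (under
    a flat prior, the sample mean) into the N(theta, sigma^2) density."""
    theta_hat = np.mean(x)
    def density(y):
        z = (y - theta_hat) / sigma
        return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))
    return theta_hat, density

rng = np.random.default_rng(2)
data = rng.normal(loc=1.0, size=200)       # true location = 1
theta_hat, pred = plugin_predictive(data)
```

Under $L^2$ Wasserstein loss, restricting attention to such plug-in densities loses nothing by the complete-class result, which is what makes this simple construction the natural Bayesian answer.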
Wasserstein geometry and information geometry are two important structures on a manifold of probability distributions. Wasserstein geometry is defined via the transportation cost between two distributions, so it reflects the metric of the base manifold on which the distributions are defined. Information geometry is defined to be invariant under reversible transformations of the base space. Both have their own merits for applications. In particular, statistical inference is based on information geometry, where the Fisher metric plays a fundamental role, whereas Wasserstein geometry is useful in computer vision and AI applications. In this study, we analyze statistical inference based on Wasserstein geometry in the case where the base space is one-dimensional. Using the location-scale model, we derive the W-estimator, which explicitly minimizes the transportation cost from the empirical distribution to a statistical model, and study its asymptotic behavior. We show that the W-estimator is consistent and give its asymptotic distribution explicitly via the functional delta method. The W-estimator is Fisher efficient in the Gaussian case.
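A numerical sketch of the W-estimator idea in one dimension. A logistic base distribution is assumed here (chosen for its analytic quantile function; the paper's Gaussian efficiency result is not reproduced): minimizing the squared $L^2$ Wasserstein cost between the empirical quantiles and the model quantiles $\mu + \sigma q(u)$ reduces to a least-squares fit of order statistics on base quantiles:

```python
import numpy as np

def w_estimator_location_scale(x):
    """W-estimator for a 1-D location-scale family mu + sigma * Z with
    standard logistic Z: minimize the squared L2 Wasserstein cost
    mean((x_(i) - mu - sigma * q_i)^2) over (mu, sigma), where q_i are
    base quantiles q(u) = log(u / (1 - u)).  The minimizer is the
    ordinary least-squares fit of the sorted data on the q_i."""
    xs = np.sort(x)
    n = len(x)
    u = (np.arange(1, n + 1) - 0.5) / n
    q = np.log(u / (1 - u))                        # logistic base quantiles
    sigma = np.cov(xs, q, bias=True)[0, 1] / np.var(q)
    mu = xs.mean() - sigma * q.mean()
    return mu, sigma

rng = np.random.default_rng(3)
# Logistic(location 1, scale 0.5) samples via inverse transform.
u_samp = rng.uniform(1e-9, 1 - 1e-9, size=5000)
data = 1.0 + 0.5 * np.log(u_samp / (1 - u_samp))
mu_hat, s_hat = w_estimator_location_scale(data)
```

Because the one-dimensional optimal transport plan matches quantiles, the transportation-cost objective becomes an explicit quadratic in $(\mu, \sigma)$, which is why the W-estimator admits this closed form.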