Bayesian methods are actively used for parameter identification and uncertainty quantification when solving nonlinear inverse problems with random noise. However, there are only a few theoretical results justifying the Bayesian approach. Recent papers, see e.g. \cite{Nickl2017,lu2017bernsteinvon} and references therein, illustrate the main difficulties and challenges in studying the properties of the posterior distribution in the nonparametric setup. This paper offers a new approach for studying the frequentist properties of nonparametric Bayes procedures. The idea is to relax the nonlinear structural equation by introducing an auxiliary functional parameter, replacing the structural equation with a penalty, and imposing a prior on the auxiliary parameter. For such an extended model, we state sharp bounds on posterior concentration, on the accuracy of the penalized MLE, and on Gaussian approximation of the posterior, along with a number of further results. All the bounds are given in terms of the effective dimension, and we show that the proposed calming device does not significantly affect this value.
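To fix ideas, here is a schematic version of such a relaxation (the notation below is assumed for illustration and is not necessarily the paper's): for data $Y = m(f^{*}) + \varepsilon$ with a nonlinear forward map $m$, introduce an auxiliary parameter $g$ standing for the image $m(f)$, put a prior on the pair $(f,g)$, and replace the hard constraint $g = m(f)$ by a quadratic penalty,
\[
\ell(f,g) \;=\; -\frac{1}{2\sigma^{2}}\,\|Y - g\|^{2} \;-\; \frac{\lambda}{2}\,\|g - m(f)\|^{2},
\]
so that the data enter only through the term that is linear in $g$, while the nonlinearity of $m$ is shifted into the penalty with weight $\lambda$.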
We prove a Bernstein-von Mises theorem for a general class of high-dimensional nonlinear Bayesian inverse problems in the vanishing noise limit. We propose a sufficient condition on the growth rate of the number of unknown parameters under which the posterior distribution is asymptotically normal. This growth condition is expressed explicitly in terms of the model dimension, the degree of ill-posedness of the inverse problem, and the noise parameter. The theoretical results are applied to Bayesian estimation of the medium parameter in an elliptic problem.
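Schematically, and with notation assumed here for illustration, the asserted asymptotic normality is of the usual Bernstein-von Mises form: for noise level $\varepsilon \to 0$,
\[
\big\| \Pi(\cdot \mid Y^{\varepsilon}) - \mathcal{N}\big(\hat\theta_{\varepsilon},\, \varepsilon^{2} I_{\theta_{0}}^{-1}\big) \big\|_{\mathrm{TV}} \;\to\; 0 \quad \text{in probability},
\]
where $\hat\theta_{\varepsilon}$ is an efficient estimator, $I_{\theta_{0}}$ is the Fisher information at the truth, and the parameter dimension $p = p(\varepsilon)$ is allowed to grow at the rate permitted by the growth condition.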
In this work, we focus on variational Bayesian inference for sparse Deep Neural Networks (DNNs) modeled under a class of spike-and-slab priors. Given a pre-specified sparse DNN structure, we characterize the corresponding variational posterior contraction rate, which reveals a trade-off between the variational error and the approximation error, both determined by the network's structural complexity (i.e., depth, width and sparsity). However, the optimal network structure, which strikes the balance in this trade-off and yields the best rate, is generally unknown in practice. Our work therefore further develops an \emph{adaptive} variational inference procedure that automatically selects a reasonably good (data-dependent) network structure achieving the best contraction rate, without knowledge of the optimal structure. In particular, when the true function is H\"older smooth, the adaptive variational inference attains the (near-)optimal rate without knowledge of the smoothness level. This rate still suffers from the curse of dimensionality, which motivates the teacher-student setup, i.e., the true function is itself a sparse DNN, under which the rate depends only logarithmically on the input dimension.
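As a concrete illustration, below is a minimal PyTorch sketch of mean-field variational inference under a spike-and-slab prior on the weights of one network layer (the continuous mean-field family, the hyperparameters, and the local reparameterization are illustrative assumptions, not the paper's exact procedure):
\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpikeSlabLayer(nn.Module):
    """Mean-field variational layer: q(w) = gamma*N(mu, sigma^2) + (1-gamma)*delta_0,
    prior p(w) = pi*N(0, slab_std^2) + (1-pi)*delta_0, independently per weight."""
    def __init__(self, d_in, d_out, slab_std=1.0, pi=0.1):
        super().__init__()
        self.mu = nn.Parameter(0.1 * torch.randn(d_out, d_in))
        self.rho = nn.Parameter(-3.0 * torch.ones(d_out, d_in))    # softplus(rho) = sigma
        self.logit_gamma = nn.Parameter(torch.zeros(d_out, d_in))  # inclusion probabilities
        self.slab_std, self.pi = slab_std, pi

    def forward(self, x):
        gamma, sigma = torch.sigmoid(self.logit_gamma), F.softplus(self.rho)
        w_mean = gamma * self.mu                                   # E_q[w]
        w_var = gamma * (sigma**2 + (1 - gamma) * self.mu**2)      # Var_q[w]
        # local reparameterization: sample pre-activations instead of weights
        m, v = F.linear(x, w_mean), F.linear(x**2, w_var)
        return m + v.sqrt() * torch.randn_like(m)

    def kl(self):
        gamma, sigma = torch.sigmoid(self.logit_gamma), F.softplus(self.rho)
        kl_slab = (torch.log(self.slab_std / sigma)
                   + (sigma**2 + self.mu**2) / (2 * self.slab_std**2) - 0.5)
        kl_bern = (gamma * torch.log(gamma / self.pi + 1e-8)
                   + (1 - gamma) * torch.log((1 - gamma) / (1 - self.pi) + 1e-8))
        return (gamma * kl_slab + kl_bern).sum()
\end{verbatim}
Training then minimizes the negative log-likelihood of the data plus the sum of the layers' kl() terms (the negative ELBO); the fitted sparsity pattern can be read off from the inclusion probabilities sigmoid(logit_gamma).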
We consider a sparse linear regression model with unknown symmetric error in the high-dimensional setting. The true error distribution is assumed to belong to the locally $\beta$-H\"older class with an exponentially decreasing tail and need not be sub-Gaussian. We obtain posterior convergence rates for the regression coefficients and the error density that are nearly optimal and adaptive to the unknown sparsity level. Furthermore, we derive a semiparametric Bernstein-von Mises (BvM) theorem characterizing the asymptotic shape of the marginal posterior for the regression coefficients. Under a sub-Gaussianity assumption on the true score function, strong model selection consistency for the regression coefficients is also obtained, which in turn establishes the frequentist validity of credible sets.
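In schematic form (notation assumed for illustration), the model is
\[
y_i = x_i^{\top}\beta + \varepsilon_i, \qquad \varepsilon_i \stackrel{\mathrm{iid}}{\sim} \eta \ \text{symmetric}, \qquad \beta \in \mathbb{R}^{p} \ \text{sparse},
\]
with $p \gg n$ allowed, and the semiparametric BvM theorem states, roughly, that the marginal posterior of $\beta$ is approximately $\mathcal{N}(\hat\beta,\, n^{-1}\tilde I_{\eta}^{-1})$, where $\tilde I_{\eta}$ denotes the efficient information for $\beta$ in the presence of the unknown error density $\eta$.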
Results by van der Vaart (1991) from semiparametric statistics about the existence of a non-zero Fisher information are reviewed in an infinite-dimensional non-linear Gaussian regression setting. Information-theoretically optimal inference on aspects of the unknown parameter is possible if and only if the adjoint of the linearisation of the regression map satisfies a certain range condition. It is shown that this range condition may fail in a commonly studied elliptic inverse problem with a divergence-form equation, and that a large class of smooth linear functionals of the conductivity parameter cannot be estimated efficiently in this case. In particular, Gaussian `Bernstein-von Mises'-type approximations for Bayesian posterior distributions do not hold in this setting.
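In the language of that theory (with notation assumed here for illustration): writing $A = D\mathscr{G}(\theta_0)$ for the linearisation of the regression map $\mathscr{G}$ at the truth, efficient estimation of a linear functional $\theta \mapsto \langle \psi, \theta \rangle$, with the accompanying Gaussian posterior approximation, is available essentially when
\[
\psi \in \mathrm{range}(A^{*}),
\]
where $A^{*}$ is the adjoint of $A$; the divergence-form example exhibits a large class of smooth $\psi$ for which this inclusion fails.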
In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423-1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a groupwise-penalization-based procedure to distinguish significant covariates in the large-$p$-small-$n$ setting. The procedure is shown to be consistent for model structure identification. Further, we construct simultaneous confidence bands for the coefficient functions in the selected model based on a refined two-step spline estimator. We also discuss how to choose the tuning parameters. To estimate the standard deviation of the functional estimator, we adopt the smoothed bootstrap method. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and analyze an obesity data set from a genome-wide association study as an illustration.
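For the selection step, here is a minimal numpy sketch of a groupwise-penalized spline fit via proximal gradient descent (the basis matrices B_j, the plain group-lasso penalty, and the solver are illustrative assumptions, not the authors' exact procedure):
\begin{verbatim}
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: the proximal operator of t * ||.||_2."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def group_penalized_fit(y, B_groups, lam, n_iter=500):
    """Minimize 0.5*||y - sum_j B_j c_j||^2 + lam * sum_j ||c_j||_2,
    where B_j holds the spline basis evaluations for covariate j."""
    B = np.hstack(B_groups)
    idx = np.cumsum([0] + [Bj.shape[1] for Bj in B_groups])
    L = np.linalg.norm(B, 2) ** 2            # Lipschitz constant of the gradient
    c = np.zeros(B.shape[1])
    for _ in range(n_iter):
        z = c - B.T @ (B @ c - y) / L        # gradient step
        for j in range(len(B_groups)):       # groupwise proximal step
            c[idx[j]:idx[j + 1]] = group_soft_threshold(z[idx[j]:idx[j + 1]], lam / L)
    return [c[idx[j]:idx[j + 1]] for j in range(len(B_groups))]
\end{verbatim}
Covariates whose coefficient block is thresholded to zero are screened out; the refined two-step spline estimator and the simultaneous confidence bands are then built on the selected model.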