We study the Bayesian inverse problem of learning a linear operator on a Hilbert space from its noisy pointwise evaluations on random input data. Our framework assumes that this target operator is self-adjoint and diagonal in a basis shared with the Gaussian prior and noise covariance operators arising from the imposed statistical model, and it is able to handle target operators that are compact, bounded, or even unbounded. We establish posterior contraction rates with respect to a family of Bochner norms as the number of data tends to infinity and derive related lower bounds on the estimation error. In the large data limit, we also provide asymptotic convergence rates of suitably defined excess risk and generalization gap functionals associated with the posterior mean point estimator. In doing so, we connect the posterior consistency results to nonparametric learning theory. Furthermore, these convergence rates highlight and quantify the difficulty of learning unbounded linear operators in comparison with learning bounded or compact ones. Numerical experiments confirm the theory and demonstrate that similar conclusions may be expected in more general problem settings.
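As a concrete illustration of the diagonal setting described above, the following NumPy sketch (our own illustrative example, not the paper's code) recovers the eigenvalues of a diagonal operator by independent conjugate Gaussian updates; the spectrum, prior scaling, truncation level, and noise level are all assumptions made purely for demonstration.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's code): posterior mean for an operator
# that is diagonal in a known basis shared with the prior and noise covariances.
# Each eigenvalue l_j is then learned by an independent conjugate Gaussian regression
# from coordinate data y_nj = l_j * x_nj + noise_nj.

rng = np.random.default_rng(0)
J, N = 50, 200                       # truncation level and number of input-output pairs
j = np.arange(1, J + 1)
l_true = j.astype(float)             # an unbounded, Laplacian-like spectrum (assumed)
prior_var = j ** 2.0                 # prior variance sigma_j^2, scaled to the spectrum
noise_var = 1e-2                     # per-coordinate observation noise variance

X = rng.standard_normal((N, J)) * j ** (-1.0)   # random inputs with decaying coordinates
Y = X * l_true + np.sqrt(noise_var) * rng.standard_normal((N, J))

# Conjugate Gaussian update, coordinate by coordinate.
sxx = (X ** 2).sum(axis=0)
sxy = (X * Y).sum(axis=0)
post_mean = prior_var * sxy / (prior_var * sxx + noise_var)
post_var = prior_var * noise_var / (prior_var * sxx + noise_var)

print("relative error of posterior mean:",
      np.linalg.norm(post_mean - l_true) / np.linalg.norm(l_true))
print("largest posterior standard deviation:", np.sqrt(post_var).max())
```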
In functional linear regression, the slope "parameter" is a function. Therefore, in a nonparametric context, it is determined by an infinite number of unknowns. Its estimation involves solving an ill-posed problem and has points of contact with a range of methodologies, including statistical smoothing and deconvolution. The standard approach to estimating the slope function is based explicitly on functional principal components analysis and, consequently, on spectral decomposition in terms of eigenvalues and eigenfunctions. We discuss this approach in detail and show that in certain circumstances, optimal convergence rates are achieved by the PCA technique. An alternative approach based on quadratic regularisation is suggested and shown to have advantages from some points of view.
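A minimal NumPy sketch of the FPCA-based slope estimator is given below; the grid discretization, simulated predictor curves, true slope, and truncation level are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

# Minimal sketch (illustrative) of the FPCA estimator of the slope function b in the
# functional linear model Y_i = \int b(t) X_i(t) dt + eps_i. Curves are observed on a
# regular grid, so integrals are replaced by Riemann sums.

rng = np.random.default_rng(1)
n, T = 400, 101
t = np.linspace(0.0, 1.0, T)
dt = t[1] - t[0]

X = np.cumsum(rng.standard_normal((n, T)) * np.sqrt(dt), axis=1)   # Brownian-like predictors
b_true = np.sin(2 * np.pi * t)                                      # assumed smooth slope
Y = (X * b_true).sum(axis=1) * dt + 0.1 * rng.standard_normal(n)

# Empirical covariance operator and its spectral decomposition.
Xc = X - X.mean(axis=0)
K = (Xc.T @ Xc) / n                     # discretized covariance kernel K(s, t)
lam, V = np.linalg.eigh(K * dt)         # eigenpairs of the discretized operator
lam, V = lam[::-1], V[:, ::-1]
phi = V / np.sqrt(dt)                   # eigenfunctions, L2-normalized on the grid

m = 6                                   # truncation level: the smoothing parameter
scores = Xc @ phi[:, :m] * dt           # principal component scores
g_hat = (scores * (Y - Y.mean())[:, None]).mean(axis=0)   # cov(scores, Y)
b_hat = phi[:, :m] @ (g_hat / lam[:m])  # PCA-based slope estimate

print("L2 error of the slope estimate:", np.sqrt(((b_hat - b_true) ** 2).sum() * dt))
```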
We provide a theoretical treatment of over-specified Gaussian mixtures of experts with covariate-free gating networks. We establish the convergence rates of the maximum likelihood estimation (MLE) for these models. Our proof technique is based on a novel notion of algebraic independence of the expert functions. Drawing on optimal transport theory, we establish a connection between the algebraic independence and a certain class of partial differential equations (PDEs). Exploiting this connection allows us to derive convergence rates and minimax lower bounds for parameter estimation.
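To make the model concrete, here is a hedged EM sketch for fitting a Gaussian mixture of experts with a covariate-free gating network (constant mixing weights); the simulated data, the deliberate over-specification with three experts for two true components, and all initializations are assumptions for illustration only, and the paper's contribution concerns the theory of the MLE rather than any particular implementation.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's setup) of MLE via EM for a Gaussian
# mixture of experts with a covariate-free gating network: mixing weights pi_k do not
# depend on the input x, and expert k is a linear regression with its own noise level.

rng = np.random.default_rng(5)
n, K = 1000, 3                                   # K = 3 over-specifies 2 true experts
x = rng.uniform(-2, 2, n)
z = rng.integers(0, 2, n)
y = np.where(z == 0, 1.5 * x, -1.0 * x + 1.0) + 0.3 * rng.standard_normal(n)

pi = np.full(K, 1.0 / K)
slope, intercept, sigma2 = rng.standard_normal(K), np.zeros(K), np.ones(K)

for _ in range(200):
    # E-step: responsibilities under each expert (gating weights are covariate-free).
    mean_k = slope[None, :] * x[:, None] + intercept[None, :]
    dens = np.exp(-0.5 * (y[:, None] - mean_k) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    resp = pi * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: weighted least squares per expert, then update weights and variances.
    for k in range(K):
        w = resp[:, k]
        Xd = np.column_stack([x, np.ones(n)])
        coef = np.linalg.solve(Xd.T @ (Xd * w[:, None]), Xd.T @ (w * y))
        slope[k], intercept[k] = coef
        sigma2[k] = (w * (y - Xd @ coef) ** 2).sum() / w.sum()
    pi = resp.mean(axis=0)

print("mixing weights:", np.round(pi, 3))
print("expert slopes:", np.round(slope, 3))
```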
The emergence of big data has led to a growing interest in so-called convergence complexity analysis, which is the study of how the convergence rate of a Monte Carlo Markov chain (for an intractable Bayesian posterior distribution) scales as the underlying data set grows in size. Convergence complexity analysis of practical Monte Carlo Markov chains on continuous state spaces is quite challenging, and there have been very few successful analyses of such chains. One fruitful analysis was recently presented by Qin and Hobert (2021b), who studied a Gibbs sampler for a simple Bayesian random effects model. These authors showed that, under regularity conditions, the geometric convergence rate of this Gibbs sampler converges to zero as the data set grows in size. It is shown herein that similar behavior is exhibited by Gibbs samplers for more general Bayesian models that possess both random effects and traditional continuous covariates, the so-called mixed models. The analysis employs the Wasserstein-based techniques introduced by Qin and Hobert (2021b).
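For readers unfamiliar with the object being analyzed, the sketch below implements a Gibbs sampler for a simple one-way Bayesian random effects model; the specific likelihood, priors, and hyperparameters are assumptions chosen for illustration and need not match the model studied by Qin and Hobert (2021b).

```python
import numpy as np

# Minimal sketch (assumed model) of a Gibbs sampler for a one-way random effects model:
#   y_ij | theta_i, lam_e ~ N(theta_i, 1/lam_e),  theta_i | mu, lam_t ~ N(mu, 1/lam_t),
#   flat prior on mu, Gamma(a, b) priors on the precisions lam_e and lam_t.

rng = np.random.default_rng(2)
K, m = 20, 15                                   # groups and replicates per group
y = rng.normal(loc=rng.normal(0.0, 1.0, K)[:, None], scale=0.5, size=(K, m))
a_e = b_e = a_t = b_t = 1.0                     # illustrative hyperparameters

mu, lam_e, lam_t = 0.0, 1.0, 1.0
theta = y.mean(axis=1)
mu_draws = []
for it in range(5000):
    # theta_i | rest: conjugate normal update combining the group mean and mu.
    prec = lam_e * m + lam_t
    theta = rng.normal((lam_e * m * y.mean(axis=1) + lam_t * mu) / prec,
                       np.sqrt(1.0 / prec))
    # mu | rest: normal centered at the mean of the random effects.
    mu = rng.normal(theta.mean(), np.sqrt(1.0 / (K * lam_t)))
    # Precision updates: conjugate gamma draws (numpy parameterizes by shape, scale).
    lam_t = rng.gamma(a_t + K / 2.0, 1.0 / (b_t + 0.5 * ((theta - mu) ** 2).sum()))
    lam_e = rng.gamma(a_e + K * m / 2.0,
                      1.0 / (b_e + 0.5 * ((y - theta[:, None]) ** 2).sum()))
    if it >= 1000:                              # discard burn-in
        mu_draws.append(mu)

print("posterior mean of mu:", np.mean(mu_draws))
```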
Distances to compact sets are widely used in the field of Topological Data Analysis for inferring geometric and topological features from point clouds. In this context, the distance to a probability measure (DTM) has been introduced by Chazal et al. (2011) as a robust alternative to the distance to a compact set. In practice, the DTM can be estimated by its empirical counterpart, that is, the distance to the empirical measure (DTEM). In this paper we give a tight control of the deviation of the DTEM. Our analysis relies on a local analysis of empirical processes. In particular, we show that the rates of convergence of the DTEM depend directly on the regularity at zero of a particular quantile function which contains some local information about the geometry of the support. This quantile function is the relevant quantity to describe precisely how difficult a geometric inference problem is. Several numerical experiments illustrate the convergence of the DTEM and also confirm that our bounds are tight.
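The DTEM itself is simple to compute: for a mass parameter m, it averages squared distances from a query point to its k = ceil(m*n) nearest sample points. The NumPy sketch below (illustrative only; the fractional-weight correction for non-integer m*n is omitted) shows the computation on a noisy circle.

```python
import numpy as np

# Minimal sketch (illustrative) of the distance to the empirical measure (DTEM):
# for a mass parameter m in (0, 1], dtem(x)^2 is the average of the squared distances
# from x to its k = ceil(m * n) nearest sample points.

def dtem(query, sample, m=0.1):
    k = max(1, int(np.ceil(m * len(sample))))
    # Pairwise squared distances from each query point to every sample point.
    d2 = ((query[:, None, :] - sample[None, :, :]) ** 2).sum(axis=-1)
    d2.sort(axis=1)
    return np.sqrt(d2[:, :k].mean(axis=1))

rng = np.random.default_rng(3)
circle = rng.standard_normal((500, 2))
circle /= np.linalg.norm(circle, axis=1, keepdims=True)    # points on the unit circle
noisy = np.vstack([circle, rng.uniform(-2, 2, (25, 2))])   # add a few outliers
grid = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 2.0]])

print(dtem(grid, noisy, m=0.05))   # robust distances to the underlying support
```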
We consider nonparametric estimation of the mean and covariance functions for functional/longitudinal data. Strong uniform convergence rates are developed for estimators that are local-linear smoothers. Our results are obtained in a unified framework in which the number of observations within each curve/cluster can be of any rate relative to the sample size. We show that the convergence rates for the procedures depend on both the number of sample curves and the number of observations on each curve. For sparse functional data, these rates are equivalent to the optimal rates in nonparametric regression. For dense functional data, root-n rates of convergence can be achieved with proper choices of bandwidths. We further derive almost sure rates of convergence for principal component analysis using the estimated covariance function. The results are illustrated with simulation studies.
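A minimal NumPy sketch of a local-linear smoother for the mean function from pooled sparse observations appears below; the kernel, bandwidth, and simulated design are illustrative assumptions, and curve-specific random effects are omitted for brevity.

```python
import numpy as np

# Minimal sketch (illustrative) of a local-linear smoother for the mean function of
# sparsely observed functional data: all (time, response) pairs are pooled and, at each
# target point t0, a kernel-weighted linear fit is computed with bandwidth h.

def local_linear_mean(times, values, grid, h):
    est = np.empty_like(grid)
    for k, t0 in enumerate(grid):
        d = times - t0
        w = np.maximum(1.0 - (d / h) ** 2, 0.0) * 0.75          # Epanechnikov kernel
        X = np.column_stack([np.ones_like(d), d])
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ values)          # weighted least squares
        est[k] = beta[0]                                         # intercept = fit at t0
    return est

rng = np.random.default_rng(4)
n_curves, obs_per_curve = 200, 5                                 # sparse design
times = rng.uniform(0, 1, (n_curves, obs_per_curve)).ravel()
mean_fn = lambda t: np.sin(2 * np.pi * t)
values = mean_fn(times) + 0.3 * rng.standard_normal(times.size)

grid = np.linspace(0.05, 0.95, 19)
mu_hat = local_linear_mean(times, values, grid, h=0.1)
print("pointwise errors:", np.round(mu_hat - mean_fn(grid), 2))
```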