We discuss, from a non-asymptotic point of view, the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations. In particular, we define estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results for mean estimators.
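As an illustration only: the abstract above does not say which estimators it defines, so the sketch below uses median-of-means, a classical estimator known to give sub-Gaussian-type deviations (up to constants) for distributions with finite variance. The function name `median_of_means` and the block-count parameter `k` are assumptions made for this example; in the standard analysis, `k` is taken on the order of $\log(1/\delta)$ for a target failure probability $\delta$.

```python
import statistics


def median_of_means(xs, k):
    """Median-of-means estimate of the mean of xs.

    Illustrative sketch, not necessarily the estimator studied in the paper.
    xs is split into k consecutive blocks of equal size (leftover points
    are dropped), each block is averaged, and the median of the block
    means is returned.
    """
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(xs)")
    m = n // k  # block size
    block_means = [sum(xs[i * m:(i + 1) * m]) / m for i in range(k)]
    return statistics.median(block_means)
```

A single extreme observation can corrupt at most one block mean, so the median of the block means is barely affected, whereas the plain sample mean can be pulled arbitrarily far.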
The goal of this paper is to show that a single robust estimator of the mean of a multivariate Gaussian distribution can enjoy five desirable properties. First, it is computationally tractable in the sense that it can be computed in time that is at most polynomial in the dimension, the sample size and the logarithm of the inverse of the contamination rate. Second, it is equivariant under translations, uniform scaling and orthogonal transformations. Third, it has a high breakdown point equal to $0.5$, and a nearly-minimax-rate-breakdown point approximately equal to $0.28$. Fourth, it is minimax rate optimal, up to a logarithmic factor, when the data consist of independent observations corrupted by adversarially chosen outliers. Fifth, it is asymptotically efficient when the rate of contamination tends to zero. The estimator is obtained by an iterative reweighting approach: each sample point is assigned a weight that is iteratively updated by solving a convex optimization problem. We also establish a dimension-free non-asymptotic risk bound for the expected error of the proposed estimator. It is the first result of this kind in the literature and involves only the effective rank of the covariance matrix. Finally, we show that the obtained results can be extended to sub-Gaussian distributions, as well as to the cases of unknown rate of contamination or unknown covariance matrix.
We revisit the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence: intuitively, our estimator, on any distribution, is as accurate as the sample mean is for the Gaussian distribution of matching variance. Crucially, in contrast to prior works, our estimator does not require prior knowledge of the variance, and works across the entire gamut of distributions with bounded variance, including those without any higher moments. Parameterized by the sample size $n$, the failure probability $\delta$, and the variance $\sigma^2$, our estimator is accurate to within $\sigma\cdot(1+o(1))\sqrt{\frac{2\log\frac{1}{\delta}}{n}}$, tight up to the $1+o(1)$ factor. Our estimator construction and analysis gives a framework generalizable to other problems, tightly analyzing a sum of dependent random variables by viewing the sum implicitly as a 2-parameter $\psi$-estimator, and constructing bounds using mathematical programming and duality techniques.
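To give a feel for the scale of the accuracy bound stated above, the following sketch evaluates its leading term $\sigma\sqrt{2\log(1/\delta)/n}$ for given $n$, $\delta$ and $\sigma$. It deliberately ignores the $1+o(1)$ factor, so the value is only a leading-order approximation, and it does not implement the estimator itself. The function name `sub_gaussian_radius` is an assumption made for this example.

```python
import math


def sub_gaussian_radius(n, delta, sigma):
    """Leading term sigma * sqrt(2 * log(1/delta) / n) of the error bound.

    Illustrative only: the (1 + o(1)) factor from the stated bound is
    omitted, so this is an approximation of the guaranteed accuracy.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return sigma * math.sqrt(2.0 * math.log(1.0 / delta) / n)
```

The radius shrinks like $1/\sqrt{n}$, so quadrupling the sample size halves it, while the dependence on the failure probability is only through $\sqrt{\log(1/\delta)}$.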
We consider a sparse linear regression model with unknown symmetric error in the high-dimensional setting. The true error distribution is assumed to belong to the locally $\beta$-Hölder class with an exponentially decreasing tail, which does not need to be sub-Gaussian. We obtain posterior convergence rates of the regression coefficient and the error density, which are nearly optimal and adaptive to the unknown sparsity level. Furthermore, we derive the semi-parametric Bernstein-von Mises (BvM) theorem to characterize the asymptotic shape of the marginal posterior for regression coefficients. Under the sub-Gaussianity assumption on the true score function, strong model selection consistency for regression coefficients is also obtained, which eventually asserts the frequentist validity of credible sets.
This paper deals with the maximum likelihood estimator for the mean-reverting parameter of a first-order autoregressive model with exogenous variables that are stationary Gaussian noises (colored noise). Using the method of the Laplace transform, both the asymptotic properties and the asymptotic design problem of the maximum likelihood estimator are investigated. The numerical simulation results confirm the theoretical analysis and show that the proposed maximum likelihood estimator performs well in finite samples.
Neural networks are among the most widely used methods in machine learning and artificial intelligence today. By the universal approximation theorem (Hornik et al. (1989)), a neural network with one hidden layer can approximate any continuous function on a compact support as long as the number of hidden units is sufficiently large. Statistically, a neural network can be viewed as a nonlinear regression model. However, if we treat it parametrically, the unidentifiability of the parameters makes it difficult to derive its asymptotic properties. Instead, we consider the estimation problem in a nonparametric regression framework and use results from sieve estimation to establish the consistency, the rates of convergence and the asymptotic normality of the neural network estimators. We also illustrate the validity of the theory via simulations.