This paper deals with the estimation of a probability measure on the real line from data observed with an additive noise. We are interested in rates of convergence for the Wasserstein metric of order $p\geq 1$. The distribution of the errors is assumed to be known and to belong to a class of supersmooth or ordinary smooth distributions. We obtain in the univariate situation an improved upper bound in the ordinary smooth case and less restrictive conditions for the existing bound in the supersmooth one. In the ordinary smooth case, a lower bound is also provided, and numerical experiments illustrating the rates of convergence are presented.
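As a reminder of the loss used above, the Wasserstein distance of order $p$ between two empirical measures on the real line reduces to a comparison of sorted samples. The sketch below assumes equal sample sizes; the function name `wasserstein_p_1d` is hypothetical. It only illustrates the metric and is not the deconvolution estimator studied in the paper.

```python
import numpy as np

def wasserstein_p_1d(x, y, p=1.0):
    """W_p distance between two empirical measures on the real line.

    Assumes both samples have the same size, so the optimal coupling
    simply matches the i-th order statistics of x and y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # Mean of |x_(i) - y_(i)|^p, then the 1/p power.
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))
```

For unequal sample sizes one would instead integrate the difference of the two empirical quantile functions over $(0,1)$.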
Wasserstein geometry and information geometry are two important structures that can be introduced on a manifold of probability distributions. Wasserstein geometry is defined by using the transportation cost between two distributions, so it reflects the metric of the base manifold on which the distributions are defined. Information geometry is defined to be invariant under reversible transformations of the base space. Both have their own merits for applications. In particular, statistical inference is based upon information geometry, where the Fisher metric plays a fundamental role, whereas Wasserstein geometry is useful in computer vision and AI applications. In this study, we analyze statistical inference based on the Wasserstein geometry in the case that the base space is one-dimensional. By using the location-scale model, we further derive the W-estimator that explicitly minimizes the transportation cost from the empirical distribution to a statistical model and study its asymptotic behavior. We show that the W-estimator is consistent and explicitly give its asymptotic distribution by using the functional delta method. The W-estimator is Fisher efficient in the Gaussian case.
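To make the idea of a W-estimator concrete, here is a minimal sketch for the Gaussian location-scale model, assuming a quadratic transportation cost. In one dimension the cost between the empirical distribution and $N(\mu,\sigma^2)$ is $\int_0^1 (Q_n(u)-\mu-\sigma\Phi^{-1}(u))^2\,du$, where $Q_n$ is the empirical quantile function, so the minimizer solves a least-squares problem in $(\mu,\sigma)$. The function name `w_estimator_gaussian` is hypothetical, and the closed form is derived under these assumptions rather than taken from the paper.

```python
from statistics import NormalDist, fmean

def w_estimator_gaussian(sample):
    """Minimize the quadratic transport cost to N(mu, sigma^2) in 1-D.

    The empirical quantile function equals the i-th order statistic on
    ((i-1)/n, i/n], and the integral of Phi^{-1}(u) over that interval is
    phi(z_{i-1}) - phi(z_i) with z_i = Phi^{-1}(i/n).
    """
    xs = sorted(float(v) for v in sample)
    n = len(xs)
    std = NormalDist()
    # phi(z_i) for i = 0..n; the density vanishes at z_0 = -inf and z_n = +inf.
    dens = [0.0] + [std.pdf(std.inv_cdf(i / n)) for i in range(1, n)] + [0.0]
    mu = fmean(xs)  # integral of the empirical quantile function
    sigma = sum(x * (dens[i] - dens[i + 1]) for i, x in enumerate(xs))
    return mu, sigma
```

The weights on the order statistics sum to zero, so the scale estimate is unchanged by shifting the sample.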
We consider the nonparametric estimation of the density function of weakly and strongly dependent processes with noisy observations. We show that in the ordinary smooth case the optimal bandwidth choice can be influenced by long-range dependence, as opposed to the standard case in which no noise is present. In particular, if the dependence is moderate, then the bandwidth, the rates of mean-square convergence and, additionally, the central limit theorem are the same as in the i.i.d. case. If the dependence is strong enough, then the bandwidth choice is influenced by the strength of dependence, in contrast to the noise-free case. The central limit theorems are also influenced by the strength of dependence. On the other hand, if the density is supersmooth, then long-range dependence has no effect at all on the optimal bandwidth choice.
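For readers unfamiliar with density estimation from noisy observations, the following sketch shows a standard deconvolution kernel estimator in one ordinary smooth setting. It assumes Laplace errors with known scale `b` and a Gaussian kernel, for which the deconvolution kernel has the closed form $\varphi(u)\,[1+(b/h)^2(1-u^2)]$. The function name `deconv_kde_laplace` and these modelling choices are illustrative assumptions, and the code does not model dependence between observations.

```python
import numpy as np

def deconv_kde_laplace(x, y, h, b):
    """Deconvolution kernel density estimate at points x.

    y : noisy observations Y_j = X_j + eps_j, eps_j ~ Laplace(0, b) (assumed)
    h : bandwidth
    b : known Laplace scale; b = 0 recovers the ordinary Gaussian KDE
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(1, -1)
    u = (x - y) / h
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Gaussian kernel corrected by the inverse Laplace characteristic function.
    kernel = phi * (1.0 + (b / h) ** 2 * (1.0 - u**2))
    return kernel.mean(axis=1) / h
```

The correction term integrates to zero, so the estimate still integrates to one, although it can take negative values for small bandwidths.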
Kernel ridge regression is an important nonparametric method for estimating smooth functions. We introduce a new set of conditions under which the actual rates of convergence of the kernel ridge regression estimator, under both the $L_2$ norm and the norm of the reproducing kernel Hilbert space, exceed the standard minimax rates. An application of this theory leads to a new understanding of the Kennedy-O'Hagan approach for calibrating model parameters of computer simulations. We prove that, under certain conditions, the Kennedy-O'Hagan calibration estimator with a known covariance function converges to the minimizer of the norm of the residual function in the reproducing kernel Hilbert space.
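The kernel ridge regression estimator itself can be sketched in a few lines. The example below assumes a Gaussian kernel on one-dimensional inputs and the penalty scaling $n\lambda$; the names `gaussian_kernel`, `krr_fit_predict`, `lam` and `length_scale` are hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, length_scale=1.0):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)

def krr_fit_predict(x_train, y_train, x_new, lam=1e-3, length_scale=1.0):
    """Kernel ridge regression: solve (K + n*lam*I) alpha = y, then predict."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    n = x_train.size
    K = gaussian_kernel(x_train, x_train, length_scale)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    # Prediction is a kernel-weighted combination of the fitted coefficients.
    return gaussian_kernel(x_new, x_train, length_scale) @ alpha
```

With a single training point at 0 with response 1 and `lam=1.0`, the fitted value at 0 is 1/(1+1) = 0.5, which shows how the ridge penalty shrinks the fit toward zero.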
We study high-dimensional linear models with error-in-variables. Such models are motivated by various applications in econometrics, finance and genetics. These models are challenging because of the need to account for measurement errors to avoid non-vanishing biases, in addition to handling the high dimensionality of the parameters. A recent and growing literature has proposed various estimators that achieve good rates of convergence. Our main contribution complements this literature with the construction of simultaneous confidence regions for the parameters of interest in such high-dimensional linear models with error-in-variables. These confidence regions are based on the construction of moment conditions that have an additional orthogonality property with respect to nuisance parameters. We provide a construction that requires us to estimate an additional high-dimensional linear model with error-in-variables for each component of interest. We use a multiplier bootstrap to compute critical values for simultaneous confidence intervals for a subset $S$ of the components. We show its validity despite possible model selection mistakes, while allowing the cardinality of $S$ to be larger than the sample size. We apply our results to two examples, discuss their implications, and conduct Monte Carlo simulations to illustrate the performance of the proposed procedure.
The paper discusses the estimation of a continuous density function of the target random field $X_{\mathbf{i}}$, $\mathbf{i}\in \mathbb{Z}^N$, which is contaminated by measurement errors. In particular, the observed random field $Y_{\mathbf{i}}$, $\mathbf{i}\in \mathbb{Z}^N$, is such that $Y_{\mathbf{i}}=X_{\mathbf{i}}+\epsilon_{\mathbf{i}}$, where the random error $\epsilon_{\mathbf{i}}$ is from a known distribution and independent of the target random field. The paper improves on existing results in two directions. First, random vectors, rather than univariate random variables, are investigated. Second, a random field with certain spatial interactions, instead of i.i.d. random variables, is studied. Asymptotic normality of the proposed estimator is established under appropriate conditions.