This paper considers the problem of estimating probabilities of the form $\mathbb{P}(Y \leq w)$, for a given value of $w$, in the situation that a sample of i.i.d. observations $X_1, \ldots, X_n$ of $X$ is available, and where we explicitly know a functional relation between the Laplace transforms of the non-negative random variables $X$ and $Y$. A plug-in estimator is constructed by calculating the Laplace transform of the empirical distribution of the sample $X_1, \ldots, X_n$, applying the functional relation to it, and then (if possible) inverting the resulting Laplace transform and evaluating it at $w$. We show, under mild regularity conditions, that the resulting estimator is weakly consistent and has expected absolute estimation error $O(n^{-1/2} \log(n+1))$. We illustrate our results by two examples: in the first we estimate the distribution of the workload in an M/G/1 queue from observations of the input in fixed time intervals, and in the second we identify the distribution of the increments when observing a compound Poisson process at equidistant points in time (usually referred to as `decompounding').
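The first two steps of the plug-in construction (empirical Laplace transform, then the functional relation) can be sketched for the decompounding example. This is a minimal illustration and not the paper's estimator: it assumes the Poisson rate is known, uses the standard compound Poisson identity $\mathbb{E}[e^{-sX}] = \exp(-\lambda(1 - \mathbb{E}[e^{-sY}]))$ over a unit time step, and omits the Laplace inversion step entirely. The function names and the Exp(1) test data are hypothetical.

```python
import math
import random


def empirical_laplace(sample, s):
    """Laplace transform of the empirical distribution: (1/n) * sum exp(-s * x_i)."""
    return sum(math.exp(-s * x) for x in sample) / len(sample)


def decompounded_laplace(sample, s, rate):
    """Plug-in estimate of the increment transform E[exp(-sY)] at s.

    Assumption: X is a compound Poisson sum over one time step with a KNOWN
    Poisson rate, so that E[exp(-sX)] = exp(-rate * (1 - E[exp(-sY)])).
    Solving for the increment transform gives 1 + log(E[exp(-sX)]) / rate.
    """
    return 1.0 + math.log(empirical_laplace(sample, s)) / rate


def simulate_compound_poisson(n, rate, rng):
    """Draw n observations with Exp(1) increments (illustrative data only)."""
    sample = []
    for _ in range(n):
        # Count Poisson(rate) arrivals in [0, 1) via exponential gaps.
        count, t = 0, rng.expovariate(rate)
        while t < 1.0:
            count += 1
            t += rng.expovariate(rate)
        sample.append(sum(rng.expovariate(1.0) for _ in range(count)))
    return sample
```

For Exp(1) increments the true transform is $1/(1+s)$, so calling `decompounded_laplace(sample, 1.0, 1.0)` on a large simulated sample should return a value near $0.5$.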
We consider a pseudo-marginal Metropolis--Hastings kernel $P_m$ that is constructed using an average of $m$ exchangeable random variables, as well as an analogous kernel $P_s$ that averages $s<m$ of these same random variables. Using an embedding technique to facilitate comparisons, we show that the asymptotic variances of ergodic averages associated with $P_m$ are lower bounded in terms of those associated with $P_s$. We show that the bound provided is tight and disprove a conjecture that when the random variables to be averaged are independent, the asymptotic variance under $P_m$ is never less than $s/m$ times the variance under $P_s$. The conjecture does, however, hold when considering continuous-time Markov chains. These results imply that if the computational cost of the algorithm is proportional to $m$, it is often better to set $m=1$. We provide intuition as to why these findings differ so markedly from recent results for pseudo-marginal kernels employing particle filter approximations. Our results are exemplified through two simulation studies; in the first the computational cost is effectively proportional to $m$ and in the second there is a considerable start-up cost at each iteration.
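A pseudo-marginal kernel of the kind described above can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the target is a standard normal density, the noisy estimate is the exact unnormalized density multiplied by an average of $m$ independent mean-one lognormal weights, and the function names, noise level and step size are hypothetical. The sketch shows the mechanism only and does not reproduce the paper's experiments.

```python
import math
import random


def noisy_density(x, m, sigma, rng):
    """Unbiased estimate of exp(-x^2/2): exact value times an average of m mean-one weights."""
    weights = [math.exp(sigma * rng.gauss(0.0, 1.0) - 0.5 * sigma ** 2) for _ in range(m)]
    return math.exp(-0.5 * x * x) * sum(weights) / m


def pseudo_marginal_mh(n_iter, m, sigma=0.5, step=1.0, seed=0):
    """Random-walk pseudo-marginal Metropolis-Hastings on the toy target."""
    rng = random.Random(seed)
    x = 0.0
    est = noisy_density(x, m, sigma, rng)
    chain = []
    for _ in range(n_iter):
        x_new = x + step * rng.gauss(0.0, 1.0)
        est_new = noisy_density(x_new, m, sigma, rng)
        # The estimate at the current state is reused, never refreshed;
        # this is what keeps the exact target invariant.
        if rng.random() * est < est_new:
            x, est = x_new, est_new
        chain.append(x)
    return chain
```

Running `pseudo_marginal_mh(20000, m=1)` and `pseudo_marginal_mh(20000, m=4)` and comparing the variability of the ergodic averages against the factor-of-four difference in cost per iteration gives a rough feel for the trade-off discussed in the abstract.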
For a skew normal random sequence, convergence rates of the distribution of its partial maximum to the Gumbel extreme value distribution are derived. The asymptotic expansion of the distribution of the normalized maximum is given under an optimal choice of norming constants. We find that the optimal convergence rate of the normalized maximum to the Gumbel extreme value distribution is proportional to $1/\log n$.
We study the performance of the Least Squares Estimator (LSE) in a general nonparametric regression model, when the errors are independent of the covariates but may only have a $p$-th moment ($p\geq 1$). In such a heavy-tailed regression setting, we show that if the model satisfies a standard `entropy condition' with exponent $\alpha \in (0,2)$, then the $L_2$ loss of the LSE converges at a rate \begin{align*} \mathcal{O}_{\mathbf{P}}\big(n^{-\frac{1}{2+\alpha}} \vee n^{-\frac{1}{2}+\frac{1}{2p}}\big). \end{align*} Such a rate cannot be improved under the entropy condition alone. This rate quantifies both some positive and negative aspects of the LSE in a heavy-tailed regression setting. On the positive side, as long as the errors have $p\geq 1+2/\alpha$ moments, the $L_2$ loss of the LSE converges at the same rate as if the errors are Gaussian. On the negative side, if $p<1+2/\alpha$, there are (many) hard models at any entropy level $\alpha$ for which the $L_2$ loss of the LSE converges at a strictly slower rate than other robust estimators. The validity of the above rate relies crucially on the independence of the covariates and the errors. In fact, the $L_2$ loss of the LSE can converge arbitrarily slowly when the independence fails. The key technical ingredient is a new multiplier inequality that gives sharp bounds for the `multiplier empirical process' associated with the LSE. We further give an application to the sparse linear regression model with heavy-tailed covariates and errors to demonstrate the scope of this new inequality.
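The interplay between the two terms in the stated rate can be checked with a few lines of arithmetic. Since $n^{a} \vee n^{b} = n^{\max(a,b)}$ for $n \geq 1$, the rate is governed by the larger of the two exponents, and the two exponents coincide exactly at $p = 1 + 2/\alpha$. The helper names below are hypothetical and only restate the formula from the abstract.

```python
def lse_rate_exponent(alpha, p):
    """Exponent r such that the stated rate is n**r.

    r = max(-1/(2 + alpha), -1/2 + 1/(2p)), valid for alpha in (0, 2), p >= 1.
    """
    if not (0.0 < alpha < 2.0) or p < 1.0:
        raise ValueError("requires alpha in (0, 2) and p >= 1")
    return max(-1.0 / (2.0 + alpha), -0.5 + 1.0 / (2.0 * p))


def moment_threshold(alpha):
    """Smallest p at which the moment term stops dominating: 1 + 2/alpha."""
    return 1.0 + 2.0 / alpha
```

For example, with $\alpha = 1$ the threshold is $p = 3$: at $p = 3$ the exponent is $-1/3$, matching the Gaussian-error rate, while at $p = 2$ the moment term dominates and the exponent is only $-1/4$.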
Regression trees and their ensemble methods are popular methods for nonparametric regression: they combine strong predictive performance with interpretable estimators. To improve their utility for locally smooth response surfaces, we study regression trees and random forests with linear aggregation functions. We introduce a new algorithm that finds the best axis-aligned split to fit linear aggregation functions on the corresponding nodes, and we offer a quasilinear time implementation. We demonstrate the algorithm's favorable performance on real-world benchmarks and in an extensive simulation study, and we demonstrate its improved interpretability using a large get-out-the-vote experiment. We provide an open-source software package that implements several tree-based estimators with linear aggregation functions.
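The idea of choosing a split by the quality of linear fits in the two child nodes can be illustrated with a deliberately naive search over a single feature. This sketch is an assumption-laden stand-in, not the paper's algorithm: it refits an ordinary least-squares line from scratch at every candidate threshold, so it runs in quadratic rather than quasilinear time, and it handles one feature only. The function names and the `min_leaf` parameter are hypothetical.

```python
def linear_sse(xs, ys):
    """Sum of squared errors of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(xs, ys))


def best_linear_split(xs, ys, min_leaf=2):
    """Return (threshold, sse) minimizing the summed SSE of separate lines on each side."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = (None, float("inf"))
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if sx[i - 1] == sx[i]:
            continue  # cannot split between equal feature values
        sse = linear_sse(sx[:i], sy[:i]) + linear_sse(sx[i:], sy[i:])
        if sse < best[1]:
            best = ((sx[i - 1] + sx[i]) / 2.0, sse)
    return best
```

On piecewise-linear data with a single kink, the search recovers a threshold at the kink, whereas a split criterion based on constant leaf predictions would have no particular reason to place it there.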
A robust estimator is proposed for the parameters that characterize the linear regression problem. It is based on the notion of shrinkages, often used in finance and previously studied for outlier detection in multivariate data. A thorough simulation study is conducted to investigate the efficiency with normal and heavy-tailed errors, the robustness under contamination, the computational times, and the affine equivariance and breakdown value of the regression estimator. Two classical data sets often used in the literature and a real socio-economic data set about the Living Environment Deprivation of areas in Liverpool (UK) are studied. The results from the simulations and the real data examples show the advantages of the proposed robust estimator in regression.