
Bootstrap for U-Statistics: A new approach

Added by Martin Wendler
Publication date: 2015
Language: English





The bootstrap for nonlinear statistics such as U-statistics of dependent data has been studied by several authors. This is typically done by producing a bootstrap version of the sample and plugging it into the statistic. We suggest an alternative approach to obtaining a bootstrap version of U-statistics, which can be described as a compromise between bootstrap and subsampling. We show the consistency of the new method and compare its finite sample properties in a simulation study.
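To make the objects concrete: the sample variance is a degree-two U-statistic with kernel h(x, y) = (x - y)^2 / 2, and the "plug-in" bootstrap mentioned above resamples the data and re-evaluates the same statistic. The sketch below is illustrative only; it is not the bootstrap-subsampling hybrid proposed in the paper.

```python
import itertools
import random

def u_statistic(data, kernel):
    """Degree-2 U-statistic: average of the kernel over all unordered pairs."""
    pairs = list(itertools.combinations(data, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)

def var_kernel(x, y):
    # h(x, y) = (x - y)^2 / 2 makes the U-statistic the unbiased sample variance
    return (x - y) ** 2 / 2

def naive_bootstrap_u(data, kernel, reps=200, rng=None):
    """Plug-in bootstrap: resample with replacement, recompute the U-statistic."""
    rng = rng or random.Random(0)
    n = len(data)
    return [u_statistic([rng.choice(data) for _ in range(n)], kernel)
            for _ in range(reps)]

data = [1.0, 2.0, 4.0, 7.0, 11.0]
print(u_statistic(data, var_kernel))  # 16.5, the unbiased sample variance
```

The bootstrap replicates from `naive_bootstrap_u` approximate the sampling distribution of the statistic under independence; for dependent data one would resample blocks instead of single observations.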



Related research

In this paper we analyze different ways of performing principal component analysis using three different approaches: robust covariance and correlation matrix estimation, a projection pursuit approach, and a non-parametric maximum entropy algorithm. The objective of these approaches is to correct the well-known sensitivity to outliers of the classical method for principal component analysis. Due to their robustness, they perform very well on contaminated data, while the classical approach fails to preserve the characteristics of the core information.
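As a sketch of the first route (robust correlation estimation, not the authors' exact estimator), one can replace the Pearson correlation matrix by a rank-based one before the eigendecomposition; everything below, including the contamination pattern, is an illustrative assumption.

```python
import numpy as np

def spearman_corr(X):
    """Rank-based correlation matrix: less sensitive to outliers than Pearson."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    ranks -= ranks.mean(axis=0)
    ranks /= np.sqrt((ranks ** 2).sum(axis=0))   # unit-norm columns -> diag = 1
    return ranks.T @ ranks

def robust_pca(X):
    """PCA on a rank correlation matrix instead of the classical covariance."""
    R = spearman_corr(X)
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]               # largest eigenvalue first
    return vals[order], vecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:5] += 50.0                                    # a few gross outliers
vals, vecs = robust_pca(X)
print(vals)
```

Because the ranks cap the influence of any single observation, the outlying rows barely distort the estimated principal directions, whereas a covariance-based PCA would align its first component with the contamination.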
This paper derives central limit and bootstrap theorems for probabilities that sums of centered high-dimensional random vectors hit hyperrectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for probabilities $\Pr(n^{-1/2}\sum_{i=1}^n X_i \in A)$ where $X_1,\dots,X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a hyperrectangle, or, more generally, a sparsely convex set, and show that the approximation error converges to zero even if $p = p_n \to \infty$ as $n \to \infty$ and $p \gg n$; in particular, $p$ can be as large as $O(e^{Cn^c})$ for some constants $c, C > 0$. The result holds uniformly over all hyperrectangles, or more generally, sparsely convex sets, and does not require any restriction on the correlation structure among coordinates of $X_i$. Sparsely convex sets are sets that can be represented as intersections of many convex sets whose indicator functions depend only on a small subset of their arguments, with hyperrectangles being a special case.
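The flavor of this result can be illustrated with a Gaussian multiplier bootstrap for a single hyperrectangle {x : max_j x_j <= t}. The dimensions n and p, the threshold t, and the multiplier scheme below are illustrative choices, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 200                       # p >> n is allowed by the theory
X = rng.normal(size=(n, p))          # independent centered vectors X_1..X_n

S = X.sum(axis=0) / np.sqrt(n)       # n^{-1/2} sum_i X_i
t = 2.5                              # hyperrectangle A = {x : max_j x_j <= t}

# Gaussian multiplier bootstrap: reweight rows by i.i.d. N(0,1) multipliers
B = 500
boot_max = np.empty(B)
for b in range(B):
    e = rng.normal(size=n)
    boot_max[b] = (e @ X / np.sqrt(n)).max()

# Observed sup statistic and the bootstrap estimate of Pr(S in A)
print(S.max(), (boot_max <= t).mean())
```

Conditionally on the data, the multiplier process mimics the Gaussian limit coordinate-by-coordinate, which is why the approximation can survive p growing much faster than n.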
Karl Mosler, 2012
In 1975 John Tukey proposed a multivariate median, which is the deepest point in a given data cloud in R^d. Later, in measuring the depth of an arbitrary point z with respect to the data, David Donoho and Miriam Gasko considered hyperplanes through z and determined its depth by the smallest portion of data separated by such a hyperplane. Since then, these ideas have proved extremely fruitful. A rich statistical methodology has developed that is based on data depth and, more generally, on nonparametric depth statistics. General notions of data depth have been introduced as well as many special ones. These notions vary regarding their computability and robustness and their sensitivity to reflect asymmetric shapes of the data. According to their different properties, they fit particular applications. The upper level sets of a depth statistic provide a family of set-valued statistics, named depth-trimmed or central regions. They describe the distribution regarding its location, scale and shape. The most central region serves as a median. The notion of depth has been extended from data clouds, that is, empirical distributions, to general probability distributions on R^d, thus allowing for laws of large numbers and consistency results. It has also been extended from d-variate data to data in functional spaces.
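The Donoho-Gasko construction described above can be approximated numerically by scanning directions: for each direction, count the fraction of points on either side of the hyperplane through z, and take the smallest fraction found. The sketch below (2-D data, 500 random directions, all parameters arbitrary) yields an upper bound on the exact depth, since it minimizes over a finite set of directions.

```python
import numpy as np

def halfspace_depth(z, data, n_dirs=500, seed=0):
    """Approximate Tukey (halfspace) depth of point z in a 2-D data cloud:
    over many random directions u, take the smallest fraction of points on
    one side of the hyperplane through z orthogonal to u."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0, np.pi, n_dirs)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    proj = (data - z) @ dirs.T                  # signed positions along each u
    frac = np.minimum((proj >= 0).mean(axis=0), (proj <= 0).mean(axis=0))
    return frac.min()

rng = np.random.default_rng(2)
cloud = rng.normal(size=(400, 2))
d_center = halfspace_depth(np.zeros(2), cloud)        # high depth at the center
d_out = halfspace_depth(np.array([5.0, 5.0]), cloud)  # low depth far outside
print(d_center, d_out)
```

The deepest point of the cloud (the Tukey median) is found by maximizing this depth over candidate points, and the upper level sets of the depth give the central regions mentioned above.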
Salim Bouzebda, 2009
The purpose of this note is to provide an approximation for the generalized bootstrapped empirical process achieving the rate in Komlós et al. (1975). The proof is based on much the same arguments as in Horváth et al. (2000). As a consequence, we establish an approximation of the bootstrapped kernel-type density estimator.
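For concreteness, a bootstrapped kernel-type density estimator simply re-evaluates the estimator on resampled data. The sketch below uses a Gaussian kernel, with the bandwidth and sample sizes chosen arbitrarily; it is not tied to the approximation rates discussed in the note.

```python
import math
import random

def kde(x, data, h):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    n = len(data)
    return sum(math.exp(-((x - d) / h) ** 2 / 2) for d in data) \
        / (n * h * math.sqrt(2 * math.pi))

random.seed(0)
data = [random.gauss(0, 1) for _ in range(300)]
n, h = len(data), 0.4

# Bootstrapped version: resample the data, re-evaluate the same estimator
boot = [kde(0.0, [random.choice(data) for _ in range(n)], h)
        for _ in range(100)]
print(kde(0.0, data, h), min(boot), max(boot))
```

The spread of the bootstrap replicates around the original estimate is what strong approximation results of this kind calibrate.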
We introduce a new sufficient statistic for the population parameter vector by allowing for the sampling design to first be selected at random amongst a set of candidate sampling designs. In contrast to the traditional approach in survey sampling, we achieve this by defining the observed data to include a mention of the sampling design used for the data collection aspect of the study. We show that the reduced data consisting of the unit labels together with their corresponding responses of interest is a sufficient statistic under this setup. A Rao-Blackwellization inference procedure is outlined and it is shown how averaging over hypothetical observed data outcomes results in improved estimators; the improved strategy includes considering all possible sampling designs in the candidate set that could have given rise to the reduced data. Expressions for the variance of the Rao-Blackwell estimators are also derived. The results from two simulation studies are presented to demonstrate the practicality of our approach. A discussion on how our approach can be useful when the analyst has limited information on the data collection procedure is also provided.
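The adaptive-design setup of this paper is more elaborate, but the effect of Rao-Blackwellizing onto reduced data can be illustrated with a classical special case: under with-replacement sampling, replacing the naive sample mean by the mean of the distinct units (the reduced data of unit labels and responses) preserves unbiasedness while lowering the variance. The population and sample sizes below are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(0)
pop = list(range(1, 21))            # hypothetical population of 20 unit values
true_mean = statistics.mean(pop)

naive, rb = [], []
for _ in range(4000):
    s = [random.choice(pop) for _ in range(8)]   # with-replacement sample
    naive.append(statistics.mean(s))             # naive mean, duplicates kept
    rb.append(statistics.mean(set(s)))           # mean of the distinct units

# Both estimators are unbiased; conditioning on the reduced data
# (the set of distinct units) reduces the Monte Carlo variance.
print(statistics.variance(naive), statistics.variance(rb))
```

This mirrors the mechanism in the abstract: averaging the naive estimator over hypothetical observed-data outcomes that are consistent with the reduced data can only decrease the variance.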
