In this paper, we study Kaplan-Meier V- and U-statistics respectively defined as $\theta(\widehat{F}_n)=\sum_{i,j}K(X_{[i:n]},X_{[j:n]})W_iW_j$ and $\theta_U(\widehat{F}_n)=\sum_{i\neq j}K(X_{[i:n]},X_{[j:n]})W_iW_j/\sum_{i\neq j}W_iW_j$, where $\widehat{F}_n$ is the Kaplan-Meier estimator, $\{W_1,\ldots,W_n\}$ are the Kaplan-Meier weights and $K\colon(0,\infty)^2\to\mathbb{R}$ is a symmetric kernel. As in the canonical setting of uncensored data, we differentiate between two asymptotic behaviours for $\theta(\widehat{F}_n)$ and $\theta_U(\widehat{F}_n)$. Additionally, we derive an asymptotic canonical V-statistic representation of the Kaplan-Meier V- and U-statistics. By using this representation, we study properties of the asymptotic distribution. Applications to hypothesis testing are given.
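To make the definitions concrete, here is a minimal NumPy sketch of both estimators, assuming right-censored data with censoring indicators; the function names and the vectorized `kernel` argument are illustrative, the weights use the standard product-form expression for the jumps of the Kaplan-Meier estimator, and ties between censored and uncensored times are ignored for simplicity.

```python
import numpy as np

def kaplan_meier_weights(x, delta):
    """Jump weights W_i of the Kaplan-Meier estimator at the ordered
    observations; delta[i] = 1 if x[i] is uncensored, 0 if censored."""
    order = np.argsort(x)
    d = delta[order].astype(float)
    n = len(x)
    i = np.arange(1, n + 1)
    # W_i = d_i / (n - i + 1) * prod_{j < i} ((n - j) / (n - j + 1))^{d_j}
    factors = ((n - i) / (n - i + 1.0)) ** d
    surv_prefix = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    return x[order], d / (n - i + 1.0) * surv_prefix

def km_v_statistic(x, delta, kernel):
    """theta(F_hat) = sum_{i,j} K(X_[i:n], X_[j:n]) W_i W_j."""
    xs, w = kaplan_meier_weights(x, delta)
    K = kernel(xs[:, None], xs[None, :])  # kernel must broadcast
    return w @ K @ w

def km_u_statistic(x, delta, kernel):
    """theta_U(F_hat): off-diagonal sum, normalized by sum_{i != j} W_i W_j."""
    xs, w = kaplan_meier_weights(x, delta)
    K = kernel(xs[:, None], xs[None, :])
    num = w @ K @ w - np.sum(np.diag(K) * w**2)
    den = np.sum(w) ** 2 - np.sum(w**2)
    return num / den
```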
We study the problem of distributional approximations to high-dimensional non-degenerate $U$-statistics with random kernels of diverging orders. Infinite-order $U$-statistics (IOUS) are a useful tool for constructing simultaneous prediction intervals that quantify the uncertainty of ensemble methods such as subbagging and random forests. A major obstacle to using IOUS is their computational intractability when the sample size and/or the order is large. In this article, we derive non-asymptotic Gaussian approximation error bounds for an incomplete version of the IOUS with a random kernel. We also study data-driven inferential methods for the incomplete IOUS via bootstraps and develop their statistical and computational guarantees.
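To fix ideas, here is a sketch of a generic incomplete $U$-statistic, which averages the kernel over randomly drawn index subsets rather than all $\binom{n}{m}$ of them; the function names and the subsampling scheme (uniform subsets, drawn independently) are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def incomplete_u_statistic(data, kernel, order, n_subsets, rng=None):
    """Average a symmetric kernel over n_subsets random index subsets of
    size `order`, instead of all C(n, order) subsets."""
    rng = np.random.default_rng(rng)
    n = len(data)
    vals = np.empty(n_subsets)
    for b in range(n_subsets):
        idx = rng.choice(n, size=order, replace=False)
        vals[b] = kernel(data[idx])
    return vals.mean()
```

Here `kernel` may itself be random, e.g. fit a base learner on the subset and predict at a query point, which is the ensemble-method use case described above.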
Two-sample tests utilizing a similarity graph on observations are useful for high-dimensional data and non-Euclidean data due to their flexibility and good performance under a wide range of alternatives. Existing works have mainly focused on sparse graphs, such as graphs whose number of edges is of the same order as the number of observations. However, the tests perform better with denser graphs under many settings. In this work, we establish the theoretical ground for graph-based tests with graphs that are much denser than those in existing works.
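As an illustration of the general idea, here is a sketch of an edge-count test on a $k$-nearest-neighbour similarity graph with a permutation null; the function name, the choice of $k$-NN graph, and the one-sided p-value convention are illustrative assumptions, not this paper's specific construction. Increasing `k` yields the denser graphs the paper analyzes.

```python
import numpy as np

def knn_edge_count_test(x, y, k=5, n_perm=1000, rng=None):
    """Graph-based two-sample test: build a k-NN graph on the pooled
    sample and use the number of between-sample edges as the statistic;
    the null distribution comes from permuting the sample labels."""
    rng = np.random.default_rng(rng)
    z = np.vstack([x, y])
    labels = np.r_[np.zeros(len(x)), np.ones(len(y))]
    d = np.linalg.norm(z[:, None] - z[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # no self-edges
    nbrs = np.argsort(d, axis=1)[:, :k]         # k nearest neighbours
    edges = np.array([(i, j) for i in range(len(z)) for j in nbrs[i]])

    def n_between(lab):
        return np.sum(lab[edges[:, 0]] != lab[edges[:, 1]])

    obs = n_between(labels)
    perm = np.array([n_between(rng.permutation(labels))
                     for _ in range(n_perm)])
    # fewer between-sample edges than expected indicates a difference
    return obs, np.mean(perm <= obs)
```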
We establish exponential inequalities for a class of V-statistics under strong mixing conditions. Our theory is developed via a novel kernel expansion based on random Fourier features and the use of a probabilistic method. This type of expansion is new and useful for handling many kernel classes that are notoriously difficult to analyze.
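For concreteness, here is the standard random Fourier feature map for the Gaussian kernel, in the construction of Rahimi and Recht (2007); the function name is illustrative, and the paper's expansion is more general than this specific kernel.

```python
import numpy as np

def random_fourier_features(x, n_features, gamma, rng=None):
    """Feature map z with z(x) . z(y) ~ exp(-gamma * ||x - y||^2).
    x has shape (n_samples, d)."""
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    # frequencies drawn from the kernel's spectral density N(0, 2*gamma*I)
    w = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)
```

With such a map, a kernel V-statistic factorizes approximately as $V_n \approx \|n^{-1}\sum_i z(X_i)\|^2$, which is the kind of linearization that makes these expansions useful for concentration arguments.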
Randomization is a basis for the statistical inference of treatment effects without strong assumptions on the outcome-generating process. Appropriately using covariates further yields more precise estimators in randomized experiments. R. A. Fisher suggested blocking on discrete covariates in the design stage or conducting analysis of covariance (ANCOVA) in the analysis stage. We can embed blocking into a wider class of experimental designs called rerandomization, and extend the classical ANCOVA to more general regression adjustment. Rerandomization trumps complete randomization in the design stage, and regression adjustment trumps the simple difference-in-means estimator in the analysis stage. It is then intuitive to use both rerandomization and regression adjustment. Under the randomization-inference framework, we establish a unified theory allowing the designer and analyzer to have access to different sets of covariates. We find that asymptotically (a) for any given estimator with or without regression adjustment, rerandomization never hurts either the sampling precision or the estimated precision, and (b) for any given design with or without rerandomization, our regression-adjusted estimator never hurts the estimated precision. Therefore, combining rerandomization and regression adjustment yields better coverage properties and thus improves statistical inference. To theoretically quantify these statements, we discuss optimal regression-adjusted estimators in terms of the sampling precision and the estimated precision, and then measure the additional gains of the designer and the analyzer. We finally suggest using rerandomization in the design and regression adjustment in the analysis followed by the Huber--White robust standard error.
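A minimal sketch of the two ingredients, assuming a completely randomized experiment with continuous covariates: `rerandomize` and `lin_adjusted_estimate` are hypothetical names, the acceptance rule is the usual Mahalanobis-distance criterion, and in practice the point estimate would be paired with the Huber--White robust standard error (e.g., statsmodels' `OLS(...).fit(cov_type='HC2')`).

```python
import numpy as np

def rerandomize(x, n_treat, threshold, rng=None):
    """Redraw a complete randomization until the Mahalanobis distance
    between treated and control covariate means is below `threshold`."""
    rng = np.random.default_rng(rng)
    n = len(x)
    s = np.atleast_2d(np.cov(x, rowvar=False)) \
        * (1.0 / n_treat + 1.0 / (n - n_treat))
    s_inv = np.linalg.inv(s)
    while True:
        z = np.zeros(n, dtype=bool)
        z[rng.choice(n, size=n_treat, replace=False)] = True
        diff = x[z].mean(axis=0) - x[~z].mean(axis=0)
        if diff @ s_inv @ diff <= threshold:
            return z

def lin_adjusted_estimate(y, z, x):
    """Lin (2013)-style adjustment: regress y on treatment, centred
    covariates, and their interactions; the coefficient on treatment
    estimates the average treatment effect."""
    xc = x - x.mean(axis=0)
    design = np.column_stack([np.ones(len(y)), z, xc, z[:, None] * xc])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]
```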
A novel sequential change detection problem is proposed, in which the change should be not only detected but also accelerated. Specifically, it is assumed that the sequentially collected observations are responses to treatments selected in real time. The assigned treatments not only determine the pre-change and post-change distributions of the responses, but also influence when the change happens. The problem is to find a treatment assignment rule and a stopping rule that minimize the expected total number of observations subject to a user-specified bound on the false alarm probability. The optimal solution to this problem is obtained under a general Markovian change-point model. Moreover, an alternative procedure is proposed, whose applicability is not restricted to Markovian change-point models and whose design requires minimal computation. For a large class of change-point models, the proposed procedure is shown to achieve the optimal performance in an asymptotic sense. Finally, in two simulation studies its performance is found to be close to optimal, uniformly with respect to the error probability.
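The proposed procedure couples treatment assignment with detection; as background only, here is a sketch of the classical CUSUM stopping rule that such procedures build on, with illustrative names and a user-supplied log-likelihood ratio, not the authors' rule itself.

```python
def cusum_stopping_time(stream, log_lr, threshold):
    """Classical CUSUM: stop the first time the running maximum of the
    cumulative post/pre-change log-likelihood ratio exceeds `threshold`."""
    stat = 0.0
    for t, obs in enumerate(stream, start=1):
        stat = max(stat, 0.0) + log_lr(obs)
        if stat >= threshold:
            return t  # alarm time
    return None  # no alarm within the stream
```

For example, for a standard-normal stream whose mean shifts to $\mu$ after the change, the log-likelihood ratio is `log_lr = lambda obs: mu * obs - mu**2 / 2`.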