Do you want to publish a course? Click here

Approximating high-dimensional infinite-order $U$-statistics: statistical and computational guarantees

94   0   0.0 ( 0 )
 Added by Yanglei Song
 Publication date 2019
and research's language is English




Ask ChatGPT about the research

We study the problem of distributional approximations to high-dimensional non-degenerate $U$-statistics with random kernels of diverging orders. Infinite-order $U$-statistics (IOUS) are a useful tool for constructing simultaneous prediction intervals that quantify the uncertainty of ensemble methods such as subbagging and random forests. A major obstacle in using the IOUS is their computational intractability when the sample size and/or order are large. In this article, we derive non-asymptotic Gaussian approximation error bounds for an incomplete version of the IOUS with a random kernel. We also study data-driven inferential methods for the incomplete IOUS via bootstraps and develop their statistical and computational guarantees.

rate research

Read More

In this paper, we study Kaplan-Meier V- and U-statistics respectively defined as $theta(widehat{F}_n)=sum_{i,j}K(X_{[i:n]},X_{[j:n]})W_iW_j$ and $theta_U(widehat{F}_n)=sum_{i eq j}K(X_{[i:n]},X_{[j:n]})W_iW_j/sum_{i eq j}W_iW_j$, where $widehat{F}_n$ is the Kaplan-Meier estimator, ${W_1,ldots,W_n}$ are the Kaplan-Meier weights and $K:(0,infty)^2tomathbb R$ is a symmetric kernel. As in the canonical setting of uncensored data, we differentiate between two asymptotic behaviours for $theta(widehat{F}_n)$ and $theta_U(widehat{F}_n)$. Additionally, we derive an asymptotic canonical V-statistic representation of the Kaplan-Meier V- and U-statistics. By using this representation we study properties of the asymptotic distribution. Applications to hypothesis testing are given.
We generalize standard credal set models for imprecise probabilities to include higher order credal sets -- confidences about confidences. In doing so, we specify how an agents higher order confidences (credal sets) update upon observing an event. Our model begins to address standard issues with imprecise probability models, like Dilation and Belief Inertia. We conjecture that when higher order credal sets contain all possible probability functions, then in the limiting case the highest order confidences converge to form a uniform distribution over the first order credal set, where we define uniformity in terms of the statistical distance metric (total variation distance). Finite simulation supports the conjecture. We further suggest that this convergence presents the total-variation-uniform distribution as a natural, privileged prior for statistical hypothesis testing.
We consider the problem of constructing nonparametric undirected graphical models for high-dimensional functional data. Most existing statistical methods in this context assume either a Gaussian distribution on the vertices or linear conditional means. In this article we provide a more flexible model which relaxes the linearity assumption by replacing it by an arbitrary additive form. The use of functional principal components offers an estimation strategy that uses a group lasso penalty to estimate the relevant edges of the graph. We establish statistical guarantees for the resulting estimators, which can be used to prove consistency if the dimension and the number of functional principal components diverge to infinity with the sample size. We also investigate the empirical performance of our method through simulation studies and a real data application.
In the setting of high-dimensional linear models with Gaussian noise, we investigate the possibility of confidence statements connected to model selection. Although there exist numerous procedures for adaptive point estimation, the construction of adaptive confidence regions is severely limited (cf. Li, 1989). The present paper sheds new light on this gap. We develop exact and adaptive confidence sets for the best approximating model in terms of risk. One of our constructions is based on a multiscale procedure and a particular coupling argument. Utilizing exponential inequalities for noncentral chi-squared distributions, we show that the risk and quadratic loss of all models within our confidence region are uniformly bounded by the minimal risk times a factor close to one.
We consider high-dimensional measurement errors with high-frequency data. Our focus is on recovering the covariance matrix of the random errors with optimality. In this problem, not all components of the random vector are observed at the same time and the measurement errors are latent variables, leading to major challenges besides high data dimensionality. We propose a new covariance matrix estimator in this context with appropriate localization and thresholding. By developing a new technical device integrating the high-frequency data feature with the conventional notion of $alpha$-mixing, our analysis successfully accommodates the challenging serial dependence in the measurement errors. Our theoretical analysis establishes the minimax optimal convergence rates associated with two commonly used loss functions. We then establish cases when the proposed localized estimator with thresholding achieves the minimax optimal convergence rates. Considering that the variances and covariances can be small in reality, we conduct a second-order theoretical analysis that further disentangles the dominating bias in the estimator. A bias-corrected estimator is then proposed to ensure its practical finite sample performance. We illustrate the promising empirical performance of the proposed estimator with extensive simulation studies and a real data analysis.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا