Dimension-agnostic inference


الملخص بالإنكليزية

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d_n$ and $n$ both increase to infinity together at some prescribed relative rate. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n gg d$, or $d_n/n approx 0.2$? This paper considers the goal of dimension-agnostic inference -- developing methods whose validity does not depend on any assumption on $d_n$. We introduce a new, generic approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonals. We exemplify our technique for a handful of classical problems including one-sample mean and covariance testing. Our tests are shown to have minimax rate-optimal power against appropriate local alternatives, and without explicitly targeting the high-dimensional setting their power is optimal up to a $sqrt 2$ factor. A hidden advantage is that our proofs are simple and transparent. We end by describing several fruitful open directions.

تحميل البحث