In this article, we derive a new generalization of Chebyshev inequality for random vectors. We demonstrate that the new generalization is much less conservative than the classical generalization.
In this paper, we have established a unified framework of multistage parameter estimation. We demonstrate that a wide variety of statistical problems such as fixed-sample-size interval estimation, point estimation with error control, bounded-width confidence intervals, interval estimation following hypothesis testing, construction of confidence sequences, can be cast into the general framework of constructing sequential random intervals with prescribed coverage probabilities. We have developed exact methods for the construction of such sequential random intervals in the context of multistage sampling. In particular, we have established inclusion principle and coverage tuning techniques to control and adjust the coverage probabilities of sequential random intervals. We have obtained concrete sampling schemes which are unprecedentedly efficient in terms of sampling effort as compared to existing procedures.
We extend Fanos inequality, which controls the average probability of events in terms of the average of some $f$--divergences, to work with arbitrary events (not necessarily forming a partition) and even with arbitrary $[0,1]$--valued random variables, possibly in continuously infinite number. We provide two applications of these extensions, in which the consideration of random variables is particularly handy: we offer new and elegant proofs for existing lower bounds, on Bayesian posterior concentration (minimax or distribution-dependent) rates and on the regret in non-stochastic sequential learning.
In this paper, we have established a general framework of multistage hypothesis tests which applies to arbitrarily many mutually exclusive and exhaustive composite hypotheses. Within the new framework, we have constructed specific multistage tests which rigorously control the risk of committing decision errors and are more efficient than previous tests in terms of average sample number and the number of sampling operations. Without truncation, the sample numbers of our testing plans are absolutely bounded.
We derive simple concentration inequalities for bounded random vectors, which generalize Hoeffdings inequalities for bounded scalar random variables. As applications, we apply the general results to multinomial and Dirichlet distributions to obtain multivariate concentration inequalities.
We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model. We consider the high-dimensional regime and using the replica method from statistical physics, we provide a closed-form expression for the asymptotic generalisation performance in these problems, valid in both the under- and over-parametrised regimes and for a broad choice of generalised linear model loss functions. In particular, we show how to obtain analytically the so-called double descent behaviour for logistic regression with a peak at the interpolation threshold, we illustrate the superiority of orthogonal against random Gaussian projections in learning with random features, and discuss the role played by correlations in the data generated by the hidden manifold model. Beyond the interest in these particular problems, the theoretical formalism introduced in this manuscript provides a path to further extensions to more complex tasks.