No Arabic abstract
The present paper provides a new generic strategy leading to non-asymptotic theoretical guarantees on the Leave-one-Out procedure applied to a broad class of learning algorithms. This strategy relies on two main ingredients: the new notion of $L^q$ stability, and the strong use of moment inequalities. $L^q$ stability extends the ongoing notion of hypothesis stability while remaining weaker than the uniform stability. It leads to new PAC exponential generalisation bounds for Leave-one-Out under mild assumptions. In the literature, such bounds are available only for uniform stable algorithms under boundedness for instance. Our generic strategy is applied to the Ridge regression algorithm as a first step.
Let ${mathcal S}_m$ be the set of all $mtimes m$ density matrices (Hermitian positively semi-definite matrices of unit trace). Consider a problem of estimation of an unknown density matrix $rhoin {mathcal S}_m$ based on outcomes of $n$ measurements of observables $X_1,dots, X_nin {mathbb H}_m$ (${mathbb H}_m$ being the space of $mtimes m$ Hermitian matrices) for a quantum system identically prepared $n$ times in state $rho.$ Outcomes $Y_1,dots, Y_n$ of such measurements could be described by a trace regression model in which ${mathbb E}_{rho}(Y_j|X_j)={rm tr}(rho X_j), j=1,dots, n.$ The design variables $X_1,dots, X_n$ are often sampled at random from the uniform distribution in an orthonormal basis ${E_1,dots, E_{m^2}}$ of ${mathbb H}_m$ (such as Pauli basis). The goal is to estimate the unknown density matrix $rho$ based on the data $(X_1,Y_1), dots, (X_n,Y_n).$ Let $$ hat Z:=frac{m^2}{n}sum_{j=1}^n Y_j X_j $$ and let $check rho$ be the projection of $hat Z$ onto the convex set ${mathcal S}_m$ of density matrices. It is shown that for estimator $check rho$ the minimax lower bounds in classes of low rank density matrices (established earlier) are attained up logarithmic factors for all Schatten $p$-norm distances, $pin [1,infty]$ and for Bures version of quantum Hellinger distance. Moreover, for a slightly modified version of estimator $check rho$ the same property holds also for quantum relative entropy (Kullback-Leibler) distance between density matrices.
The lasso procedure is ubiquitous in the statistical and signal processing literature, and as such, is the target of substantial theoretical and applied research. While much of this research focuses on the desirable properties that lasso possesses---predictive risk consistency, sign consistency, correct model selection---all of it has assumes that the tuning parameter is chosen in an oracle fashion. Yet, this is impossible in practice. Instead, data analysts must use the data twice, once to choose the tuning parameter and again to estimate the model. But only heuristics have ever justified such a procedure. To this end, we give the first definitive answer about the risk consistency of lasso when the smoothing parameter is chosen via cross-validation. We show that under some restrictions on the design matrix, the lasso estimator is still risk consistent with an empirically chosen tuning parameter.
We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions. This extends the relevance and applicability of the PAC-Bayes learning framework, where most of the existing literature focuses on supervised learning problems with a bounded loss function (typically assumed to take values in the interval [0;1]). In order to relax this assumption, we propose a new notion called HYPE (standing for emph{HYPothesis-dependent rangE}), which effectively allows the range of the loss to depend on each predictor. Based on this new notion we derive a novel PAC-Bayesian generalisation bound for unbounded loss functions, and we instantiate it on a linear regression problem. To make our theory usable by the largest audience possible, we include discussions on actual computation, practicality and limitations of our assumptions.
We present a simple algorithm for identifying and correcting real-valued noisy labels from a mixture of clean and corrupted sample points using Gaussian process regression. A heteroscedastic noise model is employed, in which additive Gaussian noise terms with independent variances are associated with each and all of the observed labels. Optimizing the noise model using maximum likelihood estimation leads to the containment of the GPR models predictive error by the posterior standard deviation in leave-one-out cross-validation. A multiplicative update scheme is proposed for solving the maximum likelihood estimation problem under non-negative constraints. While we provide proof of convergence for certain special cases, the multiplicative scheme has empirically demonstrated monotonic convergence behavior in virtually all our numerical experiments. We show that the presented method can pinpoint corrupted sample points and lead to better regression models when trained on synthetic and real-world scientific data sets.
Possible reasons for the uniqueness of the positive geometric law in the context of stability of random extremes are explored here culminating in a conjecture characterizing the geometric law. Our reasoning comes closer in justifying the geometric law in similar contexts discussed in Arnold et al. (1986) and Marshall & Olkin (1997) and also supplement their arguments.