Do you want to publish a course? Click here

Improved Convergence Rates for the Orthogonal Greedy Algorithm

143   0   0.0 ( 0 )
 Added by Jonathan Siegel
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

We analyze the orthogonal greedy algorithm when applied to dictionaries $mathbb{D}$ whose convex hull has small entropy. We show that if the metric entropy of the convex hull of $mathbb{D}$ decays at a rate of $O(n^{-frac{1}{2}-alpha})$ for $alpha > 0$, then the orthogonal greedy algorithm converges at the same rate. This improves upon the well-known $O(n^{-frac{1}{2}})$ convergence rate of the orthogonal greedy algorithm in many cases, most notably for dictionaries corresponding to shallow neural networks. Finally, we show that these improved rates are sharp under the given entropy decay assumptions.



rate research

Read More

153 - L. Gyorfi , G. Lugosi , G. Morvai 2008
We present a simple randomized procedure for the prediction of a binary sequence. The algorithm uses ideas from recent developments of the theory of the prediction of individual sequences. We show that if the sequence is a realization of a stationary and ergodic random process then the average number of mistakes converges, almost surely, to that of the optimum, given by the Bayes predictor. The desirable finite-sample properties of the predictor are illustrated by its performance for Markov processes. In such cases the predictor exhibits near optimal behavior even without knowing the order of the Markov process. Prediction with side information is also considered.
We provide a theoretical treatment of over-specified Gaussian mixtures of experts with covariate-free gating networks. We establish the convergence rates of the maximum likelihood estimation (MLE) for these models. Our proof technique is based on a novel notion of emph{algebraic independence} of the expert functions. Drawing on optimal transport theory, we establish a connection between the algebraic independence and a certain class of partial differential equations (PDEs). Exploiting this connection allows us to derive convergence rates and minimax lower bounds for parameter estimation.
Distances to compact sets are widely used in the field of Topological Data Analysis for inferring geometric and topological features from point clouds. In this context, the distance to a probability measure (DTM) has been introduced by Chazal et al. (2011) as a robust alternative to the distance a compact set. In practice, the DTM can be estimated by its empirical counterpart, that is the distance to the empirical measure (DTEM). In this paper we give a tight control of the deviation of the DTEM. Our analysis relies on a local analysis of empirical processes. In particular, we show that the rates of convergence of the DTEM directly depends on the regularity at zero of a particular quantile fonction which contains some local information about the geometry of the support. This quantile function is the relevant quantity to describe precisely how difficult is a geometric inference problem. Several numerical experiments illustrate the convergence of the DTEM and also confirm that our bounds are tight.
126 - Rui Tuo , Yan Wang , C. F. Jeff Wu 2020
Kernel ridge regression is an important nonparametric method for estimating smooth functions. We introduce a new set of conditions, under which the actual rates of convergence of the kernel ridge regression estimator under both the L_2 norm and the norm of the reproducing kernel Hilbert space exceed the standard minimax rates. An application of this theory leads to a new understanding of the Kennedy-OHagan approach for calibrating model parameters of computer simulation. We prove that, under certain conditions, the Kennedy-OHagan calibration estimator with a known covariance function converges to the minimizer of the norm of the residual function in the reproducing kernel Hilbert space.
In functional linear regression, the slope ``parameter is a function. Therefore, in a nonparametric context, it is determined by an infinite number of unknowns. Its estimation involves solving an ill-posed problem and has points of contact with a range of methodologies, including statistical smoothing and deconvolution. The standard approach to estimating the slope function is based explicitly on functional principal components analysis and, consequently, on spectral decomposition in terms of eigenvalues and eigenfunctions. We discuss this approach in detail and show that in certain circumstances, optimal convergence rates are achieved by the PCA technique. An alternative approach based on quadratic regularisation is suggested and shown to have advantages from some points of view.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا