On the rate of convergence of a neural network regression estimate learned by gradient descent


Added by Michael Kohler

Publication date 2019

Field Mathematical Statistics

Language English

Authors Alina Braun - Michael Kohler - Harro Walk

Statistics Theory


Abstract

Nonparametric regression with random design is considered. Estimates are defined by minimizing a penalized empirical $L_2$ risk over a suitably chosen class of neural networks with one hidden layer via gradient descent. Here, the gradient descent procedure is repeated several times with randomly chosen starting values for the weights, and from the list of constructed estimates the one with the minimal empirical $L_2$ risk is chosen. Under the assumption that the number of randomly chosen starting values and the number of steps for gradient descent are sufficiently large, it is shown that the resulting estimate achieves (up to a logarithmic factor) the optimal rate of convergence in a projection pursuit model. The finite sample size performance of the estimates is illustrated by using simulated data.
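
The procedure described in the abstract (gradient descent from several random starting values, followed by selection of the candidate with the smallest empirical $L_2$ risk) can be illustrated with a minimal sketch. It is not the authors' implementation. The logistic activation, the penalty `lam * sum(c_i^2)` on the outer weights, the zero initialisation of the outer weights, the uniform range of the inner weights, and all function names and default values are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    # Logistic squasher, clipped so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def hidden(w, b, x):
    # Outputs of the hidden neurons for one input vector x.
    return [sigmoid(sum(wj * xj for wj, xj in zip(wi, x)) + bi)
            for wi, bi in zip(w, b)]


def predict(params, x):
    # One hidden layer: c[0] + sum_i c[i+1] * sigmoid(w_i . x + b_i).
    w, b, c = params
    h = hidden(w, b, x)
    return c[0] + sum(ci * hi for ci, hi in zip(c[1:], h))


def empirical_l2_risk(params, xs, ys):
    # Unpenalized mean squared error on the sample.
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, k, steps, lr, lam, rng):
    n, d = len(xs), len(xs[0])
    # Assumed initialisation: random inner weights, zero outer weights.
    w = [[rng.uniform(-2.0, 2.0) for _ in range(d)] for _ in range(k)]
    b = [rng.uniform(-2.0, 2.0) for _ in range(k)]
    c = [0.0] * (k + 1)
    for _ in range(steps):
        gw = [[0.0] * d for _ in range(k)]
        gb = [0.0] * k
        gc = [2.0 * lam * ci for ci in c]  # gradient of the assumed penalty
        for x, y in zip(xs, ys):
            h = hidden(w, b, x)
            r = c[0] + sum(ci * hi for ci, hi in zip(c[1:], h)) - y
            gc[0] += 2.0 * r / n
            for i in range(k):
                gc[i + 1] += 2.0 * r * h[i] / n
                s = 2.0 * r * c[i + 1] * h[i] * (1.0 - h[i]) / n
                gb[i] += s
                for j in range(d):
                    gw[i][j] += s * x[j]
        # Plain gradient step on all weights.
        w = [[wij - lr * g for wij, g in zip(wi, gi)] for wi, gi in zip(w, gw)]
        b = [bi - lr * g for bi, g in zip(b, gb)]
        c = [ci - lr * g for ci, g in zip(c, gc)]
    return w, b, c


def fit_with_restarts(xs, ys, k=3, restarts=3, steps=300, lr=0.05,
                      lam=1e-3, seed=0):
    rng = random.Random(seed)
    candidates = [gradient_descent(xs, ys, k, steps, lr, lam, rng)
                  for _ in range(restarts)]
    # Keep the candidate with the minimal (unpenalized) empirical L2 risk.
    return min(candidates, key=lambda p: empirical_l2_risk(p, xs, ys))
```

Calling `fit_with_restarts(xs, ys)` with `xs` a list of input vectors and `ys` a list of responses returns the selected weights, which can then be passed to `predict`. Because all restarts draw from one seeded generator, adding restarts can only lower (or keep) the selected empirical risk for a fixed seed.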


Analysis of the rate of convergence of neural network regression estimates which are easy to implement

60 - Alina Braun , Michael Kohler , Adam Krzyzak 2019

Recent results in nonparametric regression show that for deep learning, i.e., for neural network estimates with many hidden layers, we are able to achieve good rates of convergence even in case of high-dimensional predictor variables, provided suitable assumptions on the structure of the regression function are imposed. The estimates are defined by minimizing the empirical $L_2$ risk over a class of neural networks. In practice it is not clear how this can be done exactly. In this article we introduce a new neural network regression estimate where most of the weights are chosen regardless of the data, motivated by some recent approximation results for neural networks, and which is therefore easy to implement. We show that for this estimate we can derive rate of convergence results in case the regression function is smooth. We combine this estimate with the projection pursuit, where we choose the directions randomly, and we show that for sufficiently many repetitions we get a neural network regression estimate which is easy to implement and which achieves the one-dimensional rate of convergence (up to some logarithmic factor) in case that the regression function satisfies the assumptions of projection pursuit.

Statistics Theory

Convergence of Gaussian process regression: Optimality, robustness, and relationship with kernel ridge regression

161 - Wenjia Wang , Bing-Yi Jing 2021

In this work, we investigate Gaussian process regression used to recover a function based on noisy observations. We derive upper and lower error bounds for Gaussian process regression with possibly misspecified correlation functions. The optimal convergence rate can be attained even if the smoothness of the imposed correlation function exceeds that of the true correlation function and the sampling scheme is quasi-uniform. As byproducts, we also obtain convergence rates of kernel ridge regression with misspecified kernel function, where the underlying truth is a deterministic function. The convergence rates of Gaussian process regression and kernel ridge regression are closely connected, which is aligned with the relationship between sample paths of Gaussian process and the corresponding reproducing kernel Hilbert space.
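
The connection between the two estimators mentioned in this abstract can be made concrete with a standard textbook identity. It is given here as a sketch and is not taken from the paper. The notation is assumed: a zero-mean Gaussian process prior with kernel $k$, i.i.d. Gaussian noise with variance $\sigma^2$, design points $X = (x_1, \dots, x_n)$, kernel matrix $K = (k(x_i, x_j))_{i,j}$, and a kernel ridge penalty $\lambda$ on the squared RKHS norm.

```latex
\begin{align*}
\hat f_{\mathrm{GP}}(x) &= k(x, X)\,(K + \sigma^2 I_n)^{-1} y, \\
\hat f_{\mathrm{KRR}} &= \arg\min_{f \in \mathcal{H}_k}
  \frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2
  + \lambda \,\|f\|_{\mathcal{H}_k}^2, \\
\hat f_{\mathrm{KRR}}(x) &= k(x, X)\,(K + n\lambda I_n)^{-1} y.
\end{align*}
```

Under these assumptions the Gaussian process posterior mean and the kernel ridge estimate coincide whenever $n\lambda = \sigma^2$.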

Statistics Theory

Debiasing Stochastic Gradient Descent to handle missing values

406 - Julie Josse , Claire Boyer (LPSM UMR 8001) 2020

Stochastic gradient algorithm is a key ingredient of many machine learning methods, particularly appropriate for large-scale learning. However, a major caveat of large data is their incompleteness. We propose an averaged stochastic gradient algorithm handling missing values in linear models. This approach has the merit to be free from the need of any data distribution modeling and to account for heterogeneous missing proportion. In both streaming and finite-sample settings, we prove that this algorithm achieves convergence rate of $\mathcal{O}(\frac{1}{n})$ at the iteration $n$, the same as without missing values. We show the convergence behavior and the relevance of the algorithm not only on synthetic data but also on real data sets, including those collected from medical register.

Statistics Theory

On the Convergence Rate of Projected Gradient Descent for a Back-Projection based Objective

64 - Tom Tirer , Raja Giryes 2020

Ill-posed linear inverse problems appear in many scientific setups, and are typically addressed by solving optimization problems, which are composed of data fidelity and prior terms. Recently, several works have considered a back-projection (BP) based fidelity term as an alternative to the common least squares (LS), and demonstrated excellent results for popular inverse problems. These works have also empirically shown that using the BP term, rather than the LS term, requires fewer iterations of optimization algorithms. In this paper, we examine the convergence rate of the projected gradient descent (PGD) algorithm for the BP objective. Our analysis allows us to identify an inherent source for its faster convergence compared to using the LS objective, while making only mild assumptions. We also analyze the more general proximal gradient method under a relaxed contraction condition on the proximal mapping of the prior. This analysis further highlights the advantage of BP when the linear measurement operator is badly conditioned. Numerical experiments with both $\ell_1$-norm and GAN-based priors corroborate our theoretical results.
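
To see where the two fidelity terms differ, the following sketch writes out both objectives and the PGD step. The formulas are an illustration under assumed notation and are not quoted from the paper: $y$ is the measurement vector, $A$ the linear measurement operator, $A^\dagger$ its Moore-Penrose pseudoinverse, $\mathcal{P}_{\mathcal{S}}$ the projection onto the set $\mathcal{S}$ defined by the prior, and $\mu$ a step size.

```latex
\begin{align*}
f_{\mathrm{LS}}(x) &= \tfrac{1}{2}\,\|y - Ax\|_2^2,
  & \nabla f_{\mathrm{LS}}(x) &= A^\top (Ax - y), \\
f_{\mathrm{BP}}(x) &= \tfrac{1}{2}\,\|A^\dagger (y - Ax)\|_2^2,
  & \nabla f_{\mathrm{BP}}(x) &= A^\dagger (Ax - y), \\
x_{t+1} &= \mathcal{P}_{\mathcal{S}}\bigl(x_t - \mu\,\nabla f(x_t)\bigr).
\end{align*}
```

In this form the Hessian of the LS term is $A^\top A$, whose spectrum reflects the conditioning of $A$, while the Hessian of the BP term is the orthogonal projection $A^\dagger A$, whose eigenvalues are only 0 and 1. This is one way to read the abstract's remark about badly conditioned operators.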

Optimization and Control Computer Vision and Pattern Recognition Machine Learning

Convergence of least squares estimators in the adaptive Wynn algorithm for a class of nonlinear regression models

116 - Fritjof Freise , Norbert Gaffke , Rainer Schwabe 2019

The paper continues the authors' work on the adaptive Wynn algorithm in a nonlinear regression model. In the present paper it is shown that if the mean response function satisfies a condition of 'saturated identifiability', which was introduced by Pronzato \cite{Pronzato}, then the adaptive least squares estimators are strongly consistent. The condition states that the regression parameter is identifiable under any saturated design, i.e., the values of the mean response function at any $p$ distinct design points determine the parameter point uniquely where, typically, $p$ is the dimension of the regression parameter vector. Further essential assumptions are compactness of the experimental region and of the parameter space together with some natural continuity assumptions. If the true parameter point is an interior point of the parameter space, then under some smoothness assumptions and asymptotic homoscedasticity of random errors the asymptotic normality of adaptive least squares estimators is obtained.

Statistics Theory
