This paper considers the problem of understanding the exit time of trajectories of gradient-related first-order methods from saddle neighborhoods under some initial boundary conditions. Given the flat geometry around saddle points, first-order methods can struggle to escape these regions quickly due to the small gradient magnitudes encountered there. In particular, while it is known that gradient-related first-order methods escape strict-saddle neighborhoods, the existing literature does not explicitly leverage the local geometry around saddle points to control the behavior of gradient trajectories. It is in this context that this paper puts forth a rigorous geometric analysis of the gradient-descent method around strict-saddle neighborhoods using matrix perturbation theory. In doing so, it provides a key result that can be used to generate an approximate gradient trajectory for any given initial conditions. In addition, the analysis leads to a linear exit-time solution for the gradient-descent method under certain necessary initial conditions for a class of strict-saddle functions.
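To make the slow-escape phenomenon concrete, here is a minimal sketch, not taken from the paper: the quadratic test function, step size, exit radius, and initial point below are illustrative assumptions. It runs plain gradient descent on f(x, y) = ½(x² − εy²), whose strict saddle at the origin has one small negative eigenvalue, and counts iterations until the trajectory leaves a small ball around the saddle; the escape coordinate grows roughly like (1 + ηε)^k, so the exit time scales like log(r/|y₀|)/(ηε) when ε is small.

```python
import numpy as np

# Illustrative sketch (not the paper's method): gradient descent on the
# quadratic strict-saddle f(x, y) = 0.5 * (x**2 - eps * y**2), where eps > 0
# is the (small) magnitude of the negative eigenvalue at the saddle.

def exit_time(eps=1e-2, eta=0.1, x0=(0.3, 1e-3), radius=0.5, max_iters=100_000):
    """Count gradient-descent iterations until the iterate leaves a ball
    of the given radius centered at the saddle point (the origin)."""
    x = np.array(x0, dtype=float)
    for k in range(max_iters):
        if np.linalg.norm(x) > radius:
            return k
        grad = np.array([x[0], -eps * x[1]])  # gradient of f at (x, y)
        x = x - eta * grad                    # gradient-descent step
    return max_iters

# Along the escape direction, y_{k+1} = (1 + eta*eps) * y_k, so the exit time
# grows like log(radius / |y_0|) / log(1 + eta*eps) ~ 1/(eta*eps) for small eps.
for eps in (1e-1, 1e-2, 1e-3):
    print(f"eps={eps:.0e}: exited after {exit_time(eps=eps)} iterations")
```

Running the loop shows the exit time growing by roughly a factor of ten each time ε shrinks by ten, which is the flat-geometry effect the abstract refers to.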
Motivated by broad applications in reinforcement learning and machine learning, this paper considers the popular stochastic gradient descent (SGD) when the gradients of the underlying objective function are sampled from Markov processes. This Markov
We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature. No need for functional values, no line search, no information about the
This paper develops further the idea of perturbed gradient descent (PGD), by adapting perturbation with the history of states via the notion of occupation time. The proposed algorithm, perturbed gradient descent adapted with occupation time (PGDOT),
Convergence of the gradient descent algorithm has been attracting renewed interest due to its utility in deep learning applications. Even as multiple variants of gradient descent were proposed, the assumption that the gradient of the objective is Lipschitz
We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from $\tau$ rounds ago. First, we show that without stochastic noise, dela