ﻻ يوجد ملخص باللغة العربية
In unconstrained optimisation on an Euclidean space, to prove convergence in Gradient Descent processes (GD) $x_{n+1}=x_n-delta _n abla f(x_n)$ it usually is required that the learning rates $delta _n$s are bounded: $delta _nleq delta $ for some positive $delta $. Under this assumption, if the sequence $x_n$ converges to a critical point $z$, then with large values of $n$ the update will be small because $||x_{n+1}-x_n||lesssim || abla f(x_n)||$. This may also force the sequence to converge to a bad minimum. If we can allow, at least theoretically, that the learning rates $delta _n$s are not bounded, then we may have better convergence to better minima. A previous joint paper by the author showed convergence for the usual version of Backtracking GD under very general assumptions on the cost function $f$. In this paper, we allow the learning rates $delta _n$ to be unbounded, in the sense that there is a function $h:(0,infty)rightarrow (0,infty )$ such that $lim _{trightarrow 0}th(t)=0$ and $delta _nlesssim max {h(x_n),delta }$ satisfies Armijos condition for all $n$, and prove convergence under the same assumptions as in the mentioned paper. It will be shown that this growth rate of $h$ is best possible if one wants convergence of the sequence ${x_n}$. A specific way for choosing $delta _n$ in a discrete way connects to Two-way Backtracking GD defined in the mentioned paper. We provide some results which either improve or are implicitly contained in those in the mentioned paper and another recent paper on avoidance of saddle points.
Our main result concerns the following condition: {bf Condition C.} Let $X$ be a Banach space. A $C^1$ function $f:Xrightarrow mathbb{R}$ satisfies Condition C if whenever ${x_n}$ weakly converges to $x$ and $lim _{nrightarrowinfty}|| abla f(x_n)||
Despite the strong theoretical guarantees that variance-reduced finite-sum optimization algorithms enjoy, their applicability remains limited to cases where the memory overhead they introduce (SAG/SAGA), or the periodic full gradient computation they
We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) dont increase the stepsize too fast and 2) dont overstep the local curvature. No need for functional values, no line search, no information about the
Distributed descent-based methods are an essential toolset to solving optimization problems in multi-agent system scenarios. Here the agents seek to optimize a global objective function through mutual cooperation. Oftentimes, cooperation is achieved
Conditional Stochastic Optimization (CSO) covers a variety of applications ranging from meta-learning and causal inference to invariant learning. However, constructing unbiased gradient estimates in CSO is challenging due to the composition structure