ﻻ يوجد ملخص باللغة العربية
Fix a constant $0<alpha <1$. For a $C^1$ function $f:mathbb{R}^krightarrow mathbb{R}$, a point $x$ and a positive number $delta >0$, we say that Armijos condition is satisfied if $f(x-delta abla f(x))-f(x)leq -alpha delta || abla f(x)||^2$. It is a basis for the well known Backtracking Gradient Descent (Backtracking GD) algorithm. Consider a sequence ${x_n}$ defined by $x_{n+1}=x_n-delta _n abla f(x_n)$, for positive numbers $delta _n$ for which Armijos condition is satisfied. We show that if ${x_n}$ converges to a non-degenerate critical point, then ${delta _n}$ must be bounded. Moreover this boundedness can be quantified in terms of the norms of the Hessian $ abla ^2f$ and its inverse at the limit point. This complements the first authors results on Unbounded Backtracking GD, and shows that in case of convergence to a non-degenerate critical point the behaviour of Unbounded Backtracking GD is not too different from that of usual Backtracking GD. On the other hand, in case of convergence to a degenerate critical point the behaviours can be very much different. We run some experiments to illustrate that both scenrios can really happen. In another part of the paper, we argue that Backtracking GD has the correct unit (according to a definition by Zeiler in his Adadeltas paper). The main point is that since learning rate in Backtracking GD is bound by Armijos condition, it is not unitless.
Let $z=(x,y)$ be coordinates for the product space $mathbb{R}^{m_1}times mathbb{R}^{m_2}$. Let $f:mathbb{R}^{m_1}times mathbb{R}^{m_2}rightarrow mathbb{R}$ be a $C^1$ function, and $ abla f=(partial _xf,partial _yf)$ its gradient. Fix $0<alpha <1$. F
Let $z=(x,y)$ be coordinates for the product space $mathbb{R}^{m_1}times mathbb{R}^{m_2}$. Let $f:mathbb{R}^{m_1}times mathbb{R}^{m_2}rightarrow mathbb{R}$ be a $C^1$ function, and $ abla f=(partial _xf,partial _yf)$ its gradient. Fix $0<alpha <1$. F
We consider non-convex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a non-asymptotic analysis for the tamed unadjusted stochastic Lang
In unconstrained optimisation on an Euclidean space, to prove convergence in Gradient Descent processes (GD) $x_{n+1}=x_n-delta _n abla f(x_n)$ it usually is required that the learning rates $delta _n$s are bounded: $delta _nleq delta $ for some pos
In this paper, the problem of safe global maximization (it should not be confused with robust optimization) of expensive noisy black-box functions satisfying the Lipschitz condition is considered. The notion safe means that the objective function $f(