Our main result concerns the following condition:

{\bf Condition C.} Let $X$ be a Banach space. A $C^1$ function $f:X\rightarrow \mathbb{R}$ satisfies Condition C if whenever $\{x_n\}$ weakly converges to $x$ and $\lim_{n\rightarrow\infty}||\nabla f(x_n)||=0$, then $\nabla f(x)=0$.

We assume that there is given a canonical isomorphism between $X$ and its dual $X^*$, for example when $X$ is a Hilbert space.

{\bf Theorem.} Let $X$ be a reflexive, complete Banach space and $f:X\rightarrow \mathbb{R}$ be a $C^2$ function which satisfies Condition C. Moreover, we assume that for every bounded set $S\subset X$ we have $\sup_{x\in S}||\nabla ^2f(x)||<\infty$. We choose a random point $x_0\in X$ and construct by the Local Backtracking GD procedure (which depends on $3$ hyper-parameters $\alpha,\beta,\delta_0$, see later for details) the sequence $x_{n+1}=x_n-\delta(x_n)\nabla f(x_n)$. Then we have:

1) Every cluster point of $\{x_n\}$, in the {\bf weak} topology, is a critical point of $f$.

2) Either $\lim_{n\rightarrow\infty}f(x_n)=-\infty$ or $\lim_{n\rightarrow\infty}||x_{n+1}-x_n||=0$.

3) Here we work with the weak topology. Let $\mathcal{C}$ be the set of critical points of $f$. Assume that $\mathcal{C}$ has a bounded component $A$. Let $\mathcal{B}$ be the set of cluster points of $\{x_n\}$. If $\mathcal{B}\cap A\not=\emptyset$, then $\mathcal{B}\subset A$ and $\mathcal{B}$ is connected.

4) Assume that $X$ is separable. Then for generic choices of $\alpha,\beta,\delta_0$ and the initial point $x_0$, if the sequence $\{x_n\}$ converges in the {\bf weak} topology, then the limit point cannot be a saddle point.
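The Local Backtracking GD procedure and its hyper-parameters $\alpha,\beta,\delta_0$ are only named here, not spelled out. As a rough illustration, below is a minimal finite-dimensional sketch of one standard backtracking (Armijo) gradient descent; the Armijo condition, the stopping tolerance, and the quadratic test function are assumptions made for the example, not the paper's exact procedure.

```python
import numpy as np

def backtracking_stepsize(f, grad_x, x, alpha=0.5, beta=0.7, delta0=1.0):
    """Largest delta of the form delta0 * beta^k satisfying the Armijo condition
    f(x - delta * grad_x) <= f(x) - alpha * delta * ||grad_x||^2.
    (One standard backtracking rule; the paper's Local Backtracking GD may differ.)"""
    delta = delta0
    fx, g2 = f(x), np.dot(grad_x, grad_x)
    while f(x - delta * grad_x) > fx - alpha * delta * g2:
        delta *= beta
    return delta

def backtracking_gd(f, grad, x0, alpha=0.5, beta=0.7, delta0=1.0, iters=500):
    """Iterate x_{n+1} = x_n - delta(x_n) * grad f(x_n) with backtracking stepsizes."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-10:      # stop near a critical point
            break
        x = x - backtracking_stepsize(f, g, x, alpha, beta, delta0) * g
    return x

# Illustrative test: f(x, y) = x^2 + 10*y^2 has its unique critical point at the origin.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(backtracking_gd(f, grad, [3.0, -2.0]))   # approximately [0, 0]
```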
In unconstrained optimisation on a Euclidean space, to prove convergence of Gradient Descent (GD) processes $x_{n+1}=x_n-\delta_n\nabla f(x_n)$ it is usually required that the learning rates $\delta_n$ are bounded: $\delta_n\leq \delta$ for some pos
We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature. No need for functional values, no line search, no information about the
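The two rules are stated only informally in the truncated text above. One plausible instantiation, assuming a stepsize that is capped both by a modest growth factor over the previous stepsize (rule 1) and by a local curvature estimate built from successive gradients (rule 2), is sketched below; the growth factor, the curvature formula, and the test problem are illustrative assumptions, not necessarily the paper's exact rule.

```python
import numpy as np

def adaptive_gd(grad, x0, step0=1e-6, growth=np.sqrt(2.0), iters=500):
    """Gradient descent with an adaptive stepsize built from the two informal rules:
    1) never grow the stepsize by more than `growth` per iteration, and
    2) never exceed a local curvature bound ||x_k - x_{k-1}|| / (2 ||g_k - g_{k-1}||).
    A hedged sketch only; uses gradients and no function values or line search."""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    step = step0
    x = x_prev - step * g_prev
    for _ in range(iters):
        g = grad(x)
        diff_x = np.linalg.norm(x - x_prev)
        diff_g = np.linalg.norm(g - g_prev)
        curvature_cap = diff_x / (2.0 * diff_g) if diff_g > 0 else np.inf
        step = min(growth * step, curvature_cap)   # both rules applied at once
        x_prev, g_prev = x, g
        x = x - step * g
    return x

# Illustrative use on f(x, y) = x^2 + 10*y^2, using only its gradient.
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(adaptive_gd(grad, [3.0, -2.0]))   # approaches the minimizer at the origin
```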
Despite the strong theoretical guarantees that variance-reduced finite-sum optimization algorithms enjoy, their applicability remains limited to cases where the memory overhead they introduce (SAG/SAGA), or the periodic full gradient computation they
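The memory overhead mentioned for SAG/SAGA comes from storing one past gradient per summand. A minimal SAGA-style sketch for a least-squares finite sum is shown below; the stepsize choice, the problem instance, and the function name are assumptions made for the illustration.

```python
import numpy as np

def saga_least_squares(A, b, stepsize=None, epochs=50, seed=0):
    """SAGA for the finite sum (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
    The per-sample gradient table `memory` is the storage overhead alluded to above."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    if stepsize is None:
        stepsize = 1.0 / (3.0 * np.max(np.sum(A * A, axis=1)))  # 1 / (3 * max_i ||a_i||^2)
    x = np.zeros(d)
    memory = np.array([(A[i] @ x - b[i]) * A[i] for i in range(n)])  # n x d gradient table
    mean_mem = memory.mean(axis=0)
    for _ in range(epochs * n):
        i = rng.integers(n)
        g_i = (A[i] @ x - b[i]) * A[i]                  # fresh gradient of the i-th summand
        x = x - stepsize * (g_i - memory[i] + mean_mem)  # SAGA update with stored gradients
        mean_mem = mean_mem + (g_i - memory[i]) / n      # keep the running average in sync
        memory[i] = g_i
    return x

# Illustrative use on a small random least-squares problem.
rng = np.random.default_rng(1)
A, x_true = rng.standard_normal((200, 5)), rng.standard_normal(5)
b = A @ x_true
print(np.linalg.norm(saga_least_squares(A, b) - x_true))  # small, and shrinks with more epochs
```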
Bayesian inference problems require sampling or approximating high-dimensional probability distributions. The focus of this paper is on the recently introduced Stein variational gradient descent methodology, a class of algorithms that rely on iterate
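The Stein variational gradient descent update itself is not spelled out in the truncated text; below is a minimal particle implementation with an RBF kernel, targeting a standard Gaussian as an illustrative example. The stepsize, bandwidth, particle count, and target distribution are assumptions made for the sketch.

```python
import numpy as np

def svgd_update(particles, grad_log_p, bandwidth=1.0, stepsize=0.5):
    """One Stein variational gradient descent step with an RBF kernel:
    phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]."""
    n, d = particles.shape
    diffs = particles[:, None, :] - particles[None, :, :]      # (n, n, d), x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)                     # (n, n)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))             # RBF kernel matrix
    grads = np.array([grad_log_p(x) for x in particles])       # (n, d)
    drive = k @ grads                                          # pull toward high density
    repulse = np.sum(k[:, :, None] * diffs, axis=1) / bandwidth ** 2  # keep particles spread
    return particles + stepsize * (drive + repulse) / n

# Illustrative target: standard 2-D Gaussian, so grad log p(x) = -x.
rng = np.random.default_rng(0)
particles = rng.standard_normal((50, 2)) + 2.0   # start away from the target
for _ in range(500):
    particles = svgd_update(particles, lambda x: -x)
print(particles.mean(axis=0))   # drifts toward the target mean [0, 0]
```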
Assume that $\mathcal{I}$ is an ideal on $\mathbb{N}$, and $\sum_n x_n$ is a divergent series in a Banach space $X$. We study the Baire category, and the measure of the set $A(\mathcal{I}):=\left\{t \in \{0,1\}^{\mathbb{N}} \colon \sum_n t(n)x_n \textrm{ is } \mat