
Coordinate-wise Armijo's condition: General case

Published by Tuyen Truong
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow \mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial_x f,\partial_y f)$ its gradient. Fix $0<\alpha <1$. For a point $(x,y)\in \mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$, a number $\delta >0$ satisfies Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} f(x-\delta \partial_x f,y-\delta \partial_y f)-f(x,y)\leq -\alpha \delta (||\partial_x f||^2+||\partial_y f||^2). \end{eqnarray*} In a previous paper, we proposed the following {\bf coordinate-wise} Armijo's condition. Fix again $0<\alpha <1$. A pair of positive numbers $\delta_1,\delta_2>0$ satisfies the coordinate-wise variant of Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} [f(x-\delta_1\partial_x f(x,y), y-\delta_2\partial_y f(x,y))]-[f(x,y)]\leq -\alpha (\delta_1||\partial_x f(x,y)||^2+\delta_2||\partial_y f(x,y)||^2). \end{eqnarray*} Previously we applied this condition to functions of the form $f(x,y)=f(x)+g(y)$ and proved various convergence results for them. For a general function, it is crucial - for being able to do real computations - to have a systematic algorithm for obtaining $\delta_1$ and $\delta_2$ satisfying the coordinate-wise version of Armijo's condition, much like Backtracking for the usual Armijo's condition. In this paper we propose such an algorithm and prove corresponding convergence results. We then analyse and present experimental results for some functions such as $f(x,y)=a|x|+y$ (given by Asl and Overton in connection with Wolfe's method), $f(x,y)=x^3 \sin (1/x) + y^3 \sin(1/y)$, and Rosenbrock's function.
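To make the two conditions above concrete, here is a minimal Python sketch of the coordinate-wise Armijo inequality together with a naive backtracking loop that shrinks both step sizes geometrically until the inequality holds. It only illustrates the condition itself; the systematic algorithm for choosing $\delta_1$ and $\delta_2$ proposed in the paper is not reproduced here, and the function names, test function, starting point, and constants below are placeholders.

```python
import numpy as np

def coordinatewise_armijo_holds(f, grad_x, grad_y, x, y, d1, d2, alpha=0.5):
    """Check the coordinate-wise Armijo inequality at (x, y) for steps d1, d2."""
    gx, gy = grad_x(x, y), grad_y(x, y)
    lhs = f(x - d1 * gx, y - d2 * gy) - f(x, y)
    rhs = -alpha * (d1 * np.dot(gx, gx) + d2 * np.dot(gy, gy))
    return lhs <= rhs

def naive_coordinatewise_backtracking(f, grad_x, grad_y, x, y,
                                      d1=1.0, d2=1.0, alpha=0.5, beta=0.5):
    """Shrink both step sizes until the inequality holds (illustration only;
    with d1 = d2 this reduces to ordinary Armijo backtracking, so the loop
    terminates whenever grad f(x, y) != 0)."""
    while not coordinatewise_armijo_holds(f, grad_x, grad_y, x, y, d1, d2, alpha):
        d1 *= beta
        d2 *= beta
    return d1, d2

# Placeholder example: f(x, y) = ||x||^2 + 10 ||y||^2.
f = lambda x, y: np.dot(x, x) + 10.0 * np.dot(y, y)
grad_x = lambda x, y: 2.0 * x
grad_y = lambda x, y: 20.0 * y
print(naive_coordinatewise_backtracking(f, grad_x, grad_y,
                                        np.array([1.0]), np.array([1.0])))
```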




Read also

Tuyen Trung Truong (2019)
Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow \mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial_x f,\partial_y f)$ its gradient. Fix $0<\alpha <1$. For a point $(x,y)\in \mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$, a number $\delta >0$ satisfies Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} f(x-\delta \partial_x f,y-\delta \partial_y f)-f(x,y)\leq -\alpha \delta (||\partial_x f||^2+||\partial_y f||^2). \end{eqnarray*} When $f(x,y)=f_1(x)+f_2(y)$ is a coordinate-wise sum map, we propose the following {\bf coordinate-wise} Armijo's condition. Fix again $0<\alpha <1$. A pair of positive numbers $\delta_1,\delta_2>0$ satisfies the coordinate-wise variant of Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} [f_1(x-\delta_1\nabla f_1(x))+f_2(y-\delta_2\nabla f_2(y))]-[f_1(x)+f_2(y)]\leq -\alpha (\delta_1||\nabla f_1(x)||^2+\delta_2||\nabla f_2(y)||^2). \end{eqnarray*} We then extend the results of our recent work on Backtracking Gradient Descent and some variants to this setting. We show by an example the advantage of using the coordinate-wise Armijo's condition over the usual Armijo's condition.
Fix a constant $0<\alpha <1$. For a $C^1$ function $f:\mathbb{R}^k\rightarrow \mathbb{R}$, a point $x$ and a positive number $\delta >0$, we say that Armijo's condition is satisfied if $f(x-\delta \nabla f(x))-f(x)\leq -\alpha \delta ||\nabla f(x)||^2$. It is a basis for the well known Backtracking Gradient Descent (Backtracking GD) algorithm. Consider a sequence $\{x_n\}$ defined by $x_{n+1}=x_n-\delta_n \nabla f(x_n)$, for positive numbers $\delta_n$ for which Armijo's condition is satisfied. We show that if $\{x_n\}$ converges to a non-degenerate critical point, then $\{\delta_n\}$ must be bounded. Moreover, this boundedness can be quantified in terms of the norms of the Hessian $\nabla ^2f$ and its inverse at the limit point. This complements the first author's results on Unbounded Backtracking GD, and shows that in the case of convergence to a non-degenerate critical point the behaviour of Unbounded Backtracking GD is not too different from that of the usual Backtracking GD. On the other hand, in the case of convergence to a degenerate critical point the behaviours can be very different. We run some experiments to illustrate that both scenarios can really happen. In another part of the paper, we argue that Backtracking GD has the correct unit (according to a definition by Zeiler in his Adadelta paper). The main point is that since the learning rate in Backtracking GD is bound by Armijo's condition, it is not unitless.
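As a point of reference for the Armijo condition and the update $x_{n+1}=x_n-\delta_n \nabla f(x_n)$ described in this abstract, a minimal sketch of standard Backtracking GD might look as follows; the constants, stopping rule, test function, and function names are illustrative choices, not values or code from the paper.

```python
import numpy as np

def backtracking_gd(f, grad, x0, alpha=0.5, beta=0.5, delta0=1.0, iters=100):
    """Gradient descent where each learning rate delta_n is found by shrinking
    delta0 until Armijo's condition
        f(x - delta * g) - f(x) <= -alpha * delta * ||g||^2
    holds at the current iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-12:   # stop near a critical point
            break
        delta = delta0
        while f(x - delta * g) - f(x) > -alpha * delta * np.dot(g, g):
            delta *= beta
        x = x - delta * g
    return x

# Placeholder example: minimize f(x) = ||x||^2.
print(backtracking_gd(lambda x: np.dot(x, x), lambda x: 2.0 * x, [3.0, -4.0]))
```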
The method of block coordinate gradient descent (BCD) has been a powerful method for large-scale optimization. This paper considers the BCD method that successively updates a series of blocks selected according to a Markov chain. This kind of block selection is neither i.i.d. random nor cyclic. On the other hand, it is a natural choice for some applications in distributed optimization and Markov decision process, where i.i.d. random and cyclic selections are either infeasible or very expensive. By applying mixing-time properties of a Markov chain, we prove convergence of Markov chain BCD for minimizing Lipschitz differentiable functions, which can be nonconvex. When the functions are convex and strongly convex, we establish both sublinear and linear convergence rates, respectively. We also present a method of Markov chain inertial BCD. Finally, we discuss potential applications.
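A hedged sketch of the block-selection scheme described above: blocks are visited by following a Markov chain over block indices rather than i.i.d. or cyclically. The transition matrix, step size, objective, and function names below are placeholders for illustration, not the setting analysed in that paper.

```python
import numpy as np

def markov_chain_bcd(partial_grads, blocks, P, step=0.1, iters=1000, seed=0):
    """Block coordinate descent where the next block to update is the next
    state of a Markov chain with transition matrix P (row P[i] is the
    distribution of the successor of block i)."""
    rng = np.random.default_rng(seed)
    i = rng.integers(len(blocks))              # initial block
    for _ in range(iters):
        blocks[i] = blocks[i] - step * partial_grads[i](blocks)
        i = rng.choice(len(blocks), p=P[i])    # walk the chain to the next block
    return blocks

# Placeholder example: f(u, v) = ||u||^2 + ||v||^2 with a two-state chain.
blocks = [np.array([1.0, 2.0]), np.array([-3.0, 0.5])]
partial_grads = [lambda b: 2.0 * b[0], lambda b: 2.0 * b[1]]
P = np.array([[0.3, 0.7], [0.7, 0.3]])         # mixing transition matrix
print(markov_chain_bcd(partial_grads, blocks, P))
```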
In this work, we analyze the global convergence property of coordinate gradient descent with random choice of coordinates and stepsizes for non-convex optimization problems. Under generic assumptions, we prove that the algorithm iterate will almost surely escape strict saddle points of the objective function. As a result, the algorithm is guaranteed to converge to local minima if all saddle points are strict. Our proof is based on viewing coordinate descent algorithm as a nonlinear random dynamical system and a quantitative finite block analysis of its linearization around saddle points.
Ganzhao Yuan (2021)
Difference-of-Convex (DC) minimization, referring to the problem of minimizing the difference of two convex functions, has been found rich applications in statistical learning and studied extensively for decades. However, existing methods are primarily based on multi-stage convex relaxation, only leading to weak optimality of critical points. This paper proposes a coordinate descent method for minimizing DC functions based on sequential nonconvex approximation. Our approach iteratively solves a nonconvex one-dimensional subproblem globally, and it is guaranteed to converge to a coordinate-wise stationary point. We prove that this new optimality condition is always stronger than the critical point condition and the directional point condition when the objective function is weakly convex. For comparisons, we also include a naive variant of coordinate descent methods based on sequential convex approximation in our study. When the objective function satisfies an additional regularity condition called \emph{sharpness}, coordinate descent methods with an appropriate initialization converge \emph{linearly} to the optimal solution set. Also, for many applications of interest, we show that the nonconvex one-dimensional subproblem can be computed exactly and efficiently using a breakpoint searching method. We present some discussions and extensions of our proposed method. Finally, we have conducted extensive experiments on several statistical learning tasks to show the superiority of our approach. Keywords: Coordinate Descent, DC Minimization, DC Programming, Difference-of-Convex Programs, Nonconvex Optimization, Sparse Optimization, Binary Optimization.
