Block BFGS Methods


Published by: Wenbo Gao

Publication date: 2016

Language: English

Authors: Wenbo Gao, Donald Goldfarb

Research field: Optimization and Control

Abstract

We introduce a quasi-Newton method with block updates called Block BFGS. We show that this method, performed with inexact Armijo-Wolfe line searches, converges globally and superlinearly under the same convexity assumptions as BFGS. We also show that Block BFGS is globally convergent to a stationary point when applied to non-convex functions with bounded Hessian, and discuss other modifications for non-convex minimization. Numerical experiments comparing Block BFGS, BFGS and gradient descent are presented.
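The abstract does not spell out the block update itself. As a minimal sketch, assuming the usual block secant form of the inverse update, H+ = D (D^T Y)^{-1} D^T + (I - D (D^T Y)^{-1} Y^T) H (I - Y (D^T Y)^{-1} D^T), where D holds q directions and Y = (Hessian) D holds the matching curvature information, the update can be written in NumPy as follows. The function name, the test quadratic, and the random block D are illustrative assumptions, not taken from the paper, and the line search is omitted.

```python
import numpy as np


def block_bfgs_inverse_update(H, D, Y):
    """Return an updated inverse-Hessian approximation (assumed block secant form).

    H : (n, n) symmetric positive definite current approximation
    D : (n, q) block of directions with linearly independent columns
    Y : (n, q) curvature block, assumed to equal (Hessian at x) @ D
    """
    n = H.shape[0]
    DtY = D.T @ Y                    # (q, q); symmetric when Y = G @ D
    M = np.linalg.solve(DtY, D.T)    # (D^T Y)^{-1} D^T, shape (q, n)
    P = np.eye(n) - Y @ M            # I - Y (D^T Y)^{-1} D^T
    return D @ M + P.T @ H @ P


# Illustrative check on a convex quadratic f(x) = 0.5 * x^T G x, whose Hessian is G.
rng = np.random.default_rng(0)
n, q = 6, 2
A = rng.standard_normal((n, n))
G = A @ A.T + n * np.eye(n)        # symmetric positive definite Hessian
D = rng.standard_normal((n, q))    # hypothetical block of directions
Y = G @ D                          # Hessian actions on the block
H_new = block_bfgs_inverse_update(np.eye(n), D, Y)

secant_ok = np.allclose(H_new @ Y, D)       # block secant equation H_new Y = D
symmetric_ok = np.allclose(H_new, H_new.T)  # the update preserves symmetry
```

With q = 1 this formula reduces to the classical BFGS inverse update, which is one reason to expect the block method to share BFGS-style convergence behavior under the same convexity assumptions.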

Read also

Cong D. Dang, Guanghui Lan (2013)

In this paper, we present a new stochastic algorithm, namely the stochastic block mirror descent (SBMD) method for solving large-scale nonsmooth and stochastic optimization problems. The basic idea of this algorithm is to incorporate the block-coordinate decomposition and an incremental block averaging scheme into the classic (stochastic) mirror-descent method, in order to significantly reduce the cost per iteration of the latter algorithm. We establish the rate of convergence of the SBMD method along with its associated large-deviation results for solving general nonsmooth and stochastic optimization problems. We also introduce different variants of this method and establish their rate of convergence for solving strongly convex, smooth, and composite optimization problems, as well as certain nonconvex optimization problems. To the best of our knowledge, all these developments related to the SBMD methods are new in the stochastic optimization literature. Moreover, some of our results also seem to be new for block coordinate descent methods for deterministic optimization.

Optimization and Control

BFGS convergence to nonsmooth minimizers of convex functions

Jiayi Guo, Adrian Lewis (2017)

The popular BFGS quasi-Newton minimization algorithm under reasonable conditions converges globally on smooth convex functions. This result was proved by Powell in 1976: we consider its implications for functions that are not smooth. In particular, an analogous convergence result holds for functions, like the Euclidean norm, that are nonsmooth at the minimizer.

Optimization and Control

A Progressive Batching L-BFGS Method for Machine Learning

Raghu Bollapragada, Dheevatsa Mudigere, Jorge Nocedal (2018)

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic line search, and stable quasi-Newton updating - and that performs well on training logistic regression and deep neural networks. We provide supporting convergence theory for the method.

Optimization and Control, Machine Learning

Zeroth-order randomized block methods for constrained minimization of expectation-valued Lipschitz continuous functions

Uday V. Shanbhag, Farzad Yousefian (2021)

We consider the minimization of an $L_0$-Lipschitz continuous and expectation-valued function, denoted by $f$ and defined as $f(x) \triangleq \mathbb{E}[\tilde{f}(x,\omega)]$, over a Cartesian product of closed and convex sets with a view towards obtaining both asymptotics as well as rate and complexity guarantees for computing an approximate stationary point (in a Clarke sense). We adopt a smoothing-based approach reliant on minimizing $f_{\eta}$ where $f_{\eta}(x) \triangleq \mathbb{E}_{u}[f(x+\eta u)]$, $u$ is a random variable defined on a unit sphere, and $\eta > 0$. In fact, it is observed that a stationary point of the $\eta$-smoothed problem is a $2\eta$-stationary point for the original problem in the Clarke sense. In such a setting, we derive a suitable residual function that provides a metric for stationarity for the smoothed problem. By leveraging a zeroth-order framework reliant on utilizing sampled function evaluations implemented in a block-structured regime, we make two sets of contributions for the sequence generated by the proposed scheme. (i) The residual function of the smoothed problem tends to zero almost surely along the generated sequence; (ii) To compute an $x$ that ensures that the expected norm of the residual of the $\eta$-smoothed problem is within $\epsilon$ requires no greater than $\mathcal{O}(\tfrac{1}{\eta \epsilon^2})$ projection steps and $\mathcal{O}\left(\tfrac{1}{\eta^2 \epsilon^4}\right)$ function evaluations. These statements appear to be novel and there appear to be few results to contend with general nonsmooth, nonconvex, and stochastic regimes via zeroth-order approaches.

Optimization and Control

Markov Chain Block Coordinate Descent

Tao Sun, Yuejiao Sun, Yangyang Xu (2018)

The method of block coordinate gradient descent (BCD) has been a powerful method for large-scale optimization. This paper considers the BCD method that successively updates a series of blocks selected according to a Markov chain. This kind of block selection is neither i.i.d. random nor cyclic. On the other hand, it is a natural choice for some applications in distributed optimization and Markov decision process, where i.i.d. random and cyclic selections are either infeasible or very expensive. By applying mixing-time properties of a Markov chain, we prove convergence of Markov chain BCD for minimizing Lipschitz differentiable functions, which can be nonconvex. When the functions are convex and strongly convex, we establish both sublinear and linear convergence rates, respectively. We also present a method of Markov chain inertial BCD. Finally, we discuss potential applications.
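The block selection rule described above can be illustrated with a small sketch. It assumes a strongly convex quadratic objective 0.5 x^T A x - b^T x, a fixed step 1/L with L the largest eigenvalue of A, and a random walk over three blocks as the Markov chain. The function name, the objective, the step size, and the transition matrix are all illustrative assumptions rather than details from the paper.

```python
import numpy as np


def markov_chain_bcd(A, b, blocks, P, steps, seed=0):
    """Block gradient descent on 0.5*x^T A x - b^T x.

    The index of the block updated at each step follows a Markov chain
    with row-stochastic transition matrix P (a hypothetical choice here).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(len(b))
    L = np.linalg.eigvalsh(A)[-1]          # Lipschitz constant of the gradient
    i = 0                                  # start the chain at block 0
    for _ in range(steps):
        idx = blocks[i]
        grad_block = A[idx] @ x - b[idx]   # partial gradient for this block
        x[idx] -= grad_block / L           # gradient step on the block only
        i = rng.choice(len(blocks), p=P[i])  # next block drawn from row i of P
    return x


rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B.T @ B + 6.0 * np.eye(6)              # strongly convex quadratic
b = rng.standard_normal(6)
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
# Random walk that never stays on the same block: neither i.i.d. nor cyclic.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
x_mc = markov_chain_bcd(A, b, blocks, P, steps=20000)
x_star = np.linalg.solve(A, b)             # exact minimizer for comparison
```

Because the next block depends on the current one, consecutive selections are correlated, which is the setting where the abstract's mixing-time argument applies.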

Optimization and Control, Machine Learning
