بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods

124 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Minghan Yang

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي

والبحث باللغة English

تأليف Minghan Yang - Dong Xu - Hongyu Chen

التحسين والتحكم التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper, we consider stochastic second-order methods for minimizing a finite summation of nonconvex functions. One important key is to find an ingenious but cheap scheme to incorporate local curvature information. Since the true Hessian matrix is often a combination of a cheap part and an expensive part, we propose a structured stochastic quasi-Newton method by using partial Hessian information as much as possible. By further exploiting either the low-rank structure or the kronecker-product properties of the quasi-Newton approximations, the computation of the quasi-Newton direction is affordable. Global convergence to stationary point and local superlinear convergence rate are established under some mild assumptions. Numerical results on logistic regression, deep autoencoder networks and deep convolutional neural networks show that our proposed method is quite competitive to the state-of-the-art methods.

قيم البحث

124 - Luo Luo , Zihao Chen , Zhihua Zhang 2016

In this paper, we discuss the problem of minimizing the sum of two convex functions: a smooth function plus a non-smooth function. Further, the smooth part can be expressed by the average of a large number of smooth component functions, and the non-s mooth part is equipped with a simple proximal mapping. We propose a proximal stochastic second-order method, which is efficient and scalable. It incorporates the Hessian in the smooth part of the function and exploits multistage scheme to reduce the variance of the stochastic gradient. We prove that our method can achieve linear rate of convergence.

التعلم الآلي التعلم الالي

Sub-sampled Newton Methods with Non-uniform Sampling

183 - Peng Xu , Jiyan Yang , Farbod Roosta-Khorasani 2016

We consider the problem of finding the minimizer of a convex function $F: mathbb R^d rightarrow mathbb R$ of the form $F(w) := sum_{i=1}^n f_i(w) + R(w)$ where a low-rank factorization of $ abla^2 f_i(w)$ is readily available. We consider the regime where $n gg d$. As second-order methods prove to be effective in finding the minimizer to a high-precision, in this work, we propose randomized Newton-type algorithms that exploit textit{non-uniform} sub-sampling of ${ abla^2 f_i(w)}_{i=1}^{n}$, as well as inexact updates, as means to reduce the computational complexity. Two non-uniform sampling distributions based on {it block norm squares} and {it block partial leverage scores} are considered in order to capture important terms among ${ abla^2 f_i(w)}_{i=1}^{n}$. We show that at each iteration non-uniformly sampling at most $mathcal O(d log d)$ terms from ${ abla^2 f_i(w)}_{i=1}^{n}$ is sufficient to achieve a linear-quadratic convergence rate in $w$ when a suitable initial point is provided. In addition, we show that our algorithms achieve a lower computational complexity and exhibit more robustness and better dependence on problem specific quantities, such as the condition number, compared to similar existing methods, especially the ones based on uniform sampling. Finally, we empirically demonstrate that our methods are at least twice as fast as Newtons methods with ridge logistic regression on several real datasets.

التحسين والتحكم التعلم الالي

Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods

87 - Albert S. Berahas , Richard H. Byrd , Jorge Nocedal 2018

This paper presents a finite difference quasi-Newton method for the minimization of noisy functions. The method takes advantage of the scalability and power of BFGS updating, and employs an adaptive procedure for choosing the differencing interval $h $ based on the noise estimation techniques of Hamming (2012) and More and Wild (2011). This noise estimation procedure and the selection of $h$ are inexpensive but not always accurate, and to prevent failures the algorithm incorporates a recovery mechanism that takes appropriate action in the case when the line search procedure is unable to produce an acceptable point. A novel convergence analysis is presented that considers the effect of a noisy line search procedure. Numerical experiments comparing the method to a function interpolating trust region method are presented.

التحسين والتحكم

Some New Proximal Quasi-Newton Methods for Multiobjective Optimization Problems

115 - Jian-Wen Peng , Jie Ren 2021

In this paper, we propose some new proximal quasi-Newton methods with line search or without line search for a special class of nonsmooth multiobjective optimization problems, where each objective function is the sum of a twice continuously different iable strongly convex function and a proper convex but not necessarily differentiable function. In these new proximal quasi-Newton methods, we approximate the Hessian matrices by using the well known BFGS, self-scaling BFGS, and the Huang BFGS method. We show that each accumulation point of the sequence generated by these new algorithms is a Pareto stationary point of the multiobjective optimization problem. In addition, we give their applications in robust multiobjective optimization, and we show that the subproblems of proximal quasi-Newton algorithms can be regarded as quadratic programming problems. Numerical experiments are carried out to verify the effectiveness of the proposed method.

التحسين والتحكم

Scaling Up Quasi-Newton Algorithms: Communication Efficient Distributed SR1

80 - Majid Jahani , Mohammadreza Nazari , Sergey Rusakov 2019

In this paper, we present a scalable distributed implementation of the Sampled Limited-memory Symmetric Rank-1 (S-LSR1) algorithm. First, we show that a naive distributed implementation of S-LSR1 requires multiple rounds of expensive communications a t every iteration and thus is inefficient. We then propose DS-LSR1, a communication-efficient variant that: (i) drastically reduces the amount of data communicated at every iteration, (ii) has favorable work-load balancing across nodes, and (iii) is matrix-free and inverse-free. The proposed method scales well in terms of both the dimension of the problem and the number of data points. Finally, we illustrate the empirical performance of DS-LSR1 on a standard neural network training task.

التحسين والتحكم التعلم الالي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة المأمون الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً