ﻻ يوجد ملخص باللغة العربية
When optimizing over loss functions it is common practice to use momentum-based accelerated methods rather than vanilla gradient-based method. Despite widely applied to arbitrary loss function, their behaviour in generically non-convex, high dimensional landscapes is poorly understood. In this work we used dynamical mean field theory techniques to describe analytically the average behaviour of these methods in a prototypical non-convex model: the (spiked) matrix-tensor model. We derive a closed set of equations that describe the behaviours of several algorithms including heavy-ball momentum and Nesterov acceleration. Additionally we characterize the evolution of a mathematically equivalent physical system of massive particles relaxing toward the bottom of an energetic landscape. Under the correct mapping the two dynamics are equivalent and it can be noticed that having a large mass increases the effective time step of the heavy ball dynamics leading to a speed up.
We consider high dimensional random optimization problems where the dynamical variables are subjected to non-convex excluded volume constraints. We focus on the case in which the cost function is a simple quadratic cost and the excluded volume constr
In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture where a set of $K$ worker nodes (WNs) in collaboration with a server node (SN) jointly aim to minimize a global, potenti
Momentum methods such as Polyaks heavy ball (HB) method, Nesterovs accelerated gradient (AG) as well as accelerated projected gradient (APG) method have been commonly used in machine learning practice, but their performance is quite sensitive to nois
We propose a new framework for deriving screening rules for convex optimization problems. Our approach covers a large class of constrained and penalized optimization formulations, and works in two steps. First, given any approximate point, the struct
Stochastic gradient descent~(SGD) and its variants have attracted much attention in machine learning due to their efficiency and effectiveness for optimization. To handle large-scale problems, researchers have recently proposed several lock-free stra