In this paper we investigate how gradient-based algorithms such as gradient descent, (multi-pass) stochastic gradient descent, its persistent variant, and the Langevin algorithm navigate non-convex loss landscapes, and which of them reaches the best generalization error at limited sample complexity. We consider the loss landscape of the high-dimensional phase retrieval problem as a prototypical highly non-convex example. We observe that for phase retrieval the stochastic variants of gradient descent reach perfect generalization in regions of the control parameters where plain gradient descent does not. We apply dynamical mean-field theory from statistical physics to characterize analytically the full trajectories of these algorithms in their continuous-time limit, with a warm start, and for large system sizes. We further unveil several intriguing properties of the landscape and of the algorithms, such as the fact that gradient descent can attain better generalization from less informed initializations.
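To make the setting concrete, the following is a minimal sketch of the phase retrieval problem and two of the dynamics compared above: full-batch gradient descent and the Langevin algorithm. The quartic loss L(w) = (1/4n) Σ_μ ((x_μ·w)² − y_μ)², the step size, the temperature T, and the warm-start overlap are standard illustrative choices and assumptions for this sketch, not the paper's exact notation or parameter values.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 300                       # dimension and number of samples
w_star = rng.standard_normal(d)       # planted signal to be recovered
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = (X @ w_star) ** 2                 # phase-retrieval labels: the sign of <x, w*> is lost

def grad(w):
    """Gradient of L(w) = (1/4n) * sum_mu ((x_mu . w)^2 - y_mu)^2."""
    z = X @ w
    return X.T @ ((z ** 2 - y) * z) / n

def gradient_descent(w, lr=0.05, steps=2000):
    # Deterministic full-batch dynamics.
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def langevin(w, lr=0.05, T=1e-3, steps=2000):
    # Same drift plus Gaussian noise of variance 2*lr*T per step
    # (Euler-Maruyama discretization of the Langevin dynamics).
    for _ in range(steps):
        w = w - lr * grad(w) + np.sqrt(2 * lr * T) * rng.standard_normal(d)
    return w

# "Warm start": initialize with a small overlap with the signal.
w0 = 0.3 * w_star + rng.standard_normal(d)
for name, w in [("GD", gradient_descent(w0)), ("Langevin", langevin(w0))]:
    # Overlap in absolute value, since w and -w fit the labels equally well.
    overlap = abs(w @ w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))
    print(f"{name}: overlap with signal = {overlap:.3f}")
```

An overlap of 1 corresponds to perfect generalization (recovery of the signal up to a global sign); varying the sample ratio n/d and the noise temperature in this sketch mimics the control parameters whose role the paper characterizes analytically.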