
Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem

Posted by Francesca Mignacco
Publication date: 2021
Research language: English





In this paper we investigate how gradient-based algorithms such as gradient descent, (multi-pass) stochastic gradient descent, its persistent variant, and the Langevin algorithm navigate non-convex loss landscapes, and which of them is able to reach the best generalization error at limited sample complexity. We consider the loss landscape of the high-dimensional phase retrieval problem as a prototypical highly non-convex example. We observe that for phase retrieval the stochastic variants of gradient descent are able to reach perfect generalization in regions of control parameters where the gradient descent algorithm is not. We apply dynamical mean-field theory from statistical physics to characterize analytically the full trajectories of these algorithms in their continuous-time limit, with a warm start, and for large system sizes. We further unveil several intriguing properties of the landscape and the algorithms, for example that gradient descent can reach better generalization from less informed initializations.
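The comparison at the heart of the abstract is easy to probe numerically. Below is a minimal sketch, assuming a standard real-valued phase retrieval setup (quartic loss, Gaussian sensing vectors, warm start); the dimensions, learning rate and batch size are illustrative assumptions, and this is not the paper's dynamical mean-field analysis.

```python
# Sketch: full-batch gradient descent vs. mini-batch SGD on real-valued phase
# retrieval. All parameters below (d, n, lr, batch size) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 300                                   # sample ratio alpha = n/d = 3
w_star = rng.standard_normal(d)
w_star *= np.sqrt(d) / np.linalg.norm(w_star)     # teacher vector with |w*|^2 = d
X = rng.standard_normal((n, d)) / np.sqrt(d)      # sensing vectors
y = (X @ w_star) ** 2                             # phase-less measurements

def grad(w, idx):
    """Gradient of L = (1/4B) sum_mu (y_mu - (x_mu . w)^2)^2 over batch idx."""
    z = X[idx] @ w
    return -(X[idx] * ((y[idx] - z**2) * z)[:, None]).mean(axis=0)

def run(batch=None, lr=0.2, steps=20_000):
    # Warm start: small initial overlap with the signal, as in the paper's setup.
    w = 0.2 * w_star + rng.standard_normal(d)
    for _ in range(steps):
        idx = np.arange(n) if batch is None else rng.choice(n, batch, replace=False)
        w = w - lr * grad(w, idx)
    # Overlap m in [0, 1]; m -> 1 means perfect recovery of w* (up to sign).
    return abs(w @ w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))

print("GD  overlap:", run())          # full-batch gradient descent
print("SGD overlap:", run(batch=30))  # multi-pass mini-batch SGD
```

Whether each run reaches an overlap close to one depends on the sample ratio and on the noise injected by the mini-batches; that dependence is exactly what the paper quantifies with dynamical mean-field theory.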


Read also

Using a non-thermal local search, called Extremal Optimization (EO), in conjunction with a recently developed scheme for classifying the valley structure of complex systems, we analyze a short-range spin glass. In comparison with earlier studies using a thermal algorithm with detailed balance, we determine which features of the landscape are algorithm dependent and which are inherently geometrical. Apparently characteristic of any local search in complex energy landscapes, the time series of successive energy records found by EO is also characterized approximately by log-Poisson statistics. Differences in the results provide additional insights into the performance of EO. In contrast with a thermal search, the extremal search visits dramatically higher energies while returning to more widely separated low-energy configurations. Two important properties of the energy landscape are independent of either algorithm: first, to find lower energy records, progressively higher energy barriers need to be overcome. Second, the Hamming distance between two consecutive low-energy records is linearly related to the height of the intervening barrier.
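As a concrete reference point, here is a minimal sketch of tau-EO on a toy one-dimensional ring of Ising spins with random couplings; the geometry, tau value and step budget are illustrative assumptions, not the spin-glass model studied above.

```python
# Sketch: tau-EO on a 1D ring of Ising spins with random +/-J couplings.
# N, tau and the step budget are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
N, tau, steps = 64, 1.3, 50_000
J = rng.choice([-1.0, 1.0], size=N)            # bond J_i couples spins i and i+1
s = rng.choice([-1, 1], size=N).astype(float)

def local_fitness(s):
    # lambda_i = s_i (J_{i-1} s_{i-1} + J_i s_{i+1}); the total energy is
    # E = -sum_i J_i s_i s_{i+1} = -0.5 * sum_i lambda_i, so a low lambda_i
    # marks a badly adapted spin.
    return s * (np.roll(J, 1) * np.roll(s, 1) + J * np.roll(s, -1))

best_E = np.inf
for _ in range(steps):
    lam = local_fitness(s)
    best_E = min(best_E, -0.5 * lam.sum())          # track the energy record
    rank = min(int(rng.pareto(tau - 1.0)) + 1, N)   # P(rank) ~ rank^-tau
    s[np.argsort(lam)[rank - 1]] *= -1              # flip the rank-th worst spin

print("best energy per spin:", best_E / N)
```

Unlike a thermal update, the flip is unconditional: EO never rejects a move, which is why it visits much higher energies between successive records, as the abstract describes.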
The conjugate gradient (CG) method, a standard and vital way of minimizing the energy of a variational state, is applied to solve several problems in Skyrmion physics. The single-Skyrmion profile optimizing the energy of a two-dimensional chiral magnet is found without relying on specific boundary conditions. The two-dimensional Skyrmion lattice and the three-dimensional hedgehog crystal state are recovered with efficiency using the modified CG (p-GD) method. The p-GD method is proposed as a complement to the traditional Monte Carlo annealing method, which still gives better results for the ground state but at a far greater cost in computation time.
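For orientation, the sketch below minimizes a toy variational energy with SciPy's conjugate-gradient routine: a one-dimensional anisotropic spin chain relaxing to a domain-wall profile. The energy functional and the anisotropy strength K are illustrative assumptions, not the chiral-magnet functional of the paper.

```python
# Sketch: conjugate-gradient minimization of a toy 1D spin-chain energy.
# The functional (exchange + easy-axis anisotropy) and K are assumptions.
import numpy as np
from scipy.optimize import minimize

N, K = 100, 0.1

def energy(theta_in):
    # Pin the ends at 0 and pi so the minimizer is a domain wall rather than
    # a uniform state; theta_in holds the N-2 interior angles.
    theta = np.concatenate(([0.0], theta_in, [np.pi]))
    exchange = 0.5 * np.sum((theta[1:] - theta[:-1]) ** 2)
    anisotropy = K * np.sum(np.sin(theta) ** 2)
    return exchange + anisotropy

theta0 = np.linspace(0.0, np.pi, N)[1:-1]     # smooth initial twist
res = minimize(energy, theta0, method="CG")   # gradient via finite differences
print("converged:", res.success, "wall energy:", res.fun)
```

The same pattern scales to 2D and 3D spin textures: flatten the order-parameter field into one vector, supply the energy (and, for speed, its analytic gradient via `jac=`), and let CG do the relaxation.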
We study the scaling properties of the solid-on-solid front of the infinite cluster in two-dimensional gradient percolation. We show that such an object is self-affine with a Hurst exponent equal to 2/3 up to a cutoff length proportional to the gradient to the power -4/7. Beyond this length scale, the front position has the character of uncorrelated noise. Importantly, the self-affine behavior is robust even after removing local jumps of the front. The previously observed multi-affinity is due to the dominance of overhangs at small distances in the structure function. This is a crossover effect.
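To make the exponent quoted above concrete, the sketch below estimates a Hurst exponent from the second-order structure function of a synthetic self-affine profile generated with H = 2/3. This is a generic estimator applied to surrogate data, not a gradient-percolation simulation.

```python
# Sketch: estimate H from S2(r) = <|h(x+r) - h(x)|^2> ~ r^(2H).
# The profile is spectrally synthesized with target H = 2/3 (an assumption).
import numpy as np

rng = np.random.default_rng(3)
L, H = 2**14, 2.0 / 3.0
freqs = np.fft.rfftfreq(L)
amp = np.zeros_like(freqs)
amp[1:] = freqs[1:] ** (-(2 * H + 1) / 2)        # spectrum S(f) ~ f^-(2H+1)
phases = np.exp(2j * np.pi * rng.random(freqs.size))
h = np.fft.irfft(amp * phases, n=L)              # self-affine profile h(x)

rs = np.unique(np.logspace(0, 3, 20).astype(int))
S2 = np.array([np.mean((h[r:] - h[:-r]) ** 2) for r in rs])
H_est = np.polyfit(np.log(rs), np.log(S2), 1)[0] / 2
print(f"estimated H = {H_est:.2f} (target 2/3)")
```

On a real front, the fit would only be taken below the cutoff length set by the gradient; beyond it S2(r) flattens, which is the crossover to uncorrelated noise described above.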
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
Tor Lattimore, Botao Hao (2021)
We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, \theta_\star\rangle^2$, where $\theta_\star \in \mathbb{R}^d$ is an unknown parameter vector. We prove that the minimax cumulative regret in this problem is $\tilde\Theta(d\sqrt{n})$, which improves on the best known bounds by a factor of $\sqrt{d}$. We also show that the minimax simple regret is $\tilde\Theta(d/\sqrt{n})$ and that this is only achievable by an adaptive algorithm. Our analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading and that uniform bounds on the information ratio for information-directed sampling are not sufficient for optimal regret.
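For readers who want the objects behind these rates spelled out, a standard formulation is the following (the paper's exact normalization may differ):

$$
R_n \;=\; \sum_{t=1}^{n}\Bigl(\max_{\lVert a\rVert_2 \le 1}\langle a,\theta_\star\rangle^2 \;-\; \mathbb{E}\bigl[\langle A_t,\theta_\star\rangle^2\bigr]\Bigr)
\;=\; n\,\lVert\theta_\star\rVert^2 \;-\; \mathbb{E}\Bigl[\sum_{t=1}^{n}\langle A_t,\theta_\star\rangle^2\Bigr],
$$

since the optimal action on the unit ball is $a^\star = \theta_\star/\lVert\theta_\star\rVert$. The simple regret of a recommendation $\hat A_n$ made after $n$ rounds is $r_n = \lVert\theta_\star\rVert^2 - \mathbb{E}\bigl[\langle \hat A_n,\theta_\star\rangle^2\bigr]$; these are the quantities behind the $\tilde\Theta(d\sqrt{n})$ and $\tilde\Theta(d/\sqrt{n})$ rates quoted above.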
