
Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem

Added by Francesca Mignacco
Publication date: 2021
Language: English





In this paper we investigate how gradient-based algorithms such as gradient descent, (multi-pass) stochastic gradient descent, its persistent variant, and the Langevin algorithm navigate non-convex loss landscapes, and which of them is able to reach the best generalization error at limited sample complexity. We consider the loss landscape of the high-dimensional phase retrieval problem as a prototypical highly non-convex example. We observe that for phase retrieval the stochastic variants of gradient descent are able to reach perfect generalization in regions of the control parameters where the gradient descent algorithm is not. We apply dynamical mean-field theory from statistical physics to characterize analytically the full trajectories of these algorithms in their continuous-time limit, with a warm start, and for large system sizes. We further unveil several intriguing properties of the landscape and the algorithms, such as the fact that gradient descent can reach better generalization from less informed initializations.
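As a concrete illustration of the comparison above, the sketch below runs full-batch gradient descent and multi-pass mini-batch SGD on the same phase retrieval loss from a common warm start, and reports the overlap with the signal. It is a minimal NumPy toy, not the paper's dynamical mean-field analysis; the dimension, sample ratio, step size, batch size, and warm-start overlap are all illustrative assumptions.

```python
# Minimal sketch: full-batch GD vs. multi-pass mini-batch SGD on the
# phase retrieval loss L(w) = (1/4b) sum_mu ((x_mu.w)^2 - y_mu)^2,
# with y_mu = (x_mu . w*)^2. All sizes and rates are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n = 200, 600                             # dimension and samples (alpha = n/d = 3)
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)             # hidden signal
y = (X @ w_star) ** 2                       # noiseless phase retrieval labels

def grad(w, idx):
    """Gradient of the quartic loss restricted to the samples in idx."""
    p = X[idx] @ w
    return ((p ** 2 - y[idx]) * p) @ X[idx] / len(idx)

def overlap(w):
    """Cosine overlap with w_star; |overlap| -> 1 means perfect generalization."""
    return abs(w @ w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))

w0 = 0.2 * w_star + rng.standard_normal(d)  # warm start, mildly informed

for batch in (n, 32):                       # n: full-batch GD, 32: mini-batch SGD
    w = w0.copy()
    for t in range(5000):
        idx = np.arange(n) if batch == n else rng.choice(n, batch, replace=False)
        w -= 0.2 * grad(w, idx)
    print(f"batch={batch:4d}  final overlap={overlap(w):.3f}")
```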

Related research

Using a non-thermal local search, called Extremal Optimization (EO), in conjunction with a recently developed scheme for classifying the valley structure of complex systems, we analyze a short-range spin glass. In comparison with earlier studies using a thermal algorithm with detailed balance, we determine which features of the landscape are algorithm dependent and which are inherently geometrical. Apparently characteristic of any local search in complex energy landscapes, the time series of successive energy records found by EO is also described approximately by log-Poisson statistics. Differences in the results provide additional insights into the performance of EO. In contrast with a thermal search, the extremal search visits dramatically higher energies while returning to more widely separated low-energy configurations. Two important properties of the energy landscape are independent of either algorithm: first, to find lower energy records, progressively higher energy barriers need to be overcome; second, the Hamming distance between two consecutive low-energy records is linearly related to the height of the intervening barrier.
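For readers unfamiliar with EO, the following is a minimal sketch of the tau-EO move: rank spins by their local fitness and unconditionally flip one chosen with a power law over ranks, tracking successive energy records. It uses fully connected Gaussian couplings for brevity rather than the short-range model studied above; the system size, tau, and step budget are illustrative assumptions.

```python
# Minimal tau-EO sketch on a small Gaussian spin glass (fully connected
# couplings for brevity; the study above uses a short-range lattice model).
import numpy as np

rng = np.random.default_rng(1)
N, tau, steps = 64, 1.4, 20000
J = rng.standard_normal((N, N)) / np.sqrt(N)
J = np.triu(J, 1); J = J + J.T             # symmetric couplings, zero diagonal
s = rng.choice([-1, 1], size=N)

def energy(s):
    return -0.5 * s @ J @ s

# Power-law rank selection: pick the k-th worst spin with prob ~ k^(-tau).
ranks = np.arange(1, N + 1, dtype=float)
probs = ranks ** -tau
probs /= probs.sum()

best = energy(s)
for t in range(steps):
    fitness = s * (J @ s)                  # lambda_i: alignment with local field
    order = np.argsort(fitness)            # worst (most frustrated) spins first
    k = rng.choice(N, p=probs)
    s[order[k]] *= -1                      # flip unconditionally: non-thermal move
    e = energy(s)
    if e < best:                           # new energy record, as tracked by EO
        best = e
        print(f"step {t:6d}  record energy per spin {best / N:+.4f}")
```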
The conjugate gradient (CG) method, a standard and vital way of minimizing the energy of a variational state, is applied to solve several problems in Skyrmion physics. The single-Skyrmion profile optimizing the energy of a two-dimensional chiral magnet is found without relying on specific boundary conditions. The two-dimensional Skyrmion lattice and the three-dimensional hedgehog crystal state are recovered efficiently using the modified CG (p-GD) method. The p-GD method is proposed as a complement to the traditional Monte Carlo annealing method, which still gives better results for the ground state but at a far greater cost in computation time.
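As a small illustration of CG-based energy minimization in this spirit (not the paper's two-dimensional Skyrmion computation), the sketch below relaxes a one-dimensional chiral XY chain with SciPy's nonlinear conjugate gradient; the couplings J, D, B and the chain length are illustrative assumptions. The exchange and Dzyaloshinskii-Moriya terms favor a helix with a twist per site near atan(D/J).

```python
# Minimal sketch: nonlinear CG minimization of a toy 1-D chiral XY chain,
#   E(theta) = sum_i [-J cos(dtheta_i) - D sin(dtheta_i)] - B sum_i cos(theta_i).
# Couplings and size are illustrative, not the paper's 2-D Skyrmion setup.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
N, J, D, B = 100, 1.0, 0.5, 0.1

def energy(theta):
    """Exchange + Dzyaloshinskii-Moriya bond terms plus a Zeeman field."""
    d = np.diff(theta)
    return np.sum(-J * np.cos(d) - D * np.sin(d)) - B * np.sum(np.cos(theta))

theta0 = rng.uniform(-0.1, 0.1, N)           # near-uniform initial state
res = minimize(energy, theta0, method="CG")  # nonlinear conjugate gradient
twist = np.mean(np.diff(res.x))
print(f"E_min = {res.fun:.3f}, mean twist/site = {twist:.3f} rad "
      f"(cf. atan(D/J) = {np.arctan(D / J):.3f})")
```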
We study the scaling properties of the solid-on-solid front of the infinite cluster in two-dimensional gradient percolation. We show that such an object is self-affine with a Hurst exponent equal to 2/3 up to a cutoff length proportional to the gradient to the power -4/7. Beyond this length scale, the front position has the character of uncorrelated noise. Importantly, the self-affine behavior is robust even after removing local jumps of the front. The previously observed multi-affinity is due to the dominance of overhangs at small distances in the structure function; it is a crossover effect.
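A minimal sketch of how such a Hurst exponent can be measured: generate a synthetic self-affine profile by Fourier filtering with a prescribed H and fit the scaling of the height-difference correlation. The profile generator, system size, and fit range are illustrative stand-ins for a real gradient-percolation front.

```python
# Minimal sketch: build a self-affine profile with power spectrum ~ k^-(2H+1)
# (Hurst exponent H) and fit C(r) = <(h(x+r)-h(x))^2>^(1/2) ~ r^H.
# H = 2/3 and the sizes are illustrative; a real front would replace h.
import numpy as np

rng = np.random.default_rng(3)
L, H = 2 ** 14, 2.0 / 3.0
k = np.fft.rfftfreq(L)[1:]                          # nonzero modes
amp = k ** (-(2 * H + 1) / 2)                       # |h_k| ~ k^-(2H+1)/2
phase = np.exp(2j * np.pi * rng.random(len(k)))     # random phases
h = np.fft.irfft(np.concatenate(([0], amp * phase)), n=L)

rs = np.unique(np.logspace(0, 3, 20).astype(int))
C = [np.sqrt(np.mean((h[r:] - h[:-r]) ** 2)) for r in rs]
slope = np.polyfit(np.log(rs), np.log(C), 1)[0]     # fitted Hurst exponent
print(f"input H = {H:.3f}, measured H = {slope:.3f}")
```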
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer, in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
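A drastically simplified sketch of the idea of meta-learning an optimizer by gradient descent: instead of an RNN, meta-learn a single log step size for an inner gradient-descent loop on random quadratics, using a finite-difference meta-gradient. Everything here (the quadratic task family, horizon, and rates) is an illustrative assumption, not the paper's setup.

```python
# Minimal meta-learning sketch: tune one log-step-size so that T inner GD
# steps minimize random quadratics well. Illustrative stand-in for an RNN.
import numpy as np

rng = np.random.default_rng(4)
d, T, eps, meta_lr = 10, 20, 1e-4, 0.1
log_eta = np.log(0.01)                       # meta-parameter: log inner step size

def inner_loss(log_eta, seed):
    """Final loss after T inner GD steps, step exp(log_eta), on a random quadratic."""
    r = np.random.default_rng(seed)
    a = r.uniform(0.5, 2.0, d)               # curvatures of 0.5 * sum(a * x^2)
    x = r.standard_normal(d)
    for _ in range(T):
        x = x - np.exp(log_eta) * a * x      # inner gradient step
    return 0.5 * np.sum(a * x ** 2)

for step in range(200):                      # meta-training loop
    seed = int(rng.integers(1 << 30))
    g = (inner_loss(log_eta + eps, seed) - inner_loss(log_eta - eps, seed)) / (2 * eps)
    log_eta -= meta_lr * np.clip(g, -10.0, 10.0)   # clipped for stability
print(f"meta-learned inner step size: {np.exp(log_eta):.3f}")
```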
Tor Lattimore, Botao Hao (2021)
We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, \theta_\star\rangle^2$, where $\theta_\star \in \mathbb{R}^d$ is an unknown parameter vector. We prove that the minimax cumulative regret in this problem is $\tilde{\Theta}(d\sqrt{n})$, which improves on the best known bounds by a factor of $\sqrt{d}$. We also show that the minimax simple regret is $\tilde{\Theta}(d/\sqrt{n})$ and that this is only achievable by an adaptive algorithm. Our analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading and that uniform bounds on the information ratio for information-directed sampling are not sufficient for optimal regret.
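To make the reward model concrete, the sketch below simulates it and recovers theta_star with a naive uniform-exploration estimator, exploiting that the expected reward is linear in vec(a a^T); this is not the paper's adaptive algorithm (which the result above shows is necessary for optimal simple regret). The dimension, sample count, and noise level are illustrative assumptions.

```python
# Minimal sketch of the bandit phase-retrieval reward model: since
# E[r | a] = <a, theta*>^2 = <a a^T, theta* theta*^T>, least squares on
# vec(a a^T) recovers theta* theta*^T; its top eigenvector gives theta*
# up to sign. Naive uniform exploration, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(5)
d, n = 10, 2000
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)                        # unknown unit parameter

A = rng.standard_normal((n, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)         # actions on the unit sphere
r = (A @ theta) ** 2 + 0.1 * rng.standard_normal(n)   # noisy quadratic reward

feats = np.einsum("ti,tj->tij", A, A).reshape(n, d * d)  # vec(a a^T) features
M = np.linalg.lstsq(feats, r, rcond=None)[0].reshape(d, d)
M = (M + M.T) / 2                                     # symmetrize the estimate
vals, vecs = np.linalg.eigh(M)
theta_hat = vecs[:, -1]                               # top eigenvector
print(f"|<theta_hat, theta*>| = {abs(theta_hat @ theta):.3f}")
```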
