Non-convex optimization problems are challenging to solve; the success and computational expense of a gradient descent algorithm or variant depend heavily on the initialization strategy. Often, either random initialization is used or initialization rules are carefully designed by exploiting the nature of the problem class. As a simple alternative to hand-crafted initialization rules, we propose an approach for learning good initialization rules from previous solutions. We provide theoretical guarantees that establish conditions under which our approach performs better than random initialization; these conditions are sufficient in all cases and, in some, also necessary. We apply our methodology to various non-convex problems such as generating adversarial examples, generating post hoc explanations for black-box machine learning models, and allocating communication spectrum, and show consistent gains over other initialization techniques.
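A minimal sketch of this idea, under illustrative assumptions (the ridge regressor, the problem descriptors, and the `solve` helper are not from the paper): fit a model mapping problem parameters to previously found solutions, then use its prediction to warm-start gradient descent on a new instance.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical setup: each past problem instance is described by a parameter
# vector theta, and we stored the solution x* that gradient descent found.
past_thetas = np.random.randn(200, 5)        # problem descriptors
past_solutions = np.random.randn(200, 10)    # solutions found on past instances

# Learn an initialization rule: problem parameters -> good starting point.
init_model = Ridge(alpha=1.0).fit(past_thetas, past_solutions)

def solve(theta, grad_f, lr=0.01, steps=500):
    """Warm-started gradient descent on a new problem instance."""
    x = init_model.predict(theta[None, :])[0]  # learned initialization
    for _ in range(steps):
        x = x - lr * grad_f(x, theta)
    return x
```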
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions.
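A minimal sketch of this training setup, with assumed dimensions and an assumed 0.1 output scale: a coordinate-wise LSTM reads each parameter's gradient and proposes an update, and is meta-trained by unrolling it on random quadratics and backpropagating the summed loss into the optimizer's own weights.

```python
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    """Coordinate-wise learned optimizer: an LSTM maps a gradient entry
    to a proposed parameter update (hidden size is illustrative)."""
    def __init__(self, hidden=20):
        super().__init__()
        self.cell = nn.LSTMCell(1, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, grad, state):
        h, c = self.cell(grad.unsqueeze(-1), state)
        return 0.1 * self.out(h).squeeze(-1), (h, c)

opt_net = LearnedOptimizer()
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)

# Meta-train on random quadratics f(x) = ||Ax - b||^2, unrolled for 20 steps.
for _ in range(1000):
    A, b = torch.randn(10, 10), torch.randn(10)
    x = torch.randn(10, requires_grad=True)
    state = (torch.zeros(10, 20), torch.zeros(10, 20))
    total = 0.0
    for _ in range(20):
        loss = ((A @ x - b) ** 2).sum()
        g, = torch.autograd.grad(loss, x, create_graph=True)
        update, state = opt_net(g, state)
        x = x + update      # keep the graph so the meta-gradient flows through
        total = total + loss
    meta_opt.zero_grad()
    total.backward()        # trains the optimizer, not the optimizee
    meta_opt.step()
```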
Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as the learning rate. There exist many techniques for automated hyperparameter optimization, but they typically introduce even more hyperparameters to control the hyperparameter optimization process.
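One concrete instance of tuning the learning rate itself by gradient descent is hypergradient descent (Baydin et al., 2018); the sketch below is that known rule, not necessarily this paper's exact method. The hypergradient of the loss with respect to the learning rate reduces to the negative dot product of consecutive gradients, so the rate rises when successive gradients agree and falls when they oppose.

```python
import numpy as np

def hypergradient_descent(grad_f, x0, alpha=0.01, beta=1e-4, steps=1000):
    """SGD whose learning rate alpha is itself adapted by gradient descent."""
    x, g_prev = np.asarray(x0, dtype=float), None
    for _ in range(steps):
        g = grad_f(x)
        if g_prev is not None:
            alpha += beta * (g @ g_prev)  # learning-rate (hypergradient) update
        x = x - alpha * g                 # ordinary parameter update
        g_prev = g
    return x

# Example: minimize f(x) = ||x||^2 / 2, whose gradient is x.
x_star = hypergradient_descent(lambda x: x, np.ones(5))
```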
The development of machine learning is promoting the search for fast and stable minimization algorithms. To this end, we suggest a change in the current gradient descent methods that should speed up the motion in flat regions and slow it down in steep regions.
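A minimal sketch of one such rescaling, assuming a 1/||g||^2 scaling (an illustrative choice, not necessarily the paper's exact rule): the resulting step length is roughly lr/||g||, so small gradients (flat regions) produce long steps and large gradients (steep regions) produce short ones.

```python
import numpy as np

def rescaled_gd(grad_f, x0, lr=0.1, eps=1e-8, steps=1000):
    """Gradient descent with steps rescaled by the squared gradient norm."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_f(x)
        # step length ~ lr / ||g||: inversely proportional to steepness
        x = x - lr * g / (np.dot(g, g) + eps)
    return x
```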
Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces parameter updates, or by learning better initializations or scalings for a gradient-based rule.
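A minimal sketch of the second family mentioned here, using the known Reptile rule (Nichol et al., 2018) rather than this paper's method: a shared initialization is repeatedly nudged toward each task's adapted parameters, so that a few gradient steps suffice on new tasks from the same distribution.

```python
import numpy as np

def reptile(sample_task, inner_steps=5, inner_lr=0.05,
            meta_lr=0.1, meta_steps=1000, dim=10):
    """Reptile-style meta-learning of an initialization.

    sample_task() is assumed to return a gradient function for one task
    drawn from the task distribution (an illustrative interface)."""
    phi = np.zeros(dim)                    # shared initialization
    for _ in range(meta_steps):
        grad_f = sample_task()
        x = phi.copy()
        for _ in range(inner_steps):       # fast adaptation on this task
            x -= inner_lr * grad_f(x)
        phi += meta_lr * (x - phi)         # move init toward adapted weights
    return phi
```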
The simplicity of gradient descent (GD) made it the default method for training ever deeper and more complex neural networks. Both loss functions and architectures are often explicitly tuned to be amenable to this basic local optimization.