ﻻ يوجد ملخص باللغة العربية
Learned optimizers are algorithms that can themselves be trained to solve optimization problems. In contrast to baseline optimizers (such as momentum or Adam) that use simple update rules derived from theoretical principles, learned optimizers use flexible, high-dimensional, nonlinear parameterizations. Although this can lead to better performance in certain settings, their inner workings remain a mystery. How is a learned optimizer able to outperform a well tuned baseline? Has it learned a sophisticated combination of existing optimization techniques, or is it implementing completely new behavior? In this work, we address these questions by careful analysis and visualization of learned optimizers. We study learned optimizers trained from scratch on three disparate tasks, and discover that they have learned interpretable mechanisms, including: momentum, gradient clipping, learning rate schedules, and a new form of learning rate adaptation. Moreover, we show how the dynamics of learned optimizers enables these behaviors. Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
Learned optimizers are increasingly effective, with performance exceeding that of hand designed optimizers such as Adam~citep{kingma2014adam} on specific tasks citep{metz2019understanding}. Despite the potential gains available, in current work the m
Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of
Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it--to obtain a quantitative, interpreta
Traditional maximum entropy and sparsity-based algorithms for analytic continuation often suffer from the ill-posed kernel matrix or demand tremendous computation time for parameter tuning. Here we propose a neural network method by convex optimizati
It has been widely assumed that a neural network cannot be recovered from its outputs, as the network depends on its parameters in a highly nonlinear way. Here, we prove that in fact it is often possible to identify the architecture, weights, and bia