ﻻ يوجد ملخص باللغة العربية
We note that known methods achieving the optimal oracle complexity for first order convex optimization require quadratic memory, and ask whether this is necessary, and more broadly seek to characterize the minimax number of first order queries required to optimize a convex Lipschitz function subject to a memory constraint.
We give nearly matching upper and lower bounds on the oracle complexity of finding $epsilon$-stationary points ($| abla F(x) | leqepsilon$) in stochastic convex optimization. We jointly analyze the oracle complexity in both the local stochastic orac
We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize the objecti
Recent work has shown how to embed differentiable optimization problems (that is, problems whose solutions can be backpropagated through) as layers within deep learning architectures. This method provides a useful inductive bias for certain problems,
One popular trend in meta-learning is to learn from many training tasks a common initialization for a gradient-based method that can be used to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several
Gradient clipping is commonly used in training deep neural networks partly due to its practicability in relieving the exploding gradient problem. Recently, citet{zhang2019gradient} show that clipped (stochastic) Gradient Descent (GD) converges faster