Large-scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and the training of generative adversarial networks. Despite their wide applicability, solving such problems efficiently with existing stochastic minimax methods is challenging in the presence of large amounts of data. We study a class of stochastic minimax methods and develop a communication-efficient distributed stochastic extragradient algorithm, LocalAdaSEG, with an adaptive learning rate suitable for solving convex-concave minimax problems in the Parameter-Server model. LocalAdaSEG has three main features: (i) a periodic communication strategy that reduces the communication cost between workers and the server; (ii) an adaptive learning rate that is computed locally and allows for tuning-free implementation; and (iii) a proven, nearly linear speed-up with respect to the dominant variance term, arising from estimation of the stochastic gradient, in both the smooth and nonsmooth convex-concave settings. We apply LocalAdaSEG to solve a stochastic bilinear game and to train a generative adversarial network. We compare LocalAdaSEG against several existing optimizers for minimax problems and demonstrate its efficacy through several experiments in both the homogeneous and heterogeneous settings.
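To make the scheme above concrete, the following is a minimal sketch of a local stochastic extragradient loop with a locally computed AdaGrad-style step size and periodic server averaging, applied to a stochastic bilinear game. The step-size rule, communication schedule, and all constants are illustrative assumptions, not the exact LocalAdaSEG algorithm.

```python
import numpy as np

# Minimal sketch (see lead-in): local stochastic extragradient with a locally
# computed AdaGrad-style step size and periodic averaging, on the bilinear
# game min_x max_y x^T A y with noisy gradient oracles.

rng = np.random.default_rng(0)
d, workers, K, rounds = 5, 4, 10, 50   # illustrative sizes and schedule
A = rng.standard_normal((d, d))

def stoch_grads(x, y, noise=0.1):
    """Noisy gradients of f(x, y) = x^T A y."""
    gx = A @ y + noise * rng.standard_normal(d)    # gradient w.r.t. x
    gy = A.T @ x + noise * rng.standard_normal(d)  # gradient w.r.t. y
    return gx, gy

x, y = np.ones(d), np.ones(d)
for r in range(rounds):
    xs, ys = [], []
    for w in range(workers):
        xw, yw = x.copy(), y.copy()
        acc = 1e-8                                 # local AdaGrad accumulator
        for t in range(K):                         # K local extragradient steps
            gx, gy = stoch_grads(xw, yw)
            acc += np.sum(gx ** 2) + np.sum(gy ** 2)
            eta = 1.0 / np.sqrt(acc)               # locally computed step size
            xh, yh = xw - eta * gx, yw + eta * gy  # extrapolation step
            gx2, gy2 = stoch_grads(xh, yh)
            xw, yw = xw - eta * gx2, yw + eta * gy2
        xs.append(xw)
        ys.append(yw)
    x, y = np.mean(xs, axis=0), np.mean(ys, axis=0)  # periodic server averaging

print("distance to the saddle point (0, 0):",
      np.linalg.norm(x) + np.linalg.norm(y))
```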
We consider a general class of nonconvex-PL minimax problems in the cross-device federated learning setting. Although nonconvex-PL minimax problems have received significant attention in recent years, existing algorithms do not apply to the cross-device federated learning setting, which differs substantially from conventional distributed settings and poses new challenges. To bridge this gap, we propose an algorithmic framework named FedSGDA. FedSGDA performs multiple local update steps on a subset of active clients in each round and leverages global gradient estimates to correct the bias in local update directions. By incorporating two representative global gradient estimators into FedSGDA, we obtain two specific algorithms. We establish convergence rates for the proposed algorithms using novel potential functions. Experimental results on synthetic and real data corroborate our theory and demonstrate the effectiveness of our algorithms.
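As an illustration of the local-update-with-correction idea, here is a toy sketch in which each sampled client runs several SGDA steps whose directions are corrected by a server-side global gradient estimate (a SCAFFOLD-style drift correction). The quadratic objective, the exact-gradient oracle, and the specific correction term are assumptions for illustration; FedSGDA's estimators may differ.

```python
import numpy as np

# Toy sketch (see lead-in): corrected local SGDA on the separable objective
#   f_i(x, y) = 0.5*a_i*x^2 + b_i*x*y - 0.5*y^2, averaged over clients.

rng = np.random.default_rng(1)
n_clients, sampled, K, rounds, eta = 10, 4, 5, 100, 0.05
a = rng.uniform(0.5, 1.5, n_clients)
b = rng.uniform(-1.0, 1.0, n_clients)

def grads(i, x, y):
    """Per-client gradients (stochastic noise omitted for clarity)."""
    return a[i] * x + b[i] * y, b[i] * x - y   # (d/dx, d/dy)

x, y = 1.0, 1.0
for r in range(rounds):
    # Global gradient estimate at the current iterate (full participation
    # here for simplicity; a practical estimator would be cheaper).
    gX = np.mean([grads(i, x, y)[0] for i in range(n_clients)])
    gY = np.mean([grads(i, x, y)[1] for i in range(n_clients)])
    xs, ys = [], []
    for i in rng.choice(n_clients, sampled, replace=False):
        cx, cy = grads(i, x, y)         # local gradient at the round's start
        xi, yi = x, y
        for t in range(K):              # K bias-corrected local SGDA steps
            gx, gy = grads(i, xi, yi)
            xi -= eta * (gx - cx + gX)  # descent on x
            yi += eta * (gy - cy + gY)  # ascent on y
        xs.append(xi)
        ys.append(yi)
    x, y = float(np.mean(xs)), float(np.mean(ys))  # server aggregation

print("final iterate (x, y):", round(x, 4), round(y, 4))
```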
Many machine learning problems can be formulated as minimax problems, such as generative adversarial networks (GANs), AUC maximization, and robust estimation, to mention but a few. A substantial body of work is devoted to studying the convergence behavior of stochastic gradient-type algorithms for such problems. In contrast, there is relatively little work on their generalization, i.e., how the learning models built from training examples behave on test examples. In this paper, we provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems in both the convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability. We establish a quantitative connection between stability and several generalization measures, both in expectation and with high probability. For the convex-concave setting, our stability analysis shows that stochastic gradient descent ascent attains optimal generalization bounds for both smooth and nonsmooth minimax problems. We also establish generalization bounds for both weakly-convex-weakly-concave and gradient-dominated problems.
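The notion of algorithmic stability invoked above can be illustrated numerically: run the same SGDA trajectory on two datasets that differ in a single example and measure how far apart the outputs end up. The toy convex-concave loss and the shared sample path below are illustrative choices, not the paper's construction.

```python
import numpy as np

# Stability illustration (see lead-in): identical SGDA runs on neighboring
# datasets S and S' that differ in one example z = (u, v), with the toy
# convex-concave loss f(x, y; z) = 0.5*(x-u)^2 + x*y - 0.5*(y-v)^2.

rng = np.random.default_rng(2)
n, T, eta = 50, 500, 0.01
S = rng.standard_normal((n, 2))
S_prime = S.copy()
S_prime[0] += 1.0        # neighboring dataset: a single example is perturbed

def run_sgda(data, seed=0):
    """SGDA with a fixed sample path so the two runs are coupled."""
    idx = np.random.default_rng(seed).integers(0, n, T)
    x, y = 0.0, 0.0
    for t in idx:
        u, v = data[t]
        # Simultaneous update: the y-step uses the pre-update x.
        x, y = x - eta * ((x - u) + y), y + eta * (x - (y - v))
    return x, y

x1, y1 = run_sgda(S)
x2, y2 = run_sgda(S_prime)
print("argument stability gap:", abs(x1 - x2) + abs(y1 - y2))
```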
We introduce the online stochastic Convex Programming (CP) problem, a very general class of online stochastic problems that allows arbitrary concave objectives and convex feasibility constraints. Many well-studied problems, such as online stochastic packing and covering and online stochastic matching with concave returns, are special cases of online stochastic CP. We present fast algorithms for these problems, which achieve near-optimal regret guarantees for both the i.i.d. and the random-permutation models of stochastic inputs. When applied to the special case of online packing, our ideas yield a simpler and faster primal-dual algorithm for this well-studied problem, which achieves the optimal competitive ratio. Our techniques make explicit the connection between the primal-dual paradigm, online learning, and online stochastic CP.
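The primal-dual/online-learning connection mentioned above can be sketched for the special case of online packing: maintain a dual price with a multiplicative-weights-style online update, and accept each arriving item greedily against that price. The update rule and all constants below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Primal-dual sketch for online packing (see lead-in): a single dual price
# is learned online; each item is accepted iff its value beats the priced
# cost of the resource it consumes.

rng = np.random.default_rng(3)
T, budget = 1000, 200.0
rho = budget / T                 # per-step budget rate
lam, eta = 1.0, 0.05             # dual price and its learning rate
spent = reward = 0.0

for t in range(T):
    c = rng.uniform(0.0, 1.0)    # item value (i.i.d. stochastic input)
    s = rng.uniform(0.0, 1.0)    # item size
    take = c > lam * s and spent + s <= budget   # primal (greedy) decision
    if take:
        spent += s
        reward += c
    # Dual step: multiplicative-weights-style update that raises the price
    # when consumption exceeds the budget rate and lowers it otherwise.
    lam *= np.exp(eta * ((s if take else 0.0) - rho))

print(f"reward = {reward:.1f}, budget used = {spent:.1f} / {budget:.0f}")
```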
We study adversary-resilient stochastic distributed optimization, in which $m$ machines can independently compute stochastic gradients and cooperate to jointly optimize over their local objective functions. However, an $\alpha$-fraction of the machines are Byzantine, in that they may behave in arbitrary, adversarial ways. We consider a variant of this procedure in the challenging non-convex case. Our main result is a new algorithm, SafeguardSGD, which can provably escape saddle points and find approximate local minima of the non-convex objective. The algorithm is based on a new concentration filtering technique, and its sample and time complexity bounds match the best known theoretical bounds in the stochastic, distributed setting when no Byzantine machines are present. Our algorithm is very practical: it improves upon the performance of all prior methods when training deep neural networks, it is relatively lightweight, and it is the first method to withstand two recently proposed Byzantine attacks.
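A simplified version of the concentration-filtering idea can be sketched as follows: accumulate each machine's reported gradients, flag machines whose cumulative history drifts too far from the median of the currently trusted set, and average only the survivors. The threshold, attack model, and quadratic objective below are illustrative assumptions, not SafeguardSGD's exact statistics.

```python
import numpy as np

# Simplified deviation filter (see lead-in): honest machines report noisy
# gradients of f(w) = 0.5*||w||^2; Byzantine machines report biased values.
# Machines whose cumulative reports drift from the median of the trusted
# set beyond an illustrative threshold are dropped permanently.

rng = np.random.default_rng(4)
m, d, steps = 10, 20, 200
byzantine = {0, 1}                 # identities unknown to the algorithm
w = np.zeros(d)
cum = np.zeros((m, d))             # cumulative reported gradients
trusted = set(range(m))
eta, thresh = 0.05, 3.0

for t in range(steps):
    reports = w + 0.1 * rng.standard_normal((m, d))  # honest reports
    for i in byzantine:
        reports[i] += 1.0          # constant adversarial drift
    cum += reports
    med = np.median(cum[list(trusted)], axis=0)
    # Safeguard: a machine survives only while its history concentrates
    # around the trusted median (threshold grows like a random walk's norm).
    trusted = {i for i in trusted
               if np.linalg.norm(cum[i] - med) <= thresh * np.sqrt((t + 1) * d)}
    w -= eta * np.mean(reports[list(trusted)], axis=0)

print("trusted machines:", sorted(trusted), "| ||w|| =", np.linalg.norm(w))
```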
Minimax optimization problems are an important class of optimization problems arising in modern machine learning and traditional research areas. While there are many numerical algorithms for solving smooth convex-concave minimax problems, numerical algorithms for nonsmooth convex-concave minimax problems are rare. This paper develops an efficient numerical algorithm for a structured nonsmooth convex-concave minimax problem. We propose a majorized semi-proximal alternating coordinate method (mspACM), in which a majorized quadratic convex-concave function approximates the smooth part of the objective and semi-proximal terms are added to each subproblem. This construction ensures that the subproblems at each iteration are solvable, and even easily solved when the semi-proximal terms are chosen cleverly. We prove the global convergence of mspACM under mild assumptions, without requiring a strong convexity-concavity condition. Under local metric subregularity of the solution mapping, we prove that mspACM converges at a linear rate. Preliminary numerical results are reported to verify the efficiency of the algorithm.
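As a minimal illustration of a semi-proximal alternating scheme on a structured nonsmooth convex-concave problem, consider $\min_x \max_y\; \lambda\|x\|_1 + \tfrac{1}{2}\|x-c\|^2 + y^\top A x - \tfrac{1}{2}\|y\|^2$. The smooth part here is quadratic, so its majorization is exact and both semi-proximal subproblems admit closed-form solutions; this is an illustrative special case, not mspACM in full generality.

```python
import numpy as np

# Semi-proximal alternating sketch (see lead-in) for
#   min_x max_y  lam*||x||_1 + 0.5*||x - c||^2 + y^T A x - 0.5*||y||^2.
# Each subproblem adds a semi-proximal term and is solved in closed form.

rng = np.random.default_rng(5)
n, lam, sigma, tau = 8, 0.1, 1.0, 1.0   # sigma, tau: semi-proximal weights
A = rng.standard_normal((n, n)) / np.sqrt(n)
c = rng.standard_normal(n)

def soft(z, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

x, y = np.zeros(n), np.zeros(n)
for k in range(300):
    # x-subproblem: l1-prox of the (exactly majorized) quadratic part.
    z = (c + sigma * x - A.T @ y) / (1.0 + sigma)
    x = soft(z, lam / (1.0 + sigma))
    # y-subproblem: strongly concave, solved exactly in closed form.
    y = (A @ x + tau * y) / (1.0 + tau)

# At a saddle point the inner maximizer satisfies y = A x.
print("||x||_0 =", int(np.sum(x != 0)),
      " saddle residual ||Ax - y|| =", np.linalg.norm(A @ x - y))
```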