No Arabic abstract
This paper explores the behavior of present-biased agents, that is, agents who erroneously anticipate the costs of future actions compared to their real costs. Specifically, the paper extends the original framework proposed by Akerlof (1991) for studying various aspects of human behavior related to time-inconsistent planning, including procrastination, and abandonment, as well as the elegant graph-theoretic model encapsulating this framework recently proposed by Kleinberg and Oren (2014). The benefit of this extension is twofold. First, it enables to perform fine grained analysis of the behavior of present-biased agents depending on the optimisation task they have to perform. In particular, we study covering tasks vs. hitting tasks, and show that the ratio between the cost of the solutions computed by present-biased agents and the cost of the optimal solutions may differ significantly depending on the problem constraints. Second, our extension enables to study not only underestimation of future costs, coupled with minimization problems, but also all combinations of minimization/maximization, and underestimation/overestimation. We study the four scenarios, and we establish upper bounds on the cost ratio for three of them (the cost ratio for the original scenario was known to be unbounded), providing a complete global picture of the behavior of present-biased agents, as far as optimisation tasks are concerned.
Conditional Stochastic Optimization (CSO) covers a variety of applications ranging from meta-learning and causal inference to invariant learning. However, constructing unbiased gradient estimates in CSO is challenging due to the composition structure. As an alternative, we propose a biased stochastic gradient descent (BSGD) algorithm and study the bias-variance tradeoff under different structural assumptions. We establish the sample complexities of BSGD for strongly convex, convex, and weakly convex objectives, under smooth and non-smooth conditions. We also provide matching lower bounds of BSGD for convex CSO objectives. Extensive numerical experiments are conducted to illustrate the performance of BSGD on robust logistic regression, model-agnostic meta-learning (MAML), and instrumental variable regression (IV).
This paper addresses consensus optimization problems in a multi-agent network, where all agents collaboratively find a minimizer for the sum of their private functions. We develop a new decentralized algorithm in which each agent communicates only with its neighbors. State-of-the-art decentralized algorithms use communications between either all pairs of adjacent agents or a random subset of them at each iteration. Another class of algorithms uses a random walk incremental strategy, which sequentially activates a succession of nodes; these incremental algorithms require diminishing step sizes to converge to the solution, so their convergence is relatively slow. In this work, we propose a random walk algorithm that uses a fixed step size and converges faster than the existing random walk incremental algorithms. Our algorithm is also communication efficient. Each iteration uses only one link to communicate the latest information for an agent to another. Since this communication rule mimics a man walking around the network, we call our new algorithm Walkman. We establish convergence for convex and nonconvex objectives. For decentralized least squares, we derive a linear rate of convergence and obtain a better communication complexity than those of other decentralized algorithms. Numerical experiments verify our analysis results.
Communication compression techniques are of growing interests for solving the decentralized optimization problem under limited communication, where the global objective is to minimize the average of local cost functions over a multi-agent network using only local computation and peer-to-peer communication. In this paper, we first propose a novel compressed gradient tracking algorithm (C-GT) that combines gradient tracking technique with communication compression. In particular, C-GT is compatible with a general class of compression operators that unifies both unbiased and biased compressors. We show that C-GT inherits the advantages of gradient tracking-based algorithms and achieves linear convergence rate for strongly convex and smooth objective functions. In the second part of this paper, we propose an error feedback based compressed gradient tracking algorithm (EF-C-GT) to further improve the algorithm efficiency for biased compression operators. Numerical examples complement the theoretical findings and demonstrate the efficiency and flexibility of the proposed algorithms.
This paper considers decentralized minimization of $N:=nm$ smooth non-convex cost functions equally divided over a directed network of $n$ nodes. Specifically, we describe a stochastic first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique and gradient tracking (GT) to address the stochastic and decentralized nature of the problem. We show that GT-SARAH, with appropriate algorithmic parameters, finds an $epsilon$-accurate first-order stationary point with $Obig(maxbig{N^{frac{1}{2}},n(1-lambda)^{-2},n^{frac{2}{3}}m^{frac{1}{3}}(1-lambda)^{-1}big}Lepsilon^{-2}big)$ gradient complexity, where ${(1-lambda)in(0,1]}$ is the spectral gap of the network weight matrix and $L$ is the smoothness parameter of the cost functions. This gradient complexity outperforms that of the existing decentralized stochastic gradient methods. In particular, in a big-data regime such that ${n = O(N^{frac{1}{2}}(1-lambda)^{3})}$, this gradient complexity furthers reduces to ${O(N^{frac{1}{2}}Lepsilon^{-2})}$, independent of the network topology, and matches that of the centralized near-optimal variance-reduced methods. Moreover, in this regime GT-SARAH achieves a non-asymptotic linear speedup, in that, the total number of gradient computations at each node is reduced by a factor of $1/n$ compared to the centralized near-optimal algorithms that perform all gradient computations at a single node. To the best of our knowledge, GT-SARAH is the first algorithm that achieves this property. In addition, we show that appropriate choices of local minibatch size balance the trade-offs between the gradient and communication complexity of GT-SARAH. Over infinite time horizon, we establish that all nodes in GT-SARAH asymptotically achieve consensus and converge to a first-order stationary point in the almost sure and mean-squared sense.
In this paper, we study decentralized online stochastic non-convex optimization over a network of nodes. Integrating a technique called gradient tracking in decentralized stochastic gradient descent, we show that the resulting algorithm, GT-DSGD, enjoys certain desirable characteristics towards minimizing a sum of smooth non-convex functions. In particular, for general smooth non-convex functions, we establish non-asymptotic characterizations of GT-DSGD and derive the conditions under which it achieves network-independent performances that match the centralized minibatch SGD. In contrast, the existing results suggest that GT-DSGD is always network-dependent and is therefore strictly worse than the centralized minibatch SGD. When the global non-convex function additionally satisfies the Polyak-Lojasiewics (PL) condition, we establish the linear convergence of GT-DSGD up to a steady-state error with appropriate constant step-sizes. Moreover, under stochastic approximation step-sizes, we establish, for the first time, the optimal global sublinear convergence rate on almost every sample path, in addition to the asymptotically optimal sublinear rate in expectation. Since strongly convex functions are a special case of the functions satisfying the PL condition, our results are not only immediately applicable but also improve the currently known best convergence rates and their dependence on problem parameters.